Table A The Genetic Code
Table B Redundancy of the Genetic Code


Amino Acid      3-letter  1-letter  Codons
Alanine         Ala       A         GCA, GCC, GCG, GCU
Arginine        Arg       R         AGA, AGG, CGA, CGC, CGG, CGU
Asparagine      Asn       N         AAC, AAU
Aspartic acid   Asp       D         GAC, GAU
Cysteine        Cys       C         UGC, UGU
Glutamic acid   Glu       E         GAA, GAG
Glutamine       Gln       Q         CAA, CAG
Glycine         Gly       G         GGA, GGC, GGG, GGU
Histidine       His       H         CAC, CAU
Isoleucine      Ile       I         AUA, AUC, AUU
Leucine         Leu       L         UUA, UUG, CUA, CUC, CUG, CUU
Lysine          Lys       K         AAA, AAG
Methionine      Met       M         AUG
Phenylalanine   Phe       F         UUC, UUU
Proline         Pro       P         CCA, CCC, CCG, CCU
Serine          Ser       S         AGC, AGU, UCA, UCC, UCG, UCU
Threonine       Thr       T         ACA, ACC, ACG, ACU
Tryptophan      Trp       W         UGG
Tyrosine        Tyr       Y         UAC, UAU
Valine          Val       V         GUA, GUC, GUG, GUU
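
The redundancy summarized in Table B can be checked with a few lines of code. The sketch below is illustrative only: the dictionary and function names are assumptions introduced here, and the codon lists are transcribed from Table B (stop codons are not part of that table and are omitted).

```python
# Codon assignments transcribed from Table B (RNA codons; stop codons omitted).
CODONS_BY_AMINO_ACID = {
    "Ala": ["GCA", "GCC", "GCG", "GCU"],
    "Arg": ["AGA", "AGG", "CGA", "CGC", "CGG", "CGU"],
    "Asn": ["AAC", "AAU"],
    "Asp": ["GAC", "GAU"],
    "Cys": ["UGC", "UGU"],
    "Glu": ["GAA", "GAG"],
    "Gln": ["CAA", "CAG"],
    "Gly": ["GGA", "GGC", "GGG", "GGU"],
    "His": ["CAC", "CAU"],
    "Ile": ["AUA", "AUC", "AUU"],
    "Leu": ["UUA", "UUG", "CUA", "CUC", "CUG", "CUU"],
    "Lys": ["AAA", "AAG"],
    "Met": ["AUG"],
    "Phe": ["UUC", "UUU"],
    "Pro": ["CCA", "CCC", "CCG", "CCU"],
    "Ser": ["AGC", "AGU", "UCA", "UCC", "UCG", "UCU"],
    "Thr": ["ACA", "ACC", "ACG", "ACU"],
    "Trp": ["UGG"],
    "Tyr": ["UAC", "UAU"],
    "Val": ["GUA", "GUC", "GUG", "GUU"],
}


def degeneracy(table):
    """Return {amino_acid: number_of_synonymous_codons}."""
    return {aa: len(codons) for aa, codons in table.items()}


def codon_to_amino_acid(table):
    """Invert the table so that each codon maps to one amino acid."""
    lookup = {}
    for aa, codons in table.items():
        for codon in codons:
            if codon in lookup:
                raise ValueError(f"codon {codon} is listed twice")
            lookup[codon] = aa
    return lookup


counts = degeneracy(CODONS_BY_AMINO_ACID)
lookup = codon_to_amino_acid(CODONS_BY_AMINO_ACID)
single_codon = sorted(aa for aa, n in counts.items() if n == 1)
print(len(lookup), "sense codons;", "single-codon amino acids:", single_codon)
```

Inverting the table also serves as a consistency check: a codon assigned to two amino acids would raise an error.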


Integrated and Improved 
Problem-Solving Strategy 


Genetic Analysis worked examples provide unparalleled 
support for problem-solving instruction. 


A consistent approach to problem solving is used throughout the book to help students 
understand the logic and purpose of each step in the problem-solving process. Genetic 
Analysis is integrated throughout each chapter, following discussions of important 
content, to help students immediately apply concepts in a problem-solving context. 


Each Genetic Analysis example guides students with a unique, consistent, three-step approach that trains them to Evaluate, Deduce, and then Solve problems.

Every Genetic Analysis example is presented in a clear, two-column format that helps students see the Solution Strategy in one column and its corresponding execution in a separate Solution Step column.

NEW! A new “Break it Down” component has been added to help students get started with formulating an approach to solving a problem.


nucleotides in one strand of a duplex are 
complementary to those in the other, 
and the strands are antiparallel (p. 234). 


Solution Strategies and Solution Steps

Evaluate

1. Strategy: Identify the topic this problem addresses, and the nature of the required answer.
   Step: The question concerns a DNA sequence and requests an answer giving the sequence and polarity of the complementary strand.

2. Strategy: Identify the critical information given in the problem.
   Step: The sequence and polarity are given for a portion of one DNA strand.

Deduce

3. Strategy: Review the general structure of a DNA duplex and the complementarity of specific nucleotides.
   Step: DNA is a double helix composed of single strands that contain complementary base pairs (A pairs with T, and G with C). The complementary strands are antiparallel (i.e., one strand is 5’ to 3’, and its complement is 3’ to 5’).

Solve

4. Strategy: Identify the sequence of the complementary strand.
   Step: The complementary sequence is TGCTGCGAT.

5. Strategy: Give the polarity of the complementary strand.
   Step: The polarity of the complementary strand is 3’-TGCTGCGAT-5’.

6. Strategy: Identify the second nucleotide added during DNA replication of the given sequence.
   TIP: DNA polymerase catalyzes the addition of a new nucleotide to the 3’ end of a growing strand.
   Step: The second nucleotide added to the newly synthesized strand is adenine, which is complementary to thymine on the template strand.
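
The Deduce and Solve steps above can be sketched in code. This is a minimal illustration, not part of the worked example: the function names are hypothetical, and the input strand 5’-ACGACGCTA-3’ is an assumption inferred from the stated answer, since the problem statement itself is not reproduced here.

```python
# Watson-Crick pairing for DNA: A with T, G with C.
DNA_PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement_3_to_5(strand_5_to_3):
    """Pair each base in place; the result reads 3'->5' (antiparallel)."""
    return "".join(DNA_PAIRS[base] for base in strand_5_to_3)


def new_strand_5_to_3(template_5_to_3):
    """Order in which nucleotides are added: polymerase extends the 3' end,
    so the new strand grows 5'->3', i.e. the reverse of the 3'->5' reading."""
    return complement_3_to_5(template_5_to_3)[::-1]


given = "ACGACGCTA"  # assumed input, inferred from the answer TGCTGCGAT
comp = complement_3_to_5(given)
synthesized = new_strand_5_to_3(given)
second_added = synthesized[1]

print(f"3'-{comp}-5'")
print("second nucleotide added:", second_added)
```

Reversing the complement reflects the TIP: synthesis starts opposite the 3’ end of the template, so the second base added pairs with the second base from the template's 3’ end.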


For more practice, see Problems 5, 8, 9, 16, and 17. Visit the Study Area to access study tools. MasteringGenetics™


For additional practice, students are directed to similar problems at the end of the chapter.

Genetic Analysis examples include helpful Tips to highlight critical steps and Pitfalls to avoid.


The accompanying Student Solutions Manual and Study Guide (ISBN 10: 0-13-379558-6) 
provides additional worked problems along with tips for solving problems. It also presents 
solutions to all of the textbook problems in a consistent Evaluate, Deduce, and Solve format 
to complement the approach modeled in the Genetic Analysis examples. 


MasteringGenetics Provides 24/7 Coaching 
in Solving Genetics Problems 


In-depth tutorials, focused on key genetics concepts, reinforce 
problem-solving skills by coaching students with hints and feedback 
specific to their misconceptions. 


Transcription and RNA Processing 


During transcription, RNA polymerase synthesizes RNA from a DNA template with the help of accessory proteins. In this tutorial, you will review the steps of transcription in eukaryotes and bacteria and investigate splicing of mRNAs in eukaryotes.


Part A - Transcription in bacteria
The diagram below shows a length of DNA containing a bacterial gene.
Drag the labels to their appropriate locations in the diagram to describe the function or characteristics of each part of the gene. Not all labels will be used.


If an incorrect answer is submitted, MasteringGenetics gives instant feedback specific to the error made, helping students overcome misconceptions and strengthen problem-solving skills.

Diagram labels: inverted repeats, -35 consensus sequence, -10 consensus sequence, polyadenine sequence.


Incorrect; Try Again; 6 attempts remaining 


You labeled 2 of 5 targets incorrectly. For (b), note that transcription of inverted repeats produces an RNA transcript containing complementary segments. What three-dimensional arrangement results when those segments base-pair with each other?


If students working on a tutorial get stuck, 
they can access hints to get back on track. 




Hint 1. Specific sequences in bacterial genes (click to open)


Hint 2. How are the two DNA strands of a gene used during transcription?
A short stretch of a coding strand has the sequence 5’-CGGCTAGAAT-3’. What are the sequences of the template strand and the RNA transcript?
Complete the table by dragging the correct label to the appropriate location. Labels may be used once, more than once, or not at all.
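
The relationship this hint asks about can be sketched as follows. The function names are hypothetical and the code is an illustration of base-pairing rules, not the tutorial's own answer key.

```python
# DNA-DNA pairing (coding strand to template strand).
DNA_PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}
# DNA template to RNA pairing: adenine in the template pairs with uracil.
TEMPLATE_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def template_from_coding(coding_5_to_3):
    """Template strand aligned under the coding strand, read 3'->5'."""
    return "".join(DNA_PAIRS[base] for base in coding_5_to_3)


def transcribe(template_3_to_5):
    """RNA transcript read 5'->3', built by pairing with the template."""
    return "".join(TEMPLATE_TO_RNA[base] for base in template_3_to_5)


coding = "CGGCTAGAAT"                 # 5'->3', from the hint
template = template_from_coding(coding)  # 3'->5'
rna = transcribe(template)               # 5'->3'

print(f"template 3'-{template}-5'")
print(f"RNA      5'-{rna}-3'")
```

The transcript matches the coding strand with U in place of T, which is why the coding strand is so named.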


Tutorial Topics include: 


Pedigree Analysis 

Recombination and Linkage Mapping 

Sex Linkage 

Gene Interactions 

DNA Replication 

Transcription and RNA Processing 

Translation 

Quantitative Genetics 

Genomics: Sequencing and Genome Databases 


...and more! 


NEW! A bank of approximately 140 new practice problems is now available for assignments. These questions, only available in MasteringGenetics, include coaching and feedback and are not duplicated elsewhere in the end-of-chapter problem sets, test bank, Study Area, or solutions manual.


Use the pedigree below, which shows the inheritance of PFK deficiency in cocker spaniels, to answer the questions.


Henry's parents 


Hailey’s parents 


Part A 


You are interested in becoming a cocker spaniel breeder. You are considering breeding the male dog from Hazel and Hank’s litter (e) with one of the unaffected female dogs from Hailey and Howey’s litter (h). Hazel’s breeder has confirmed that she is not a carrier of the gene for PFK deficiency.


What is the probability that their offspring will be carriers of this disorder? 


Express your answer as a fraction (example: 3/5). 
18 


Incorrect; Try Again; 3 attempts remaining 


My Answers Give Up 


Carefully examine the probability that Hailey and Howey’s unaffected female offspring are carriers. Because they are unaffected, you may exclude the aa genotype from consideration. How will this change your results?
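
The reasoning in this feedback (excluding aa before computing the carrier probability) is a conditional probability. The sketch below works it through under explicit assumptions that are not stated in the excerpt: the trait is autosomal recessive and both parents of the unaffected dog are carriers (Aa). It does not attempt to answer the full pedigree question, since the pedigree is not reproduced here.

```python
from fractions import Fraction
from itertools import product


def offspring_distribution(parent1, parent2):
    """Genotype probabilities from a cross, one allele drawn from each parent."""
    dist = {}
    for allele1, allele2 in product(parent1, parent2):
        genotype = "".join(sorted(allele1 + allele2))  # 'aA' and 'Aa' merge to 'Aa'
        dist[genotype] = dist.get(genotype, Fraction(0)) + Fraction(1, 4)
    return dist


# Assumption: both parents are carriers.
dist = offspring_distribution("Aa", "Aa")

# Unconditional probability of a carrier offspring.
p_carrier = dist["Aa"]

# Condition on the offspring being unaffected (exclude aa).
p_unaffected = 1 - dist["aa"]
p_carrier_given_unaffected = dist["Aa"] / p_unaffected

print(p_carrier, p_carrier_given_unaffected)
```

Dropping the aa class shrinks the sample space from four equally likely outcomes to three, which raises the carrier probability for a dog already known to be unaffected.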


A wide variety of question types helps engage students with different types of activities, including labeling, sorting, multiple-choice, short-answer, and figure questions. About 90 percent of the book’s end-of-chapter problems are now assignable in the MasteringGenetics item library.


Pre-built assignments help 
instructors easily assign questions 
focused on the key ideas of each 
chapter. Curated by experienced 
MasteringGenetics users, these 
“best of” homework assignments
contain the most frequently 
assigned questions from the 
library. 


NEW! Learning Catalytics is a “bring your own device” assessment and 
classroom activity system that expands the possibilities for student 
engagement. Using Learning Catalytics, you can deliver a wide range of 
auto-gradable or open-ended questions that test content knowledge and 
build critical thinking skills. Eighteen different answer types provide great
flexibility, including graphical, numerical, textual input, and more. 


MasteringGenetics users may select from Pearson's new library of question 
clusters that explore challenging genetics topics through a series of 2-5 
questions that focus on a single scenario or data set, build in difficulty, and 
require higher-level thinking. 


Linkage and Probability (1 of 4)

If Sally's child inherits the clot factor mutant allele, what is the chance that this child will also inherit the cystic fibrosis mutant allele?

Express your answer as a fraction.

You responded to this question; your response was 1/2


New, Up-to-Date Discussions on Genomics, 
Epigenetics and More 


Genomic investigations are rapidly expanding and changing what we know about genetics. Coverage of important techniques and findings is integrated throughout the text.


New coverage includes a discussion of the impact of lateral gene transfer on bacterial genomes in Chapter 6; a new Experimental Insight on cancer genomics in Chapter 12; discussions of new genome methods and analyses in Chapter 18; and updated coverage of the human genome, including data on interaction with Neandertals and Denisovans in Chapter 22.


Figure labels: Centromere; Neandertal genomic sequences not obtained; Neandertal DNA in genomes of modern Asians; Neandertal DNA in genomes of modern Europeans.

Figure 22.18 The distribution of Neandertal DNA in the modern human genome. The distribution of Neandertal DNA in European and East Asian genomes. Neandertal DNA has been detected in all 22 human autosomes. Chromosome 7 shown here is typical. Neandertal DNA in European genomes is indicated in blue, and in East Asian genomes it is indicated in red. The genome region in gray contains data insufficient to identify its origin.


NEW! Expanded coverage of archaea molecular 
biology is presented in Chapters 7, 8, 9, 11, 12, and
14. These recent advancements in understanding 
the genetics and molecular biology of archaea 
allow insightful comparisons to the genetics of 
bacteria and eukaryotes, particularly in relation to 
molecular genetic processes and to evolution. 


Figure labels: BRE; TATA; transcription start site (+1); position scale from -50 to +1; consensus sequence.

Figure 8.16 Archaea promoter consensus sequences. The TATA box and BRE box sequences bind TBP and TFB along with RNA polymerase to initiate transcription.


NEW! Revised and expanded coverage of epigenetics shows how epigenetics is at the heart of the evolution and regulation of gene expression in eukaryotes. Enhanced coverage appears in Chapters 11 and 15, including discussions of the histone code and chromatin states, and on epigenetic readers, writers, and erasers.


Epigenetic Heritability

Activating the transcription of an individual gene requires a confluence of regulatory proteins that remodel or modify chromatin to provide enhancer and promoter access to transcription factors that initiate and carry out transcript synthesis, as we saw above in the detailed description of PHO5 transcription. Mechanisms controlling differential chromatin state formation and maintenance produce patterns of gene expression in different types of cells that are required for the growth and development of complex organisms. In a broad sense, these regulatory processes are the reason a single fertilized egg can develop and produce many distinct types of cells (liver cells, muscle cells, brain cells, and so on) that look and act differently even though they carry the same genetic information.

Among the trillions of somatic cells in your body are scores of different cell types, and yet all these cells contain the same genetic information. The differences of morphology and function between cell types are genetically controlled, as evidenced by the fact that daughter cells have the same structures and functions as parental cells, but DNA sequence variability is not the reason for those

Unique, Carefully-Crafted Figures Illustrate 
and Clarify Complex Processes 


Nine Foundation Figures combine visuals and words to help students 
grasp pivotal genetics concepts in a concise, easy-to-follow format. 


Three new Foundation Figures have been added to the Second Edition. 


Fig. 4.22 Epistatic Ratios 

NEW! Fig. 7.14 DNA Replication 

Fig. 7.22 The Trombone Model of DNA 
Replication 

NEW! Fig. 8.6 Bacterial Transcription 

Fig. 8.22 The Gene Expression Machine Model 
for Coupling Transcription with pre-mRNA 
Processing 


NEW! Fig. 9.9 Bacterial Translation Elongation

Fig. 11.6 Condensing the Nuclear Material

Fig. 12.25 Molecular Model of Meiotic


FOUNDATION FIGURE 7.14 DNA Replication


An Integrated Approach to Mendelian
and Molecular Genetics


Within a traditional chapter organization, Sanders and Bowman integrate 
transmission genetics and molecular genetics in the text, tables, and figures. 
This approach helps in demonstrating how today’s geneticists think. 


Table 2.6 identifies the molecular characterization of four of the pea plant traits Mendel studied. It provides a synopsis of the wild-type and mutant functions of the four known genes.


Table 2.6 Identification and Molecular Characterization of Four of Mendel’s Traits

Seed shape (round and wrinkled seeds)
  Product and Gene: The gene is Sbe1, producing starch-branching enzyme.
  Wild-Type Allele and Function: The dominant wild-type allele (R) produces starch-branching enzyme that converts amylose, a linear starch, into amylopectin, a complex branched starch.
  Mutant Allele and Function: The recessive mutant allele (r) contains an inserted segment about 800 base pairs in length. The transcript of the mutant allele does not produce an enzyme product, resulting in a loss of function.
  Reference: Bhattacharyya, M. K., et al. 1990. Cell 60: 115-122.

Stem length (tall and short plants)
  Product and Gene: The gene is Le, producing gibberellin 3β-hydroxylase (G3βH).
  Wild-Type Allele and Function: G3βH produced by the dominant allele Le converts a precursor in the synthesis of the plant growth hormone gibberellin that causes plants to grow tall.
  Mutant Allele and Function: The recessive mutant le allele contains a base substitution that results in an amino acid change. The mutant G3βH has less than 5% the activity of the wild-type product and produces little gibberellin, leading to short plants.
  Reference: Lester, D. R., et al. 1997. Plant Cell 9: 1435-1443. Martin, D. N., et al. 1997. Proc. Natl. Acad. Sci., USA 94: 8907-8911.


Experimental Insight essays discuss influential 
experiments, summarize real data derived from the 
experiments, and explain conclusions drawn from the 
analysis of results. NEW! Experimental Insight 12.1 
describes the base substitutions or deletions 
responsible for mutations of three of the Mendel 
genes, and NEW! Experimental Insight 13.2 describes 
the transposition event that is the cause of mutation 
of the fourth gene. 


Experimental Insight 12.1

Mendel’s Mutations

Table 2.6 on page 000 and the accompanying text briefly describe the wild-type and mutant alleles of the four genes …

… produces a very poorly functioning enzyme, largely disabling a critical step of chlorophyll breakdown.


The Integration of Genetic Approaches: Understanding Sickle Cell Disease

Unique Chapter 10: The Integration of Genetic Approaches explores the hereditary and molecular basis of sickle cell disease in humans, integrating discussions of many research techniques.


Experimental Insight 13.2

Mendel's Peas Are Shaped by Transposition


Gregor Mendel left good descriptions, data, and analyses of the crosses he used for establishing the law of segregation and the law of independent assortment, but he did not leave any seeds to give geneticists direct access to the genes themselves. Experimental Insight 12.1 identifies three of the genes studied by Mendel that have now been identified and analyzed. Details of the discovery in 1990 of a fourth gene are described here. It is the gene responsible for the round and wrinkled seed shapes described by Mendel, now known as SBE1, the starch branching enzyme I gene.

The gene was identified and shown to be responsible for the seed shape variation Mendel reported by a laboratory group led by Cathie Martin (Bhattacharyya et al., 1990). In its paper, the group reports western blot, northern blot, and Southern blot evidence that the recessive mutant allele, r, is altered by the insertion of approximately 800 bp of DNA. The insertion is of transposable DNA, and its effect is insertional inactivation of the ability to produce a starch branching enzyme that is the normal gene product. The researchers also provide a physiological explanation for the appearance of wrinkled seed shape.


WESTERN BLOT ANALYSIS 


Prior to the start of this study, considerable evidence already suggested that seed shape variation was due to differences in starch synthesis. Among candidate enzymes known to be important in starch synthesis was SBE1. The researchers used RR (pure-breeding round) plants as a source of SBE1 to raise an antibody against the enzyme. They used protein gel electrophoresis and western blot analysis to test for reactivity between the anti-SBE1 antibody and proteins extracted from RR and rr (pure-breeding wrinkled) plants. The antibody detected the enzyme in RR plant protein gels but not in rr plant protein gels. This indicates that RR plants produce SBE1 but that rr plants do not.


NORTHERN BLOT ANALYSIS 


The researchers next derived a molecular probe for the SBE1 gene and tested mRNA from RR and rr plants in northern blot analysis. They found that the molecular probe hybridized with a 3300-nucleotide mRNA derived from RR plants and with a 4100-nucleotide mRNA from rr plants. They found as well that the larger transcript from rr plants was about tenfold less abundant than the smaller transcript from RR plants. These results indicate that the transcript of SBE1 in rr plants is larger than in RR plants and that it is produced at just a fraction of the percentage present in RR plants.


Northern blot (bands labeled 4100 nt and 3300 nt)


SOUTHERN BLOT ANALYSIS 


The SBE1 gene contains several restriction sequences, including two for the restriction enzyme EcoRI. The researchers took DNA isolated from RR and rr plants, digested it with EcoRI, and performed DNA gel electrophoresis and Southern blot analysis with the SBE1 molecular probe. They found that the probe hybridized to a DNA fragment approximately 3.5 kb in length from RR plants and a fragment of about 4.3 kb from rr plants. This result could indicate either the insertion of approximately 800 bp of DNA into the r allele or the presence of a mutation that changes an EcoRI restriction sequence and alters the size of the restriction fragment (see Section 10.2). Analysis of the DNA sequence of the r allele revealed that the larger restriction fragment was created by insertion of DNA into one of the exons of the SBE1 gene. This event caused insertional inactivation of the r allele of SBE1. Additional examination of the DNA insert found it to be very similar to the Ac transposable genetic element identified by McClintock. The transposable DNA element identified by this work is named ips- (insertion Pisum sativum).


Southern blot (bands labeled 4.3 kb and 3.5 kb)
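
The two blots give independent estimates of the insertion size, and a short calculation shows that they agree. The sketch below uses the band sizes labeled on the blots (3300 and 4100 nt; 3.5 and 4.3 kb); the variable names are illustrative assumptions.

```python
# Transcript sizes from the northern blot, in nucleotides.
northern_nt = {"RR": 3300, "rr": 4100}

# EcoRI fragment sizes from the Southern blot, in base pairs (3.5 kb and 4.3 kb).
southern_bp = {"RR": 3500, "rr": 4300}

# The mutant molecule is longer than the wild type in both assays.
insert_from_transcripts = northern_nt["rr"] - northern_nt["RR"]
insert_from_fragments = southern_bp["rr"] - southern_bp["RR"]

print("insert size from northern blot:", insert_from_transcripts, "nt")
print("insert size from Southern blot:", insert_from_fragments, "bp")
print("estimates agree:", insert_from_transcripts == insert_from_fragments)
```

Agreement between the RNA and DNA estimates is consistent with an insertion inside an exon, because intronic DNA would be spliced out of the mature transcript.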


WRINKLED SEED DEVELOPMENT 


The physiological explanation of wrinkled seed development is tied to the loss of function of SBE1. In mature round peas, almost half the dry weight is starch. About 35% of the starch is in a simple linear form known as amylose. The remainder is in complexly branched forms, most commonly a form known as amylopectin. Free molecules of sucrose make up about 3% of the dry weight. Amylose is actively converted to amylopectin by SBE1 in round seeds. In wrinkled seeds, about 30% of starch is amylopectin, and about 70% is amylose. Amylose readily


Thorough Coverage of Experiments 
and Research Techniques 


Research Technique boxes explore important research methods and visually illustrate the results and interpretations of the techniques. 
NEW! A new Research Technique box on microbial genotyping using growth characteristics has been added to Chapter 6. 


Genotyping Using Microbial Growth 


The results of experiments on microbes described in this chapter have shaped our understanding of how genes work, including how they are organized and how they are expressed. A basic set of common laboratory techniques and analyses assessing growth or failure to grow in liquid or semisolid media made up of different components can be used to determine the genetic makeup of microorganisms. Proper interpretation of the genotype of a microbe based on its pattern of growth on different media is an essential skill of genetic analysis that is easy to master once you understand a few key concepts.


ANABOLIC AND CATABOLIC PATHWAYS Compounds that influence the growth of microbes on growth media fall into two broad categories. In the first are compounds synthesized by prototrophic (wild-type) microbes in biosynthetic pathways that are often described as anabolic pathways. In anabolic pathways, energy is used to synthesize complex compounds from simpler ones through sequential reaction steps. Figure 4.17 and the accompanying discussion of the anabolic pathway that synthesizes the amino acid methionine (pages 121-123) provide an example. In contrast, catabolic pathways are pathways through which energy is produced by the breakdown of complex compounds into simpler ones. Catabolic pathways also follow sequential steps. Our discussion of phenylketonuria (PKU) (pages 121-123) highlights the catabolic pathway that breaks down the amino acid phenylalanine. Similarly, compounds such as polysaccharide sugars like lactose and other carbohydrates are broken down in catabolic pathways.


VISUALIZING MICROBIAL GROWTH When microbial growth occurs on a semisolid growth plate in a petri dish, individual colonies may appear on the plate. Each colony is actually hundreds of thousands to millions of individual microbes that are all descendant from a single microbial cell among those originally spread on the plate in a very dilute solution. Depending on microbe genotypes and the composition of the growth medium, it is possible that more than one microbial genotype is growing on a particular plate, but what is certain is that the cells in each colony are genetically identical. In a liquid growth medium, microbial growth produces cloudiness, the result of there being so many living cells in the growth vessel that the passage of light through the medium is impeded by the cells. There are no colonies in liquid media.

Identifying the genotype of a microbe often requires assessing the growth of a particular colony on different growth media. This is accomplished by replica plating. One method of replica plating is to simply touch a colony growing on one growth medium with a sterile toothpick or a similar instrument to gather some cells of the colony and then touch a spot on a different growth plate. Systematic use of a grid pattern on the new plate and care in the recording of growth results permit comparison of growth results on different plates so as to identify colony genotypes. An alternative replica plating method involves transferring all the colonies growing on one plate to a new growth plate all at once. A round wooden or plastic block slightly smaller in diameter than a petri dish and covered with a piece of sterilized velvet is used for this. The velvet-covered block is gently pressed onto the colonies of one plate to pick up some cells from each colony and then is used to stamp one or more fresh growth-medium plates. Growth results can be compared between plates, and genotypes of colonies can be identified because all the colonies are in the same relative positions on both the original and the new plate.


Case Studies are short, real-world examples that appear at the end of every chapter and highlight central ideas or concepts of the chapter to remind students of some of the practical applications of genetics. NEW! New Case Studies have been added to Chapters 1, 3, 5, 21, and 22.


ALLELIC IDENTIFICATION Distinguishing between compounds produced by anabolic pathways and those broken down in catabolic pathways is a critical aspect of interpreting microbial growth and identifying microbial genotype that requires knowledge of growth media and their constituents. As defined in Experimental Insight 4.1, a minimal medium contains glucose as the carbon source, since glycolysis is the fundamental energy-producing reaction in many organisms, including humans and many microbes. The minimal medium also contains nitrogen, some inorganic salts, and water. In order to grow on minimal medium, a microbe must synthesize every compound it needs for metabolism, DNA replication, transcription, and translation. The compounds required to carry out these essential functions are the products of anabolic pathways. Only prototrophs (wild-types) can synthesize all the products required for growth on a minimal medium. The ability to synthesize an essential compound by completion of an anabolic pathway is indicated in genetic notation by a “+” (plus) symbol and identifies a wild-type allele; thus, a microbe capable of biosynthesizing the amino acid methionine is identified as met+ (spoken “met plus”). In contrast, the “−” (minus) symbol indicates the organism is an auxotroph (mutant) that is unable to synthesize a particular compound due to mutation. The control prototroph shown in Figure 4.19 (p. 127) is met+, whereas the four other strains are each met−. Auxotrophs can also grow on supplemented minimal medium, which is a minimal medium supplemented with just the specific compound or compounds an auxotroph is unable to produce on its own.

In the case of catabolic pathways, allelic symbols identify the ability of a strain to complete a catabolic pathway with a superscript “+” and the inability to complete a catabolic pathway with the “−” symbol. For example, microbes that are able to grow on a … media are lac−. These strains are unable to produce one or more of the enzymes required for lactose metabolism.

The accompanying figure guides you through the identification … for the ability of the colonies to break down lactose. Genotype identification is accomplished by comparing growth on plates of media containing different constituents. The accompanying table summarizes the genotype of each colony and the reasoning used to identify the genotype.


Figure plate labels: complete medium; minimal medium; minimal plus proline (pro); minimal plus alanine and proline; lactose medium; lactose plus alanine and proline.


Colony, Genotype, and Explanation

Colonies 1, 5, 7, and 9 (ala+ pro+ lac+): These are prototrophs. Grow on minimal medium and on lactose medium.

Colony 2 (ala− pro− lac+): Auxotroph. Does not grow on minimal medium. Grows on minimal medium supplemented with both alanine and proline. Also grows on lactose medium supplemented with alanine and proline.

Colony 3 (ala− pro+ lac−): Auxotroph. Does not grow on minimal medium. Grows on minimal medium supplemented with alanine. Does not grow on lactose medium supplemented with alanine and proline.

Colonies 4 and 10 (ala+ pro+ lac−): Prototroph. Grows on minimal medium. Does not grow on lactose medium.

Colony 6 (ala+ pro− lac+): Auxotroph. Does not grow on minimal medium. Grows on minimal medium plus proline and grows on lactose medium plus alanine and proline.

Colony 8 (unknown genotype): Auxotroph. Does not grow on minimal medium.

(continued)
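
The reasoning in the table can be expressed as a small decision procedure. The sketch below is a hedged illustration, not the book's method: the plate names, the helper functions, and the inclusion of a "minimal+ala" plate are assumptions made for this example (the table's explanation for colony 3 mentions growth on minimal medium supplemented with alanine).

```python
# Hypothetical plate names used for this sketch.
PLATES = (
    "minimal", "minimal+ala", "minimal+pro", "minimal+ala+pro",
    "lactose", "lactose+ala+pro",
)


def growth_on(*plates_with_growth):
    """Build a {plate: grew?} record from the plates where a colony grew."""
    return {plate: plate in plates_with_growth for plate in PLATES}


def infer_genotype(growth):
    """Infer ala/pro/lac alleles from a colony's growth pattern."""
    if growth["minimal"]:
        ala, pro = "+", "+"          # prototroph: needs no supplement
    elif growth["minimal+ala"]:
        ala, pro = "-", "+"          # rescued by alanine alone
    elif growth["minimal+pro"]:
        ala, pro = "+", "-"          # rescued by proline alone
    elif growth["minimal+ala+pro"]:
        ala, pro = "-", "-"          # needs both supplements
    else:
        return "unknown"             # requirement not covered by these plates
    # lac+ if the colony uses lactose on any plate that meets its other needs.
    lac = "+" if growth["lactose"] or growth["lactose+ala+pro"] else "-"
    return f"ala{ala} pro{pro} lac{lac}"


colony_2 = infer_genotype(growth_on("minimal+ala+pro", "lactose+ala+pro"))
colony_3 = infer_genotype(growth_on("minimal+ala", "minimal+ala+pro"))
colony_4 = infer_genotype(
    growth_on("minimal", "minimal+ala", "minimal+pro", "minimal+ala+pro"))
colony_6 = infer_genotype(
    growth_on("minimal+pro", "minimal+ala+pro", "lactose+ala+pro"))
colony_8 = infer_genotype(growth_on())

print(colony_2, colony_3, colony_4, colony_6, colony_8, sep="\n")
```

Anabolic alleles are read from the least-supplemented plate that supports growth, while the catabolic lac allele is read only from plates where the colony's biosynthetic needs are already met.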
CASE STUDY
GWAS and Crohn's Disease

Yasunori Ogura and colleagues used GWAS to identify several chromosome regions associated with Crohn's disease (CD), an inflammatory bowel disease that affects humans at a prevalence of 150 to 200 cases per 100,000 people. The etiology of CD is unknown, but one prominent hypothesis proposes that it is an inflammatory response to intestinal bacteria and other microflora.

CD clusters in families: Susceptibility to the disease is inherited but is influenced by multiple genes. The severity of CD is highly variable, from relatively mild to potentially fatal. Clinicians describe CD severity using a scale that captures the quantitative nature of the trait, making CD a candidate disease for QTL analysis. In the study by Ogura and colleagues, the strongest statistical evidence of association of a genetic marker with a susceptibility gene came from chromosome region 16q12. A gene initially identified as NOD2 and subsequently renamed CARD15 (caspase recruitment domain, member 15) is a candidate for a gene influencing susceptibility to CD.

Figure (a) labels: wild-type allele (319 bp); 3020insC allele (214 bp); molecular weight size markers (ladder) at 100, 200, 300, 400, and 500 bp; homozygous for wild type; heterozygous carriers.

GENE STRUCTURE AND MUTATION CARD15 encodes 12 exons that direct the production of a 1040-amino acid protein. Ogura and colleagues sequenced the exons and introns
Preface


For genetics researchers, genetics instructors, and the students who choose to study genetics, these are wonderful times to be practicing our craft. The first years of the 21st century have seen unprecedented expansion of our knowledge in genetics. Data on topics that were seemingly impenetrable just a few years ago are now abundant. Novel approaches to old problems have provided profound insights on the development and evolution of members of all three domains of life. And advancements in genomics, proteomics, transcriptomics, and other enterprises of the “omic” world have opened avenues for research that were unimaginable in years past. The dawn of the 21st century was something of a milestone for genetics: it inaugurated the second century of genetics. One hundred years after the foundational genetic principles of Gregor Mendel were rediscovered, the genomics era accomplished the major feat of completing the human genome sequence. Genetics barely seemed to pause to acknowledge this triumph, and the field has been “full speed ahead” in its second century. New genome sequences are published weekly, and we now have not just complete genome sequences of ourselves and thousands of other living organisms, but also the genome sequences of two archaic human ancestors, Neandertals and Denisovans, both of which died out more than 30,000 years ago. These are great times to be a geneticist or a student studying genetics!


Our Integrated Approach 


Both the first edition of our textbook and this sec- 
ond edition carry the unique subtitle An Integrated 
Approach. This phrase embodies our pedagogical ap- 
proach that has three principles: (1) integrating problem 
solving throughout the text—not relegating it to the end 
of the chapter—and consistently modeling a powerful, 
three-step problem-solving approach (Evaluate, Deduce, 
and Solve) in every worked example; (2) integrating an 
evolutionary perspective and evolutionary evaluation 
throughout the book; and (3) integrating descriptions of 
Mendelian genetic and molecular genetic analysis de- 
signed to make it clear that these approaches are two sides 
of the same coin—different approaches to investigating 
the same basic sets of observations. In our second edition, 
we adhere to and strengthen the integrated approach that 
has resonated strongly with instructors and students. 


New to This Edition 


The overarching goals that have driven our revision are 
improving student learning, making the job of learning
genetics easier and more effective for students, and
incorporating the new information in genetics that is
helping to define its future growth. To that end, we 
highlight key new features and information designed to 
accomplish our revision goals. 


Enhanced problem solving Because so many 
students struggle with formulating an approach to 
solving genetics problems, we have added a new 
“Break It Down” component to each of the Genetic 
Analysis worked examples throughout the text. 
“Break It Down” models the concept of breaking 
down problem solving by deciphering the essential 
information needed to start solving the problem. 


Enhanced integration of Mendelian and molecu- 
lar genetics Strong coverage of Mendel’s principles 
of segregation and independent assortment using 
Mendel’s own data is maintained, and more discussion 
of the molecular basis of four identified genes Mendel 
studied has been added. For instance, Table 2.6 
provides a synopsis of the wild-type and mutant func- 
tions of the four known genes; Experimental Insight 
12.1 describes the base substitutions or deletions 
responsible for mutations of three of the genes; and 
Experimental Insight 13.2 describes the transposition 
event that is the cause of mutation of the fourth gene. 


New and expanded Foundation Figures These one- 
or two-page figures combine visuals and words to help 
students master key concepts. These figures were well
received in the first edition. We have modified
and expanded some Foundation Figures and added
three new ones to this edition: Foundation Figure 7.14
DNA Replication; Foundation Figure 8.6 Bacterial 
Transcription; and Foundation Figure 9.9 Bacterial 
Translation Elongation. 


Expanded coverage of archaea molecular biology 
Recent advancements in understanding the genetics 
and molecular biology of archaea—one of three 
domains of life—are described. These recent findings 
allow insightful comparisons to the genetics of bacteria 
and eukaryotes, particularly in relation to molecular 
genetic processes and to evolution. New archaea dis- 
cussions and descriptions appear in Chapters 7, 8, 9, 
11, 12, and 14. 


Extending the integration of evolution throughout 
the text The evolutionary perspective takes an even 
more prominent role in several discussions throughout
the book, including in discussions of the archaea,
where evolutionary comparisons to bacteria and to
eukaryotes are a significant component of the discussion.
In addition, Chapter 22 (Population Genetics and
Evolution at the Population, Species, and Molecular
Levels) has been substantially modified to feature
additional discussion of natural selection in Darwin’s 
finches, broader discussion of molecular genetic 
support for natural selection, new discussion of the 
evolution of the vertebrate steroid receptor family, 
and new discussion of the Neandertal genome and its 
contributions to the modern human genome. 


Revised epigenetic coverage It is abundantly clear 
that epigenetics is at the heart of the evolution and 
regulation of gene expression in eukaryotes. Coverage 
of epigenetics has been revised in Chapter 11 
(Chromosome Structure), and Chapter 15 (Regulation 
of Gene Expression in Eukaryotes) has been substan- 
tially rewritten to expand coverage of epigenetics and 
to describe new information. Chapter 15’s discussion 
focuses on the histone code and chromatin states and 
on epigenetic readers, writers, and erasers. 


Integrating coverage of genomics throughout 
Genomic investigations are rapidly expanding and
changing what we know about genetics. Coverage
of important techniques and findings is integrated
throughout the text, such as a new discussion of the
impact of lateral gene transfer on bacterial genomes in
Chapter 6 (Genetic Analysis and Mapping in Bacteria
and Bacteriophage); a new Experimental Insight
on cancer genomics in Chapter 12 (Gene Mutation,
DNA Repair, and Homologous Recombination);
discussions of new genome methods and analyses
in Chapter 18 (Genomics: Genetics from a Whole-
Genome Perspective); and updated coverage of
the human genome, including data on interaction
with Neandertals and Denisovans, in Chapter 22
(Population Genetics and Evolution at the Population,
Species, and Molecular Levels).


Enhanced coverage of molecular evolution The 
text’s focus on evolution in genetics now includes 
more coverage of molecular evolution integrated 
into appropriate chapters. Chapters 7 (DNA
Structure and Replication), 8 (Molecular Biology
of Transcription and RNA Processing), and 9 (The
Molecular Biology of Translation) have expanded
discussions of the evolution of these molecular 
processes. Chapter 11 (Chromosome Structure) 
discusses the evolution of histone proteins in archaea 
and eukaryotes. Chapter 14 (Regulation of Gene 
Expression in Bacteria and Bacteriophage) describes 
evolutionary comparisons of regulatory mechanisms 
in archaea and bacteria. Chapter 15 (Regulation of 
Gene Expression in Eukaryotes) contains expanded 
coverage of the evolution of regulatory functions. 
Chapter 22 (Population Genetics and Evolution at the 
Population, Species, and Molecular Levels) contains 
new discussions of evolution at the population, species,
and molecular levels.


New Case Studies Case Studies at the end of each chapter
connect examples of research to central ideas and concepts
in the chapter, reminding students of the practical
applications of genetics. New Case Studies include:
The Modern Human Family Mystery (Chapter 1);
The (Degenerative) Evolution of the Mammalian Y
Chromosome (Chapter 3); Mapping the Gene for Cystic
Fibrosis (Chapter 5); and Detecting the Major Gene
Influencing Crohn’s Disease (Chapter 21).


New and Updated Coverage 


We revisited each chapter with fresh eyes and helpful 
feedback from users and reviewers of the text. Here are 
some of the highlights of chapter-by-chapter changes in 
the second edition. 


Chapter 1: The Molecular Basis of Heredity,
Variation, and Evolution


New discussion of the role of genomics, proteomics,
and other “omic” investigative strategies


New Case Study on the Neandertal genome and
human-Neandertal genome comparison


Chapter 2: Transmission Genetics


New Experimental Insight on plant breeding and 
evolution 


Additional end-of-chapter problems 


Revised and updated coverage of the molecular basis 
of Mendel’s traits 


Chapter 3: Cell Division and Chromosome 
Heredity 


New Genetic Analysis worked example on X-linked 
inheritance 


New Case Study of the evolution of the mammalian 
Y chromosome 


Additional end-of-chapter problems 


Chapter 4: Inheritance Patterns of Single
Genes and Gene Interaction


New section on the dominant mutant pattern of mouse
coat color and recessive lethality of the yellow allele


Revised discussion of gene interactions in metabolic 
pathways 


Chapter 5: Genetic Linkage and Mapping 
in Eukaryotes 


New section on hotspots and cold spots of recombination
in genomes


Revisions to sections on correction of map distances
and the evolutionary favorability of recombination


New Case Study of the mapping of the human cystic
fibrosis (CFTR) gene


Chapter 6: Genetic Analysis and Mapping
in Bacteria and Bacteriophage


New Research Technique box on microbial genotyping
using growth characteristics


New section on lateral gene transfer and evolution


New section on identification and assessment of lateral
gene transfer in genomes


New end-of-chapter problems 


Chapter 7: DNA Structure and Replication 


New Foundation Figure featuring an overview of DNA 
replication 

New material on DNA replication in archaea and com- 
parison of archaeal replication components to those in 
bacteria and eukaryotes 


New Genetic Analysis worked example on the function
of critical proteins in DNA replication


Discussion of PCR and dideoxy sequencing is
retained, and a new section introducing next-generation
sequencing has been added


Chapter 8: Molecular Biology of Transcription 
and RNA Processing 


New Foundation Figure on bacterial transcription


New material on transcription in archaea and
comparisons of archaeal, bacterial, and eukaryotic
transcription processes and molecules


New section on archaea promoters


New discussion of the torpedo model of transcription
termination in eukaryotes


New end-of-chapter problems


Chapter 9: The Molecular Biology 
of Translation 


New section on amino acids and polypeptide structures


New material on archaeal ribosomes and comparison
with bacterial and eukaryotic ribosomes


New material on archaeal translation initiation
and comparison with the processes in bacteria and
eukaryotes


New Foundation Figure on bacterial translation


New Genetic Analysis worked example on translation


Additional end-of-chapter problems


Chapter 10: The Integration of Genetic 
Approaches: Understanding Sickle Cell 
Disease 

New material on the pathophysiology of sickle cell
disease and on the identification of the molecular basis
for the condition


Additional end-of-chapter problems


Chapter 11: Chromosome Structure 


New section on viral structure and viral genomes


New Genetic Analysis worked example on detecting
chromosome variation


New section on archaeal chromosomes, the role
of chromatin in archaea, and the evolutionary
implications of this new information


Additional end-of-chapter problems


Chapter 12: Gene Mutation, DNA Repair, 
and Homologous Recombination 


New Experimental Insight describing the molecular
basis of mutations in three of the genes studied
by Mendel (pod color, stem length, and flower color)
whose mutations result from base substitutions


New Experimental Insight on the BROCA system, a
genome sequence-based assessment of risk for inherited
susceptibility to breast and ovarian cancer


Updated discussion of DNA damage repair in bacteria
and eukaryotes


New discussion of DNA damage repair and homologous
recombination in archaea species


New discussion of the bacterial RecBCD system


Additional end-of-chapter problems on DNA damage
repair systems


A revised Foundation Figure more clearly explains
processes at work in meiotic recombination


Chapter 13: Chromosome Aberrations
and Transposition


New Experimental Insight discussing the molecular
basis and molecular genetic analysis of Mendel’s round
and wrinkled seed trait that is caused by transposition


Updated discussion of transposition in eukaryotes and
bacteria


Chapter 14: Regulation of Gene Expression 
in Bacteria and Bacteriophage 


New section on transcriptional regulation in archaeal
species


New discussion comparing and contrasting bacterial
and archaeal transcription regulation and its evolutionary
implications


Chapter 15: Regulation of Gene Expression
in Eukaryotes


An integrated view of chromatin modification, with a
focus on how readers, writers, and erasers modulate
and maintain chromatin architecture


A discussion of the roles of long noncoding RNAs
in gene regulation, using Xist and X-chromosome
inactivation as an example


Chapter 16: Analysis of Gene Function by
Forward Genetics and Reverse Genetics


A reorganized discussion of how genes and their
function are identified via forward and reverse
genetics


A discussion of using genomics approaches to clone
genes identified via forward genetics


Chapter 17: Recombinant DNA Technology
and Its Applications


Reorganized presentation of the nuts and bolts of
recombinant DNA technology and how to construct
transgenic organisms


A discussion of genome editing as a future direction
of genetics


Chapter 18: Genomics: Genetics from
a Whole-Genome Perspective


Expanded coverage of copy number variants and their
origins


New Experimental Insight on the human
microbiome


New Genetic Analysis problem on the determination
of homology, paralogy, and orthology based on interpreting
phylogenetic trees


Chapter 19: Organelle Inheritance and the
Evolution of Organelle Genomes


Provides an up-to-date account of the diversity
in organelle inheritance in several lineages of
eukaryotes


Chapter 20: Developmental Genetics 


Provides in-depth coverage of the genetics of animal
development and a vignette of how plants are both
similar and different


Chapter 21: Genetic Analysis
of Quantitative Traits


New discussion of human GWAS analysis, including
an introduction to Manhattan plot assessment


New Case Study on GWAS analysis of Crohn’s disease

Chapter 22: Population Genetics and 
Evolution at the Population, Species, 
and Molecular Levels 


New discussion of convergent evolution of lactase
persistence in humans


New Genetic Analysis worked example on determination
of relative fitness and the operation of natural
selection in Drosophila


A new section on contemporary evolution in Darwin’s
finches


A new section on gene and genome evolution focusing
on the vertebrate steroid receptor gene family


New discussion of the variability and evolution of the
human genome


A new Case Study on the evidence for interbreeding
between Neandertals and modern humans


New end-of-chapter problems


A Problem-Solving Approach 


To help train students to become more effective problem 
solvers, we employ a unique problem-solving feature called 
Genetic Analysis that gives students a consistent, repeat- 
able method to help them learn and practice problem solv- 
ing. Genetic Analysis teaches how to start thinking about 
a problem, what the end goal is, and what kind of analysis 
is required to get there. The three steps of this problem- 
solving framework are Evaluate, Deduce, and Solve. 


Evaluate: Students learn to identify the topic of the 
problem, specify the nature or format of the answer, 
and identify critical information given in the problem. 


Deduce: Students learn how to use conceptual knowl- 
edge to analyze data, make connections, and infer 
additional information or next steps. 


Solve: Students learn how to accurately apply ana- 
lytical tools and to execute their plan to solve a given 
problem. 


Irrespective of the type of problem a student faces, this 
framework guides students through the stages of problem 
solving and gives them the confidence to undertake new 
problems. 

Each Genetic Analysis is organized in a two-column
format so that students can easily follow each numbered
step of the Solution Strategy in the left-hand column
alongside its corresponding execution in the Solution
Step in the right-hand column. We
enhanced the Genetic Analysis examples by adding 
Break It Down callouts to the problem statement of each 
example. This new element is designed to aid students 
who often struggle with identifying the concepts and 
information contained in a problem that are critical to 
starting the problem-solving process. We also include 
problem-solving Tips to highlight critical steps and Pitfalls 
to avoid, gathered from our teaching experience.
Genetic Analysis examples are integrated
throughout each chapter, right after discussions
of important content, to help students immediately apply 
concepts they are learning to the context of problem solv- 
ing. Each chapter includes two or three Genetic Analysis 
features, and the book contains 50 in all. 

We pair Genetic Analysis with strong end-of-chapter 
problems that are divided into two groups. Chapter 
Concept problems come first and review the critical 
information, principles, and analytical tools discussed 
in the chapter. These are followed by Application and 
Integration problems that are more challenging and give 
students practice in solving problems that are broader in 
scope. All solutions to the end-of-chapter problems in 
the Study Guide and Solutions Manual use the evaluate- 
deduce-solve model to reinforce the approach. 


An Evolutionary Perspective 


Geneticists are acutely aware of evolutionary relationships 
between genes, genomes, and organisms. Evolutionary 
processes at the organismal level discovered through 
comparative biology can also shed light on the function 
of genes and organization of genomes at the molecular 
level. Likewise, the function of genes and organization of 
genomes informs the evolutionary model. The integration 
of evolution and the evolutionary perspective remains a 
central organizing theme of the second edition, and this 
approach has been greatly enhanced through coverage 
of the molecular biology of archaeal species. Details of 
archaeal processes are described in a context that com- 
pares and contrasts archaea with bacteria and eukaryotes. 


Connecting Transmission 
and Molecular Genetics 


Experiments that shed light on principles of transmission 
genetics preceded the discovery of the structure and func- 
tion of DNA and its role in inherited molecular variation 
by several decades. Yet biologists recognize that DNA 
variation is the basis of inherited morphological variation 
observed in transmission genetics. Understanding how 
these two approaches to genetics are connected is vital to 
thinking like a geneticist. We have retained the integra- 
tion of transmission genetics and molecular genetics in 


the text and have enhanced this feature in two ways: first, 
through additional discussion of the molecular basis of 
hereditary variation, including the mutations that un- 
derlie the four identified genes examined by Mendel, and 
second, with a much more robust genomic approach. 


Pathways Through the Book 


This book is written with a Mendel-first approach that
many instructors find most effective for teaching
genetics. We are cognizant,
however, that the scope of information covered in genet- 
ics courses varies and that instructor preferences differ. 
We have kept differences and alternative approaches in 
mind while writing the book. Thus, we provide five path- 
ways through the book that instructors can use to meet 
their varying course goals and objectives. Each pathway 
features integration of problem solving through the inclu- 
sion of Genetic Analysis features in each chapter. 


1. Mendel-First Approach 


Ch 1-22 

This pathway provides a traditional approach that begins 
with Mendelian genetics, integrates it with evolutionary
concepts, and connects it to molecular genetics. As
examples, we discuss genes responsible for four of Mendel’s
traits (Chapters 2, 12, and 13), as well as
gene structure in relation to dominance and functional 
level (Chapter 5). We draw together hereditary variation, 
molecular variation, and evolution in the discussion of 
sickle cell disease (Chapter 10). 


2. Molecular-First Approach 


Ch 1 → Ch 7-10 → Ch 2-6 → Ch 11-22

This pathway provides a molecular-first approach to 
develop a clear understanding of the molecular basis of 
heredity and variation before delving into the analysis of 
hereditary transmission. 


3. Integration of Molecular Analysis 


Ch 1 → Ch 10 → Ch 2-15 → Ch 16-22

This pathway focuses on the parallels of transmission and 
molecular genetic analyses right from the start, and it 
best reflects the way a geneticist would approach study of 
the field. We recommend this pathway for students who 
already have a strong genetics background and are famil- 
iar with some common molecular techniques. 


4. Quantitative Genetics Focus 


Ch 1-2 → Ch 21 → Ch 3-20 → Ch 22

This pathway incorporates quantitative genetics early in the 
course by introducing polygenic inheritance (Chapter 2) and 
following it up with a comprehensive discussion of quantita- 
tive genetics (Chapter 21). 


5. Population Genetics Focus 


Ch 1-2 → Ch 22 → Ch 3-21

This pathway incorporates population genetics early in 
the course. Instructors can use the introduction to evolu- 
tionary principles and processes (Chapter 1) and the role 
of genes and alleles in transmission (Chapter 2) and then 
address evolution at the population level and at higher 
levels (Chapter 22). 


Chapter Features 


A principal goal of our writing style and chapter organiza- 
tion is to engage the reader both intellectually and visually 
to invite continuous reading, all the while clearly explain- 
ing complex and difficult ideas. Our conversational tone 
encourages student reading and comprehension, and our 
attractive design and realistic art program visually engage 
students and put them at ease. Experienced instructors 
of genetics know that students are more engaged when 
they can relate concepts to the real world. To that end, we 
use real experimental data to illustrate genetic principles 
and analysis as well as to familiarize students with excit- 
ing research and creative researchers in the field. We also 
discuss a broad array of organisms—such as humans, bac- 
teria, yeast, plants, fruit flies, nematodes, vertebrates, and 
viruses—to exemplify genetic principles. 

Careful thought has been given to our chapter fea- 
tures; each one serves to improve student learning. The 
following features illustrate how we highlight central 
ideas, problems, and methods that are important for un- 
derstanding genetics. 


Genetic Analysis: This is our key problem-solving 
feature that guides students through the problem- 
solving process by using the evaluate-deduce-solve 
framework. 


Foundation Figures: Highly detailed illustrations of 
pivotal concepts in genetics. 


Experimental Insights: Discuss critical or illustrative 
experiments, the data derived from the experiments, 
and the conclusions drawn from analysis of experi- 
mental results. 


Research Techniques: Explore important research 
methods and visually illustrate the results and 
interpretations. 


Case Studies: Short, real-world examples, at the end 
of every chapter, highlight central ideas or concepts of 
the chapter with interesting examples that remind stu- 
dents of some practical applications of genetics. 


MasteringGenetics 


A key reviewing and testing tool is MasteringGenetics, 
the most powerful online homework and assessment 
system available. Tutorials follow the Socratic method,
coaching students to the correct answer by offering
feedback specific to a student’s misconceptions as well 
as providing hints students can access if they get stuck. 
The interactive approach of the tutorials provides a 
unique way for students to learn genetics concepts while 
developing and honing their problem-solving skills. In 
addition to tutorials, MasteringGenetics includes an- 
imations, quizzes, and end-of-chapter problems from 
the textbook. This exclusive product of Pearson greatly 
enhances learning genetics through problem solving, and 
new features include: 


A new category of Practice Problems, similar to
end-of-chapter questions in scope and level of difficulty,
is found only in MasteringGenetics. Solutions are not
available in the Study Guide and Solutions Manual, and 
the bank of questions extends your options for assigning 
challenging problems. Each problem includes specific 
wrong answer feedback to help students learn from their 
mistakes and to guide them toward the correct answer. 


Nearly 90% of the end-of-chapter questions are now 
included in the item library for assignments. The 
questions use a broad range of answer types in addi- 
tion to multiple choice, such as sorting, labeling, nu- 
merical, and ranking. 


LearningCatalytics is a “bring your own device” 
(smartphone, tablet, or laptop) assessment and active 
classroom system that expands the possibilities for 
student engagement. Instructors can create their own 
questions, draw from community content shared by 
colleagues, or access Pearson's new library of ques- 
tion clusters that explore challenging topics through 
a series of two to five questions that focus on a single 
scenario or data set, build in difficulty, and require 
higher-level thinking. 


Student Supplements 


MasteringGenetics 
ISBN: 0133983501 / 9780133983500 


Study Guide and Solutions Manual 
ISBN: 0133795586 / 9780133795585 


Heavily updated and accuracy-checked by Peter Mirabito 
from the University of Kentucky, the Study Guide and 
Solutions Manual is divided into four sections: Genetics 
Problem-Solving Toolkit, Types of Genetics Problems, 
Solutions to End-of-Chapter Problems, and Test Yourself. 
In the “toolkit,” students are reminded of key terms and 
concepts and key relationships that are needed to solve 
the types of problems in a chapter. This is followed by 
a breakdown of the types of problems students will en- 
counter in the end-of-chapter problems for a particular 
chapter; they learn the key strategies to solve each type,
variations on a problem type that they may encounter,
and a worked example modeled after the Genetic Analysis 
feature of the main textbook. The solutions also reflect 
the evaluate-deduce-solve strategy of the Genetic Analysis 
feature. Finally, for more practice, we've included five to 
10 Test Yourself problems and accompanying solutions. 


Instructor Supplements 


MasteringGenetics 


ISBN: 0133983501 / 9780133983500 

MasteringGenetics engages and motivates students 
to learn and allows you to easily assign automatically 
graded activities. Tutorials provide students with per- 
sonalized coaching and feedback. Using the gradebook, 
you can quickly monitor and display student results. 
MasteringGenetics easily captures data to demonstrate 
assessment outcomes. Resources include: 


In-depth tutorials that coach students with hints and 
feedback specific to their misconceptions. 


An item library of thousands of assignable questions 
including reading quizzes and end-of-chapter problems. 
You can use publisher-created prebuilt assignments to 
get started quickly. Each question can be easily edited to 
match the precise language you use. 


A gradebook that provides you with quick results and 
easy-to-interpret insights into student performance. 


TestGen TestBank 


ISBN: 0133999696 / 9780133999693 

Test questions are available as part of the TestGen EQ 
Testing Software, a text-specific testing program that is 
networkable for administering tests. It also allows instruc- 
tors to view and edit questions, export the questions as 
tests, and print them out in a variety of formats. 


Instructor Resource DVD 


ISBN: 0134005856 / 9780134005850 

The Instructor Resource DVD offers adopters of the 
text convenient access to the most comprehensive and 
innovative set of lecture presentation and teaching tools 
offered by any genetics textbook. Developed to meet the 
needs of veteran and newer instructors alike, these re- 
sources include: 


The JPEG files of all text line drawings with labels in- 
dividually enhanced for optimal projection results (as 
well as unlabeled versions) and all text tables. 

Most of the text photos, including all photos with 
pedagogical significance, as JPEG files. 

A set of PowerPoint® presentations consisting of a
thorough lecture outline for each chapter augmented 
by key text illustrations and animations. 


PowerPoint® presentations containing a comprehensive
set of in-class Classroom Response System (CRS)
questions for each chapter. 


In Word and PDF files, a complete set of the assess- 
ment materials and study questions and answers from 
the test bank. 


We Welcome Your Comments 
and Suggestions 


Genetics is continuously changing, and textbooks must 
also change continuously to keep pace with the field 
and to meet the needs of instructors and students. 
Communication with our talented and dedicated users 
is a critical driver of change. We welcome all suggestions 
and comments and invite you to communicate with us 
directly. Please send comments or questions about the 
book to us at mfsanders@ucdavis.edu or
john.bowman@monash.edu.


The Molecular Basis
of Heredity, Variation,
and Evolution


This sculpture of DNA stands in the garden of Clare College Memorial Court 
at the University of Cambridge, England. It was erected to honor the dis- 
covery of DNA structure by Francis Crick and James Watson working at the 
University of Cambridge (Watson lived in Clare College Memorial Court 
during his time in Cambridge), as well as to honor the contributions of 
Rosalind Franklin and Maurice Wilkins working at King’s College London.


Life is astounding, both in the richness of its history and
in its diversity. From the single-celled organisms that 
evolved billions of years ago have descended millions of spe- 
cies of microorganisms, plants, and animals. These species 
are connected by a shared evolutionary past that is revealed 
by the study of genetics, the science that explores genome 
composition and organization and the transmission, expres- 
sion, variation, and evolution of hereditary characteristics of 
organisms. 

CHAPTER OUTLINE

1.1 Modern Genetics Is in Its Second Century
1.2 The Structure of DNA Suggests a Mechanism for Replication
1.3 DNA Transcription and Messenger RNA Translation Express Genes
1.4 Evolution Has a Molecular Basis


Genetics is a dynamic discipline that finds applications
everywhere humans interact with one another and
with other organisms. In research laboratories,
on farms, in grocery stores, and in medical offices,
courtrooms, and other settings, genetics
plays a prominent and expanding role in
our lives. Modern genetics is an increasingly 
gene- and genome-based discipline—that is, it is 
increasingly focused on the entirety of the heredi- 
tary information carried by organisms and on the 
molecular circumstances that express genes. Yet 
despite its increasingly gene-focused emphasis, 
genetics retains a strong interest in traditional 
areas of inquiry and investigation—heredity, varia- 
tion, and evolution. Welcome to the fascinating 
discipline of genetics; you are in for an exciting 
and rewarding journey. 

In this chapter, we survey the scope of modern 
genetics and present some basic information about 
deoxyribonucleic acid—DNA, the carrier of genetic 
information. We begin with a brief overview of the 
origins and contemporary range of genetic science. 
Next we retrace some of the fundamentals of DNA 
replication, and of transcription and translation (the 
two main components of gene expression), by re- 
viewing what you learned about these processes 
in previous biology courses, and we introduce the 
most prominent of the modern-day “-omic” avenues 
of research and investigation in genetics. In the final 
section, we describe the central position of evolu- 
tion in genetics and discuss the roles of heredity 
and variation in evolution. 


1.1 Modern Genetics Is in Its Second 
Century 


Humans have been implicitly aware of genetics for more 
than 10,000 years (Figure 1.1). From the time of the do- 
mestication of rice in Asia, maize in Central America, 
and wheat in the Middle East, humans have recognized 
that desirable traits found in plants and animals can 
be reproduced and enhanced in succeeding genera- 
tions through selective mating. On the other hand, ex- 
plicit exploration and understanding of the hereditary 
principles of genetics—what we might think of as the 
science of modern genetics—is a much more recent 
development. 


The First Century of Modern Genetics 


In 1900, three botanists working independently of one 
another—Carl Correns in Germany, Hugo de Vries in 
Holland, and Erich von Tschermak in Austria—reached 
strikingly similar conclusions about the pattern of trans- 
mission of hereditary traits in plants (Figure 1.2). Each 
reported that his results mirrored those published in 
1866 by an obscure amateur botanist and Augustinian 
monk named Gregor Mendel. (Mendel’s work is dis- 
cussed in Chapter 2.) Although Correns, de Vries, and 
Tschermak had actually rediscovered an explanation 
of hereditary transmission that Mendel had published 
34 years earlier, their announcement of the identifica- 
tion of principles of hereditary transmission gave mod- 
ern genetics its start. 

Biologists immediately began testing, verifying, and 
expanding on the newly appreciated explanation of 
heredity. In 1901, William Bateson, an early and vigor- 
ous proponent of “Mendelism,” read a publication by 
a British physician-scientist named Archibald Garrod 
describing the appearance of the hereditary disease al- 
kaptonuria in multiple members of unrelated families. 
Bateson immediately realized that Garrod’s description 
depicted “exactly the conditions most likely to enable a 
rare, usually recessive character to show itself.” Garrod, 
with Bateson’s interpretive assistance, had produced 
the first documented example of a human hereditary 
disorder. 


Localizing the Genetic Material Shortly thereafter, 
Walter Sutton and Theodore Boveri independently 
used microscopy to observe chromosome movement 
during cell division in reproductive cells. They each 
noted that the patterns of chromosome movement 
mirrored the transmission of the newly rediscovered 
Mendelian hereditary units. This work implied that 
the hereditary units, or genes, posited by Mendel 
are located on chromosomes. We now know that 
genes—the physical units of heredity—are composed 
of defined DNA sequences that collectively control 
gene transcription (described later in the chapter) and 
contain the information to produce RNA molecules, 
one category of which is called messenger RNA or 
mRNA and is used to produce proteins by translation 
(described later in the chapter). Chromosomes 
consist of single long molecules of double-stranded 
DNA that in plants and animals are bound by many 
different kinds of protein that give chromosomes their 
structure and can affect the transcription of genes 
the chromosomes carry. The chromosomes of sexually 
reproducing organisms typically occur in pairs known 
as homologous pairs or, more simply, as homologs. 
Each chromosome carries many genes, and homologs 
carry genes for the same traits in the same order on 
each member of the pair. 




Bacteria and archaea are single-celled organisms 
that do not have a true nucleus. In almost all cases, spe- 
cies of bacteria and archaea have a single, usually circu- 
lar chromosome. As a consequence, in the genome of 
these organisms, there is just one copy of each gene, a 
condition described as haploid. Bacterial and archaeal 
chromosomes are bound by a relatively small amount 
of protein. Limited amounts of proteins help localize 
bacterial chromosomes to a region of the cell known as 
the nucleoid. Some archaeal species have chromosomes 
that have associated proteins that make them appear 
to be similar to bacterial chromosomes, but other spe- 
cies appear to have a more eukaryote-like chromosome 
organization. 

In contrast to bacteria and archaea, the cells of 
eukaryotes—a classification that includes all single- 
celled and multicellular plants and animals—contain 
a true nucleus that permanently sequesters multiple 
sets of chromosomes. Almost all eukaryotes have hap- 
loid and diploid stages in their lifecycles. For example, 
sperm and eggs produced in animals are haploid, having 
one copy of each chromosome pair in the genome. In 
the diploid state, the eukaryotic genome contains two 


Figure 1.1 Ancient applications of 
genetics. (a) An early record of human 
genetic manipulation is this Assyrian relief 
from 882-859 BCE. It shows priests in bird masks 
artificially pollinating date palms. (b) Modern 
maize (left) is thought to have developed 
through human domestication of its wild 
ancestor teosinte (right). 


copies—a homologous pair—of each gene. (Although, 
even in a diploid state, genes located on eukaryotic sex 
chromosomes might not be present in two copies, as we 
describe in Chapter 4.) Numerous eukaryotic genomes, 
particularly those of plants, contain more than two cop- 
ies of each chromosome—a genome composition known 
as polyploidy. 

In addition to the chromosomes carried in their 
nuclei—the so-called nuclear chromosomes—plant and 
animal cells also contain genetic material in special- 
ized organelles called mitochondria, and plant cells 
contain a third type of gene-containing organelle 
called chloroplasts. Many of these organelles are pres- 
ent by the dozens in each cell, and each mitochon- 
drion or chloroplast carries one or more copies of 
its own chromosome. Mitochondrial and chloroplast 
genes produce proteins that work with proteins produced 
by nuclear genes to perform essential functions 
in cells—mitochondria are essential for the production 
of adenosine triphosphate (ATP) that is the principal 
source of cellular energy, and chloroplasts are necessary 
for photosynthesis. Mitochondria and chloroplasts are 
transmitted in the cytoplasm during cell division, and 


Figure 1.2 Early 20th century 
genetic theorists. (a) Carl Correns, 
(b) Hugo de Vries, and 
(c) Erich von Tschermak 
simultaneously rediscovered 
the experiments and principles 
of Gregor Mendel in 1900. 
the term cytoplasmic inheritance is used to identify the 
random distribution of mitochondria and chloroplasts 
among daughter cells. 

Mitochondria and chloroplasts have their own evolutionary 
history: both descended from bacteria that invaded ancient 
eukaryotic cells as parasites. Since the time of their 
acquisition by eukaryotes, mitochondria and chloro- 
plasts have evolved an endosymbiotic relationship with 
their eukaryotic hosts, and the precise genetic content of 
mitochondria and chloroplasts varies by eukaryotic host 
species (see Chapter 19). 

A complete set of nuclear chromosomes is transmitted 
during the cell-division process called mitosis to 
produce genetically identical daughter cells. In contrast, 
sexual reproduction relies on the cell-division process 
called meiosis, which produces reproductive 
or sex cells, often identified as gametes: sperm and egg 
in animals and pollen and egg in plants. The gametes of a 
diploid species are haploid and contain one chromosome 
from each of the homologous pairs of chromosomes in the 
genome. The union of haploid gametes at fertilization produces 
a diploid fertilized egg, the zygote, which then begins 
mitotic division to produce the embryo. 

Predictable patterns of gene transmission during 
sexual reproduction are a focus of later chapters that 
discuss hereditary transmission and the analysis of trans- 
mission ratios (Chapter 2), cell division and chromo- 
some heredity (Chapter 3), gene action and interaction 
of genes in producing variation of physical appearance 
(Chapter 4), and the analysis of genetic linkage between 
genes (Chapter 5). 

Genetic experiments taking place in roughly the 
first half of the 20th century developed the concept of 
the gene as the physical unit of heredity and revealed 
the relationship between phenotype, meaning the ob- 
servable traits of an organism, and genotype, meaning 
the genetic constitution of an organism. Biologists also 
described how hereditary variation is attributable to 
alternative forms of a gene, called alleles. The alleles of 
a gene have differences in DNA sequence that alter the 
product of the gene. 

During the early decades of the 20th century, the 
study of gene transmission was established as a foundation 
of genetics. The concepts of gene action and gene interac- 
tion in producing phenotype variation were described, as 
was the concept of mapping genes along chromosomes. 
It was also during this period that evolutionary biologists 
developed gene-based models of evolution. These, too, 
are integral to genetic analysis, and their use continues to 
the present day. 


Identifying the Genetic Material An experiment 
conducted in 1944 by Oswald Avery, Colin MacLeod, and 
Maclyn McCarty identified deoxyribonucleic acid (DNA) 
as the hereditary material and is commonly credited with 
inaugurating the “molecular era” in genetics (see Chapter 7). 


This new era, which spanned the second half of the 20th 
century and continues to the present day, began with an effort 
to discover the molecular structure of DNA. This research 
reached a milestone in 1953, when the experimental work of 
many biologists, including, most famously, James Watson, 
Francis Crick, Maurice Wilkins, and Rosalind Franklin, led 
to the identification of the double-helical structure of DNA. 
A few years later, in 1958, the common mechanism of DNA 
replication was ascertained. By the mid-1960s, the basic 
mechanisms of DNA transcription and messenger RNA 
(mRNA) translation were laid out, and the genetic code by 
which mRNA is translated into proteins was deciphered. 
Gene cloning and recombinant DNA 
technologies developed and progressed rapidly during the 
1970s. By the early 1980s, biologists realized that to properly 
understand the unity and complexity of life, they would 
have to study and compare the genomes of species, the 
complete sets of DNA sequences, including all genes and 
regions controlling genes. This realization launched the 
“genomics era” in genetics, which continues to expand 
rapidly today. 

Since the inception of genome sequencing, biologists 
have deciphered thousands of genomes that range in size from 
a few tens of thousands of DNA base pairs in the simplest 
viral genomes to tens of billions of base pairs in the largest 
plant and animal genomes. Fittingly, in 2001, a century 
after Garrod and Bateson’s historic identification of al- 
kaptonuria as a human hereditary disease, collaborative 
scientific groups from around the world published the 
completed “first draft” of the human genome. Collective 
efforts like the Human Genome Project and the other 
genome sequencing projects that have been and will be 
undertaken promise to provide databases that will make 
the second century of genetics every bit as remarkable as 
its first century. 


Genetics—Central to Modern Biology 


One of the foundations of modern biology is the dem- 
onstration that all life on Earth shares a common origin 
in the form of the “last universal common ancestor,” or 
LUCA (Figure 1.3). All life is descended from this com- 
mon ancestor and is most commonly divided into three 
major domains. These three domains of life are Eukarya, 
Bacteria, and Archaea. 

The three-domain model of life was originally derived 
from the research of Carl Woese and colleagues 
in the mid-1970s. In contrast to earlier models, which 
were based on morphology alone, Woese used molecu- 
lar sequences to determine phylogenetic relationships 
between existing organisms and thus to trace the evo- 
lution of life. Woese used the sequence of ribosomal 
RNA (rRNA), a small molecule produced directly from 
DNA in all organisms, as his basis for comparison. His 
premise was simple—evolutionary theory predicts that 
closely related species will have more similarity in their 
Figure 1.3 The three domains of life. 
The last universal common ancestor 
(LUCA) gave rise to three domains of life. 
Endosymbiosis between Eukarya and 
Bacteria led to mitochondria (blue) and 
chloroplasts (green) populating eukaryotic 
cells. 
rRNA sequences than will species that are less closely 
related. Furthermore, species that are members of the 
same evolutionary lineage will share certain rRNA se- 
quence changes that are not shared with species outside 
the lineage. Since Woese’s work, many researchers have 
used other molecules to refine and propose additional 
details to the three-domain model. The tree of life re- 
mains a work in progress, but the three-domain model 
is well established. We use this model in subsequent 
chapters to compare and contrast molecular features, 
activities, and processes to shed additional light on the 
evolutionary relationships between the three domains. 
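
Woese's premise can be illustrated with a toy calculation. The Python sketch below scores pairwise identity between sequences that are assumed to be already aligned and of equal length. The three short sequences and the species labels are invented for illustration; they are not real rRNA data, and real phylogenetic analysis uses far longer sequences and more sophisticated methods.

```python
from itertools import combinations


def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of aligned positions at which two equal-length sequences match."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be pre-aligned to the same length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


# Hypothetical, pre-aligned rRNA fragments (invented for illustration only).
toy_rrna = {
    "species_1": "GGAUUCGCAAGCUUAGGCAU",
    "species_2": "GGAUUCGCAAGCUUAGGGAU",
    "species_3": "GCAUACGGAAGCAUAGCCAU",
}

# Compare every pair of species and report how similar their sequences are.
for (name_a, seq_a), (name_b, seq_b) in combinations(toy_rrna.items(), 2):
    identity = percent_identity(seq_a, seq_b)
    print(f"{name_a} vs {name_b}: {identity:.0f}% identical")
```

In this invented data set, species_1 and species_2 share the most positions, so under Woese's premise they would be inferred to be more closely related to each other than either is to species_3.
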
A second foundation of biology is the recognition 
that the hereditary material—the molecular substance that 
conveys and stores genetic information—is deoxyribo- 
nucleic acid (DNA) in all organisms. Certain viruses use 
ribonucleic acid (RNA) as their hereditary material. Most 
biologists argue that viruses are not alive. Rather, they are 
obligate intracellular parasites that are noncellular and 
must invade host cells where they reproduce at the expense 
of the host cell. In living organisms, DNA has a double- 
stranded structure described as a DNA double helix, or as 
a DNA duplex, consisting of two strands joined together 
in accordance with specific biochemical rules. Certain viral 
genomes consist of a small single-stranded DNA molecule 
that replicates to form a DNA duplex in a host cell. 
Eukarya, Bacteria, and Archaea share general mecha- 
nisms of DNA replication, the process that precisely 
duplicates the DNA duplex prior to cell division, and 
they also share general mechanisms of gene expression, 
the processes through which the genetic information 
guides development and functioning of an organism. All 
organisms express their genetic information by a two- 
step process that begins with transcription, a process in 
which one strand of DNA is used to direct the synthesis 
of a single strand of RNA. Transcription produces vari- 
ous forms of RNA, including messenger RNA (mRNA), 
which in all organisms undergoes translation to produce 
proteins at structures called ribosomes. 

As the biological discipline devoted to the exami- 
nation of all aspects of heredity and variation between 
generations and through evolutionary time, genetics is 
central to modern biology. Modern genetics has three 
major branches. Transmission genetics, also known 
as Mendelian genetics, is the study of the transmis- 
sion of traits and characteristics in successive genera- 
tions. Evolutionary genetics studies the origins of and 
genetic relationships between organisms and exam- 
ines the evolution of genes and genomes. Molecular 
genetics studies inheritance and variation in nucleic 
acids (DNA and RNA), proteins, and genomes and tries 
to connect them to inherited variation and evolution in 
organisms. 

These branches of genetics are not rigidly differenti- 
ated. There is substantial cross-communication among 
them, and it is rare to find a geneticist today who doesn’t 
use analytical approaches from all three. Similarly, not 
only are most biological scientists, to a greater or lesser 
extent, also geneticists, but many of the methods and 
techniques of genetic experimentation and analysis are 
shared by all biological scientists. After all, genetic analysis 
interprets the common language of life by integrating in- 
formation from all three branches. 


1.2 The Structure of DNA Suggests 
a Mechanism for Replication 


At its core, hereditary transmission is the process of dis- 
persing genetic information from parents to offspring. In 
sexually reproducing organisms, this process is accom- 
plished by the generation of reproductive sex cells in males 
(the sperm or pollen) and females (the egg), followed by 
the union of egg and sperm (animals) or pollen (plants) or 
spores (yeast) at fertilization, with the subsequent develop- 
ment of an organism. DNA is the hereditary molecule in re- 
productive cells. Similarly, in somatic (body) cells of plants 
and animals and in organisms that reproduce by asexual 
processes, DNA is the hereditary molecule that ensures 
that successive generations of cells are identical. 

Experiments and research on cells taking place from 
the late 1800s through the mid-1900s culminated in the 
identification of DNA as the hereditary material (see 
Section 7.1). This identification was of monumental im- 
portance to biologists and biochemists and was the foun- 
dation of new molecular-focused approaches in biological 
science research. Understanding the molecular structure 
of DNA was key to two fundamental areas of inquiry: 
(1) how DNA could carry the diverse array of genetic 
information present in the various genomes of animals 
and plants and (2) how the molecule replicated. In this 
section, we review basic concepts of DNA structure and 
DNA replication. The molecular details of DNA structure 
and replication are provided in Chapter 7. 


The Discovery of DNA Structure 


In the early 1950s, James Watson, an American in his 
mid-20s who had recently completed a doctoral degree, 
and Francis Crick, a British biochemist in his mid-30s, 
began working together at the University of Cambridge, 
England, to solve the puzzle of DNA structure. Their 
now-legendary collaboration culminated in a 1953 publi- 
cation that ignited the molecular era in genetics. 

Watson and Crick’s paper accurately described the 
molecular structure of DNA as a double helix composed 
of two strands of DNA with an invariant sugar-phosphate 
backbone on the outside and nucleotide bases—adenine, 
thymine, guanine, and cytosine—arrayed in complemen- 
tary base pairs that orient themselves toward the center of 
the molecule. This discovery was of enormous importance 
because with the structure of DNA unveiled, the “gene” 


had a physical form and was no longer just a conceptual 
entity. In this physical form, genes could be examined and 
sequenced, compared with other genes in the genome, 
and compared with similar genes in other species. 

Watson and Crick’s description of DNA structure was 
not the product of their work exclusively. In fact, unlike 
others who made significant contributions to the discov- 
ery of DNA structure, Watson and Crick were not actively 
engaged in laboratory research. Outside of their salaries, 
they had very little financial support available to conduct 
research. In lieu of laboratory research, Watson and Crick 
put their efforts into DNA-model building, basing their 
interpretations on experimental data gathered by others. 

Rosalind Franklin, a biophysicist working with Maurice 
Wilkins at King’s College in London, was one of the prin- 
cipal sources of information used by Watson and Crick. 
Franklin used an early form of X-ray diffraction imagery to 
examine the crystal structure of DNA. In Franklin’s method, 
X-rays bombarding crystalline preparations of DNA were 
diffracted as they encountered the atoms in the crystals 
(Figure 1.4). The pattern of diffracted X-rays was recorded 
on X-ray film, and the structure of the molecules in the 
crystal was deduced from that pattern. Franklin’s most 
famous X-ray diffraction photograph clearly shows (to the 
well-trained eye) that DNA is a duplex, consisting of two 
strands twisted around one another in a double helix. 

In devising their DNA model, Watson and Crick com- 
bined Franklin’s X-ray diffraction data with information 
published a few years earlier by Erwin Chargaff. Chargaff 
had determined the percentages of the four DNA nucleo- 
tide bases in the genomes of a wide array of organisms 
and had concluded that the percentages of adenine and 
thymine are approximately equal to one another and that 
the percentages of cytosine and guanine are equal to one 
another as well (Table 1.1). Known as Chargaff’s rule, 
this information helped Watson and Crick formulate the 
hypothesis that DNA nucleotides are arranged in comple- 
mentary base pairs. Adenine, on one strand of the double 




Figure 1.4 X-ray diffraction evidence of DNA structure. 
(a) This X-shaped pattern is consistent with the diffraction of 
X-ray beams by a helical molecule composed of two strands. 
(b) Rosalind Franklin obtained this X-ray diffraction result. 
Table 1.1 Nucleotide-Base Composition of Various Genomes 


Source Genome         Percentage of Each Nucleotide Base       Ratios 
                      Adenine  Guanine  Cytosine  Thymine 
                      (A)      (G)      (C)       (T)          G+C    G/C 
Bacteria 
  E. coli (B)         23.8     26.8     26.3      23.1         53.1   1.02 
Yeast 
  S. cerevisiae       31.3     18.7     17.1      32.9         35.8   1.09 
Fungi 
  N. crassa           23.0     27.1     26.6      23.3         53.7   1.02 
Invertebrate 
  C. elegans          31.2     19.3     20.5      29.1         39.8   0.94 
  D. melanogaster     27.3     22.5     22.5      27.6         45.0   1.00 
Plant 
  A. thaliana         29.1     20.5     20.7      29.7         41.2   0.99 
Vertebrate 
  M. musculus         29.2     21.7     19.7      29.4         41.4   1.10 
  H. sapiens          30.6     19.7     19.8      30.3         39.5   0.99 


helix, pairs only with thymine on the other DNA strand, 
and cytosine pairs only with guanine to form the other base 
pair. With these data, their own knowledge of biochemistry, 
and their analysis of incorrect models of DNA structure, 
Watson and Crick built a table-top model of DNA out 
of implements and materials scattered around their largely 
inactive research laboratory space: wire, tin, tape, and 
paper, supported by ring stands and clamps (Figure 1.5). 


Figure 1.5 James Watson (left) and Francis Crick (right) in 
1953 with their cardboard-and-wire model of DNA. 


DNA Nucleotides 


Each strand of the double helix is composed of DNA 
nucleotides that have three principal components: 
a five-carbon deoxyribose sugar, a phosphate group, 
and one of four nitrogen-containing nucleotide bases, 
designated adenine (A), guanine (G), thymine (T), 
and cytosine (C) (Figure 1.6). The nucleotides forming 
a strand are linked together by a covalent phosphodiester 
bond between the 5' phosphate group 
of one nucleotide and the 3' hydroxyl (OH) group of 
the adjacent nucleotide. Phosphodiester bonding leads 
to alternation of deoxyribose sugars and phosphate 
groups along the strand and gives the molecule a 
sugar-phosphate backbone. 


Figure 1.6 DNA composition and structure. DNA nucleotides contain a deoxyribose sugar, a 
phosphate group, and a nucleotide base (A, T, G, or C). Phosphodiester bonds join adjacent 
nucleotides in each strand, and hydrogen bonds join complementary nucleotides of strands that have 
antiparallel orientation. 


The nucleotide bases are hydrophobic (water-avoiding) 
and naturally orient toward the water-free interior of the 
duplex. The bases can occur in any order along one strand 
of the molecule, but DNA is most stable as a duplex of 
two strands that have complementary base sequences, so 
that an A on one strand faces a T on the second strand 
and a G on one strand faces a C on the other. This 
complementary base pairing is the basis of Chargaff’s rule and 
produces equal percentages of A and T and of C and G 
in double-stranded DNA molecules. Hydrogen bonds, 
noncovalent bonds consisting of weak electrostatic 
attractions, form between complementary base pairs to join 
the two DNA strands into a double helix. Each strand of 
DNA has a 5' end and a 3' end. These designations refer 
to the phosphate group (5') and hydroxyl group (3') at 
the opposite ends of each strand of DNA and establish 
strand polarity, that is, the 5'-to-3' orientation of each 
strand. Complementary strands of DNA are antiparallel, 
meaning that the polarities of the complementary strands 
run in opposite directions: one strand is oriented 5' to 
3' and the complementary strand is oriented 3' to 5'. 
Genetic Analysis 1.1 guides you through a problem that 
tests your understanding of base-pair complementation 
and complementary strand polarity. 
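
The two rules just described, complementary base pairing and antiparallel polarity, are mechanical enough to express as a short program. The Python sketch below is one possible illustration: the function names are hypothetical, and the input is the strand given in Genetic Analysis 1.1. It returns the partner strand aligned base-for-base beneath the input, together with the polarity of its left-hand end, and then tallies base percentages across both strands of the duplex.

```python
from collections import Counter
from typing import Dict, Tuple

# Watson-Crick pairing rules: A pairs with T, and G pairs with C.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complementary_strand(sequence: str, left_end: str) -> Tuple[str, str]:
    """Return the partner strand, aligned under the input, and its left-end label.

    `left_end` is the polarity ("5'" or "3'") of the leftmost base of `sequence`.
    Because the strands are antiparallel, the partner's left end is the opposite.
    """
    if left_end not in ("5'", "3'"):
        raise ValueError("left_end must be \"5'\" or \"3'\"")
    partner = "".join(COMPLEMENT[base] for base in sequence.upper())
    partner_left_end = "3'" if left_end == "5'" else "5'"
    return partner, partner_left_end


def duplex_base_percentages(sequence: str) -> Dict[str, float]:
    """Percentage of each base counted over both strands of the duplex."""
    partner, _ = complementary_strand(sequence, "5'")
    counts = Counter(sequence.upper() + partner)
    total = sum(counts.values())
    return {base: 100.0 * counts[base] / total for base in "AGCT"}


given = "ACGGATCCTCCCTAGTGCGTAATACG"  # written 3' to 5', as in Genetic Analysis 1.1
partner, partner_left_end = complementary_strand(given, "3'")
print(f"{partner_left_end}-{partner}")

percentages = duplex_base_percentages(given)
print({base: round(value, 1) for base, value in percentages.items()})
```

Whatever the sequence of the input strand, the duplex percentages of A and T come out equal, as do those of G and C, which is the pattern Chargaff observed.
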

If you are like many biology students, you have probably 
wondered from time to time what DNA actually looks 
like, at both the macroscopic and microscopic levels. Even 
today’s best microscopes have difficulty capturing high- 
resolution images of DNA, although computer-aided 
techniques for analyzing molecular structure can produce 
an interpretation of its microscopic appearance, as you'll 
see in Chapters 7, 8, and 9, for example. However, you do 
not need sophisticated instrumentation to produce a sam- 
ple of DNA that you can hold in your hand. Experimental 
Insight 1.1 presents a simple recipe for DNA isolation 
you can do at home with common and safe household 
compounds. 


DNA Replication 


The identification of the double-helical structure of DNA 
established a starting point for a new set of questions about 
heredity. The first of these questions concerned how DNA 


replicates. After correctly describing DNA structure in 
their 1953 paper, Watson and Crick closed with a directive 
for future research on the question of DNA replication: “It 
has not escaped our notice that the specific pairing 
we have postulated immediately suggests a possible copying 
mechanism for the genetic material.” 

Indeed, as a consequence of the A-T and G-C com- 
plementary base-pairing rules, it was evident that each 
single strand of DNA contains the information necessary 
to generate the second strand of DNA and that DNA 
replication generates two identical DNA duplexes from 
the original parental duplex during each replication cycle. 
At the time Watson and Crick described the structure 
of DNA, however, the mechanism of replication was 
not known. It would take another 5 years for Matthew 
Meselson and Franklin Stahl, in an ingenious experiment 
of simple design, to prove that DNA replicates by a semi- 
conservative mechanism (see Chapter 7). 

In semiconservative replication, the mechanism by 
which DNA usually replicates, the two complementary 
strands of original DNA separate from one another, and 
each strand acts as a template to direct the synthesis of 
a new, complementary strand of DNA with antiparallel 
polarity. The mechanism is termed “semiconservative” 
because after the completion of DNA replication, each 
Figure 1.7 Semiconservative DNA replication. Each 
parental DNA strand serves as the template for synthesis of its 
daughter strand. DNA polymerase synthesizes daughter strands 
one nucleotide at a time. 


new duplex is composed of one parental strand (con- 
served from the original DNA) and one newly synthesized 
daughter strand (Figure 1.7). 
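
The bookkeeping behind the term "semiconservative" can be sketched in a few lines of Python. In this illustrative model, which is an assumption of this example rather than a description of any experiment, each strand is labeled with the replication round in which it was synthesized, with 0 marking the two original parental strands. The function names are hypothetical.

```python
from typing import List, Tuple

Duplex = Tuple[int, int]  # (round in which strand 1 was made, same for strand 2)


def replicate(duplex: Duplex, round_number: int) -> List[Duplex]:
    """Separate the two strands and pair each with a newly synthesized strand."""
    strand_a, strand_b = duplex
    return [(strand_a, round_number), (strand_b, round_number)]


def replicate_population(duplexes: List[Duplex], rounds: int) -> List[Duplex]:
    """Apply semiconservative replication to every duplex for several rounds."""
    for round_number in range(1, rounds + 1):
        duplexes = [
            daughter
            for duplex in duplexes
            for daughter in replicate(duplex, round_number)
        ]
    return duplexes


population = replicate_population([(0, 0)], rounds=3)
with_original_strand = sum(1 for duplex in population if 0 in duplex)
print(f"{len(population)} duplexes, {with_original_strand} contain an original strand")
```

Starting from a single duplex, the number of duplexes doubles each round, but only two of them ever carry one of the original strands, because each original strand is conserved intact in exactly one descendant duplex.
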

DNA replication begins at an origin of replication, 
with the breaking of hydrogen bonds that hold the strands 
together. (This process is much like what happens when 
a zipper comes undone.) DNA polymerases are the en- 
zymes active in DNA replication. Using each parental DNA 
strand as a template, these enzymes identify the nucleo- 
tide that is complementary to the first unpaired nucleotide 
on the parental strand and then catalyze formation of a 


phosphodiester bond to join the new nucleotide to the previ- 
ous nucleotide in the nascent (growing) daughter strand. 

The biochemistry of nucleic acids and DNA polymer- 
ases dictates that DNA strands elongate only in the 5’-to-3' 
direction. In other words, nucleotides are added exclusively 
to the 3’ end of the nascent strand, leading to 5'-to-3' 
growth. Like the parental duplex, each new DNA duplex 
contains antiparallel strands. Each pairing of a parental strand with its daughter 
strand forms a new double helix of DNA that is 
an exact replica of the original parental duplex. 
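
The direction of strand growth can be made concrete with a small simulation. The Python sketch below is a simplified illustration, assuming a template that is read from its 3' end toward its 5' end and ignoring everything else about replication (origins, primers, and the enzymes involved). The template sequence and the function name are hypothetical.

```python
from typing import Iterator

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def elongate(template_3_to_5: str) -> Iterator[str]:
    """Yield the nascent strand, written 5' to 3', after each nucleotide is added.

    The template is read 3' to 5'. Each new complementary nucleotide is joined
    to the 3' end of the nascent strand, so the strand grows 5' to 3'.
    """
    nascent = ""
    for template_base in template_3_to_5.upper():
        nascent += COMPLEMENT[template_base]  # addition at the 3' end only
        yield nascent


template = "TACGGT"  # hypothetical template strand, written 3' to 5'
for step, strand in enumerate(elongate(template), start=1):
    print(f"step {step}: 5'-{strand}-3'")
```

Each step extends the right-hand (3') end of the printed strand by one nucleotide; the 5' end, synthesized first, never changes.
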


1.3 DNA Transcription and Messenger 
RNA Translation Express Genes 


The central dogma of biology is a statement describing 
the flow of hereditary information. It summarizes the crit- 
ical relationships between DNA, RNA, and protein; the 
functional role that DNA plays in maintaining, directing, 
and regulating the expression of genetic information; and 
the roles played by RNA and proteins in gene function. 
Francis Crick proposed the original version of the central 
dogma, shown in Figure 1.8a, in 1956 to encapsulate the 
role DNA plays in directing transcription of RNA and, 
in turn, the role messenger RNA plays in translation of 
proteins. As Crick told the story years later, he wrote this 
concept as “DNA → RNA → protein” (spoken as “DNA 
to RNA to protein”) on a slip of paper and taped it to the 
wall above his desk to remind himself of the direction 
of information transfer during the expression of genetic 
information. The most important idea it conveys is that 
DNA does not code directly for protein. Rather, DNA 
makes up the genome of an organism and is a permanent 
repository of genetic information in each cell, directing 
gene expression by the transcription of DNA to RNA and, 
ultimately, the production of proteins. 

Over the decades since Crick first introduced the 
central dogma, biologists have developed a clear un- 
derstanding of the role of DNA in maintaining and 
expressing genetic information. Most of the details of 
the two-stage process by which genetic information 
in sequences of DNA is transcribed to RNA and then 
translated to protein are known, as described in later 
chapters (transcription in Chapter 8 and translation in 
Chapter 9). For example, biologists now know that sev- 
eral forms of RNA are found in cells, and all these RNA 
molecules are transcribed and play a variety of roles in 
cells, but only mRNA is translated. 
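
The two-stage flow of information, DNA to RNA to protein, can be sketched in Python. This is a simplified illustration under stated assumptions: the template strand is supplied written 3' to 5', translation starts at the first AUG and ends at the first stop codon, and the codon table is built from the standard genetic code. The function names and the short template sequence are hypothetical.

```python
# Standard genetic code, listed with each codon position cycling through U, C, A, G.
# One-letter amino acid abbreviations are used; "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}

# Pairing rules for transcription: RNA contains uracil (U) in place of thymine.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3_to_5: str) -> str:
    """Return the mRNA, written 5' to 3', made from a template written 3' to 5'."""
    return "".join(DNA_TO_RNA[base] for base in template_3_to_5.upper())


def translate(mrna: str) -> str:
    """Translate from the first AUG to the first stop codon (or the end of the mRNA)."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for position in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[position:position + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


template = "TACGGTCTAATT"  # hypothetical template strand, written 3' to 5'
mrna = transcribe(template)
print(mrna)             # AUGCCAGAUUAA
print(translate(mrna))  # MPD (Met-Pro-Asp)
```

Note that the DNA sequence is never read directly into protein in this sketch: `translate` accepts only the RNA produced by `transcribe`, mirroring the direction of information flow that the central dogma describes.
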

Two important categories of RNA that are not 
translated but nonetheless play critical roles in transla- 
tion are ribosomal RNA and transfer RNA. Ribosomal 
RNA (rRNA) forms part of the ribosomes, the plentiful 
cellular structures where protein assembly takes place. 
Transfer RNA (tRNA) carries amino acids, the build- 
ing blocks of proteins, to ribosomes. An updated central 


GENETIC ANALYSIS 


PROBLEM Determine the sequence and polarity of the DNA strand complementary to the strand shown below. 


BREAK IT DOWN: ADNA sequence is 3o ES BREAK IT DOWN: Complementarity of DNA 
ae of A, G, T, and C nucleotides thats a 3 ’-..ACGGATCCTCCCTAGTGCGTAATACG...- 5 ig om pairs A with T and G with C (p. 6) 


on one end and 3’ on the other (p. 7) 


Solution Strategies Solution Steps 


Evaluate 


1. Identify the topic of this problem and 
the kind of information the answer 
should contain. 

2. Identify the critical information given 
in the problem. 


1. This problem concerns nucleotide complementarity in a DNA duplex and the 
polarity of complementary strands. The answer should contain the nucleotide 
sequence and polarity of a strand complementary to the given one. 

2. The problem provides the nucleotide sequence and polarity of one strand of a 
DNA duplex. 


Deduce 


3. Recall the base-pairing 
relationships of DNA nucleotides in 
complementary strands. 


PITFALL: Always check the polarity of a strand you are given; don’t assume it’s written with either the 5’ or 3’ end facing a certain way.

TIP: Complementary DNA strands are antiparallel, with one strand 3’ → 5’ and the other 5’ → 3’.


4. Recall the polarity relationship of 
complementary DNA strands. 


Solve 


5. Give the sequence and polarity of the 
complementary DNA strand. 


For more practice, see Problems 11, 12, and 14. 


Visit the Study Area to access study tools. 


3. In complementary DNA strands, base pairing joins adenine with thymine and 
guanine with cytosine to form a DNA duplex. 


4. The second strand of this duplex will be oriented with its 5’ end to the left and 
its 3’ end to the right. 


5. By the rules of complementary base pairing and antiparallel strand orientation, 
the second DNA strand is 


5’-TGCCTAGGAGGGATCACGCATTATGC-3’
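The Deduce and Solve steps of this worked example can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `complement_aligned` and the way polarity is represented (a label for the left-hand end of the written sequence) are assumptions made for the example, not notation used in the text.

```python
# Watson-Crick base pairing in a DNA duplex: A with T, G with C.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement_aligned(seq, left_end):
    """Return the complementary strand written directly beneath `seq`.

    `left_end` is the polarity label ("5'" or "3'") of the left end of `seq`.
    Because the strands are antiparallel, the complementary strand has the
    opposite label at its left end.
    """
    complement = "".join(PAIR[base] for base in seq)
    other_left_end = "5'" if left_end == "3'" else "3'"
    return complement, other_left_end


given = "ACGGATCCTCCCTAGTGCGTAATACG"      # written 3' -> 5' in the problem
comp, left = complement_aligned(given, "3'")
right = "3'" if left == "5'" else "5'"
print(f"{left}-{comp}-{right}")           # 5'-TGCCTAGGAGGGATCACGCATTATGC-3'
```

Keeping the complementary strand aligned under the given strand, rather than reversing it, mirrors how the duplex is drawn in the problem; only the polarity labels change.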


dogma of biology is shown in Figure 1.8b. In addition to 
mRNA, rRNA, and tRNA, the figure identifies reverse 
transcription, a form of information flow that synthe- 
sizes DNA from an RNA template in RNA-containing 
viruses (retroviruses) by using an enzyme called reverse 
transcriptase. It also identifies micro-RNA (miRNA), the 
focus of a rapidly emerging new area of RNA investiga- 
tion that studies the role of these small RNA molecules 
in the regulation of gene expression in plants and ani- 
mals (see Chapter 15). 


Figure 1.8 The central dogma of biology. (a) Francis Crick’s original 
central dogma of biology. (b) The updated central dogma of biology. 
Transcription is the process by which information in DNA 
sequence is converted into RNA sequence. Transcription 
uses one strand of the DNA making up a gene to direct 
synthesis of a single-stranded RNA transcript. The DNA 
strand from which the transcript is synthesized is called 
the template strand. The RNA-synthesizing enzyme RNA 
polymerase pairs template-strand nucleotides with com- 
plementary RNA nucleotides to synthesize new transcript 
Experimental Insight 1.1 


Countertop DNA Isolation—Try This at Home! 


For all the abundance of DNA in cells, its molecular structure is 
too small to see without the aid of the most powerful electron 
microscopes. However, that doesn’t mean DNA must remain 
invisible to the naked eye. The key to seeing it is simply a ques- 
tion of volume. If enough DNA is collected together, it can be 
seen—although not, of course, in its molecular detail. Using a 
rich source of DNA (such as onions, which are available year- 
round, or strawberries, whose nuclei contain eight copies of 
each chromosome) and a few familiar household items, you 
can collect a visible sample of DNA in about 30 minutes. 


INGREDIENTS 

1 small peeled onion (about 1 cup) or about 1 cup strawber- 
ries with leaves removed 

1 to 2 cups water with 1 teaspoon of dissolved salt per cup 
2 tablespoons dishwashing liquid 

1 tablespoon meat tenderizer (containing “papain” from 
papaya) 

4 to 6 ounces isopropyl (“rubbing”) alcohol (95% is best, but 
70% is sufficient) 


EQUIPMENT 

Food processor (for onion) or a potato masher or ricer (for strawberries) 

Small bowl 

Clear glass jar or container with vertical sides 

Cheesecloth to layer over the top of the glass container with 
a few inches to spare all around 

1 rubber band to go around the glass container 

1 chopstick or a similar wooden implement 


in the 5'-to-3’ direction; the transcript is antiparallel to the 
DNA template strand (Figure 1.9). 

The complementary partner of the DNA template 
strand is known as the coding strand. In the past, the 
coding strand has also been identified as the “nontemplate 


[Figure 1.9 labels: direction of transcription; RNA polymerase; coding strand (nontemplate strand); template strand; mRNA.] 

The DNA coding strand and the mRNA transcript have the same 
polarity and sequence, substituting U in mRNA for T in DNA. 


Figure 1.9 The correspondence of RNA to DNA template 
and coding strands. 


DIRECTIONS 


1. Peel onion and finely chop in food processor or thor- 
oughly mash strawberries in bowl. 

2. Add 1 to 2 cups water to onion and process into a fine 
slurry. Pour slurry into small bowl. If using strawberries, 
add about 1 cup water and mash into a fine slurry. 

3. Add 2 tablespoons liquid dishwashing soap to slurry 
and stir gently. Be careful not to let the soap get foamy. 
Let mixture stand at least 10 to 15 minutes (longer is 
fine) while the soap breaks down the cell and nuclear 
membranes. 

4. Add 1 tablespoon meat tenderizer to mixture, stir gently, 
and let stand at least 10 to 15 minutes (longer is fine). The 
papain will digest much of the protein released by the 
ruptured cells and also the proteins attached to DNA. 

5. Place 2 to 3 layers cheesecloth loosely over the opening 
of the glass container, allowing the cloth to form a small 
“bowl” inside the opening. Use the rubber band to hold 
the cheesecloth in place. Pour the slurry mixture through 
the cheesecloth, scooping out the onion or strawberry 
debris as it fills the cheesecloth bowl. Approximately 8 to 
12 ounces of “juice” will collect at the bottom of the con- 
tainer. Discard the cheesecloth and its contents. 

6. Pour the alcohol into the juice and stir very briefly. Let 
the juice mixture stand for at least 5 to 10 minutes. As the 
juice settles, the alcohol rises to the top, and the large 
mass of floating cottony material in it is DNA. 

7. When the alcohol has completely separated from the 
juice, you can “spool” the DNA onto a chopstick by slowly 
twirling the stick in the cottony DNA. 


strand,” but that term is rarely used anymore. Because 
the coding strand is both complementary and antiparal- 
lel to the DNA template strand, it has the same 5’ → 3’ 
polarity as the RNA transcript synthesized from the tem- 
plate strand; moreover, the RNA transcript and the DNA 
coding strand are identical in nucleotide sequence, except 
for the appearance of U in the place of T. Our descriptions 
in this textbook will refer to this DNA strand as the “cod- 
ing strand,” but it is also correct to identify the strand as 
the nontemplate strand. 

RNA is composed of four nucleotides that are chemi- 
cally very similar to those of DNA. RNA nucleotides consist of a 
ribose sugar (as opposed to deoxyribose found in DNA), 
a phosphate group, and one of four nitrogenous bases. 
Three of the RNA nucleotide bases are adenine, cytosine, 
and guanine. They are identical to the same nucleotide 
bases found in DNA. The fourth RNA base is uracil (U). 
It is chemically closely related to thymine; thus, in DNA- 
RNA and in RNA-RNA complementary base pairing, 
uracil pairs with adenine. All other complementary base- 
pair arrangements are as we described them previously. 
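The relationship among the template strand, the coding strand, and the RNA transcript can be checked with a short Python sketch. The function names are hypothetical, and the example sequences are the coding and template strands shown in Figure 1.11a; the sketch assumes the template strand is supplied as written 3’ → 5’, so that the transcript comes out written 5’ → 3’.

```python
# DNA template base -> complementary RNA base (uracil pairs with adenine).
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3to5):
    """Build the transcript from a template strand written 3' -> 5'.

    Pairing each template base in order yields the RNA written 5' -> 3',
    because the transcript is antiparallel to the template.
    """
    return "".join(DNA_TO_RNA[base] for base in template_3to5)


def coding_to_mrna(coding_5to3):
    """The transcript matches the coding strand, with U in place of T."""
    return coding_5to3.replace("T", "U")


template = "TACTGTGACCCATGCGAAATT"   # written 3' -> 5'
coding = "ATGACACTGGGTACGCTTTAA"     # written 5' -> 3'

mrna = transcribe(template)
print(mrna)                            # AUGACACUGGGUACGCUUUAA
print(mrna == coding_to_mrna(coding))  # True
```

Both routes give the same transcript, which is why the coding strand is a convenient shorthand for the mRNA sequence.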
Transcription is the process in which the enzyme 
RNA polymerase uses the template strand of DNA to 
synthesize RNA transcripts. To begin transcription, RNA 
polymerase, and any other proteins necessary for tran- 
scription, must locate a gene and gain access to the tem- 
plate DNA strand by interacting with DNA sequences 
that control transcription. Once the coding sequence of 
the gene has been transcribed, the RNA polymerase must 
stop transcription and release the transcript. 

Promoters are the most common type of DNA 
sequences controlling transcription. Promoters are 
recognized by RNA polymerase, and they direct RNA 
polymerase to a nearby gene. Promoters themselves are 
regulatory sequences and are not transcribed. Instead, the 
transcription of a gene begins near the promoter at the 
start of transcription, the DNA location where transcrip- 
tion of a sequence begins. Transcription ends at the termi- 
nation sequence, where another DNA sequence facilitates 
the cessation of transcription (Figure 1.10a). In bacteria 
and archaea, protein-producing genes are transcribed into 
mRNA that is quickly translated to produce the protein. 
Eukaryotic genes have a different structure than do bacte- 
rial and most archaeal genes. Nearly all eukaryotic genes 
are subdivided into exons, which contain the coding infor- 
mation that will be used during translation, and introns, 
which intervene between exons and are removed from the 
transcript before translation (Figure 1.10b). Bacterial genes 
do not contain introns, and only a tiny number of archaeal 
genes are suspected to contain introns. The removal of 
introns from eukaryotic mRNA and other modifications 
before translation occur in the nucleus (see Chapter 8). 


Translation 


Translation converts the genetic message of mRNA into 
sequences of amino acids using the genetic code. The 
amino acids are joined to one another by a covalent bond 


Figure 1.10 Gene structure in bacteria, archaea, and eukaryotes. Coding sequences 
contain information to be transcribed into RNA. Promoter sequences regulate the 
initiation of transcription, and termination sequences control the cessation of 
transcription. (a) Bacterial and most, but not all, archaeal 


called a peptide bond. The resulting string of amino acids 
is a polypeptide, which upon folding makes up all or part 
of a protein. 

Translation of mRNA occurs at ribosomes, where sets 
of three consecutive nucleotides, each set called a codon, 
specify the amino acid at each position of a polypeptide. Each 
mRNA codon is a triplet of RNA nucleotides coded by three 
complementary DNA nucleotides on the template strand. 
The DNA nucleotides complementary to codon nucleotides 
are known as the DNA triplet (Figure 1.11a). Translation 
begins with mRNA attaching to a ribosome in a manner that 
places the start codon, the codon specifying the first amino 
acid of a polypeptide, in the necessary location (Figure 1.11b). 
The start codon is most commonly AUG and is the codon at 
which translation begins. The start codon is read by the ribo- 
some in the 5’ → 3’ direction, A then U then G. To read each 
subsequent codon, the ribosome moves 5’ → 3’ along the 
mRNA to assemble the amino acid string. 

Amino acids are transported to ribosomes by trans- 
fer RNAs (tRNAs). At each codon, complementary base 
pairing occurs between codon nucleotides and a three- 
nucleotide sequence of tRNA called an anticodon. This 
interaction assembles amino acids in the order dictated 
by the mRNA sequence. The ribosome progresses continuously 
along the mRNA and catalyzes peptide bond formation in the 
growing polypeptide chain. Translation continues until the 
ribosome encounters a stop codon, which brings translation 
to a halt. 

The genetic code, through which mRNA codons 
specify amino acids, was deciphered by a series of ex- 
periments that took place during the early 1960s. The 
experiments revealed that the genetic code contains 
64 codons; every codon consists of three positions that 
are each filled by one of the four RNA nucleotides. An 
mRNA codon is read in the 5’-to-3’ direction: The first 
base of the codon is at its 5’ end, the third base is at its 3’ 
end, and the second base is in the middle. 
(a) 
DNA 
Coding strand        5’ ATG ACA CTG GGT ACG CTT TAA 3’ 
Template strand      3’ TAC TGT GAC CCA TGC GAA ATT 5’ 
DNA triplet:            1   2   3   4   5   6   7 

mRNA                 5’ AUG ACA CUG GGU ACG CUU UAA 3’ 
Codon:                  1   2   3   4   5   6   7 
Polypeptide             Met Thr Leu Gly Thr Leu 
Amino acid sequence:    1   2   3   4   5   6   STOP 

(b) [Figure labels: amino acid; polypeptide; peptide bond; ribosome.] 

Figure 1.11 Overview of translation. (a) Messenger RNA 
codons are complementary and antiparallel to DNA triplets of 
the template strand. (b) Ribosomes initiate translation of mRNA 
at the start codon and move along the mRNA in the 3’ direction, 
adding a new amino acid to the nascent polypeptide by read- 
ing each codon. Transfer RNA molecules carry amino acids to 
ribosomes, where the tRNA anticodon sequences interact with 
codon sequences of mRNA. Translation terminates when the 
ribosome encounters a stop codon. 


A total of 61 of the 64 codons specify amino 
acids, and the other 3 are the stop codons. The 64 
codons are displayed in Table A (inside the book front 
cover) using the three-letter and one-letter abbrevia- 
tions. Table B (also inside the book front cover) lists 
the names and abbreviations of each amino acid, along 
with their codons. The genetic code is redundant, with 
individual amino acids encoded by as many as six co- 
dons and as few as one codon. 
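The counts quoted above (64 codons, 61 specifying amino acids, 3 stop codons, and between one and six codons per amino acid) can be verified with a brief Python sketch. The compact string below encodes the standard genetic code in one-letter abbreviations with codon positions ordered U, C, A, G; the variable and function names are illustrative assumptions, and the `translate` helper simply starts at the first AUG it finds, which is a simplification of how ribosomes locate a start codon.

```python
from collections import Counter
from itertools import product

# Standard genetic code; "*" marks a stop codon. Codons are generated in the
# order UUU, UUC, UUA, UUG, UCU, ... (first position varies slowest).
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(mrna):
    """Translate an mRNA written 5' -> 3', from the first AUG to a stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    peptide = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = GENETIC_CODE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        peptide.append(amino_acid)
    return "".join(peptide)


stops = sorted(c for c, aa in GENETIC_CODE.items() if aa == "*")
codons_per_amino_acid = Counter(aa for aa in GENETIC_CODE.values() if aa != "*")

print(len(GENETIC_CODE), stops)             # 64 ['UAA', 'UAG', 'UGA']
print(sum(codons_per_amino_acid.values()))  # 61
print(max(codons_per_amino_acid.values()),
      min(codons_per_amino_acid.values()))  # 6 1
print(translate("AUGACACUGGGUACGCUUUAA"))   # MTLGTL
```

The last line translates the mRNA of Figure 1.11a and gives Met-Thr-Leu-Gly-Thr-Leu in one-letter form.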

Genetic Analysis 1.2 allows you to work through the 
transcription and translation of the DNA sequence as- 
sessed in Genetic Analysis 1.1. 


Genomes, Proteomes, and “-omic” 
Approaches 


Genomics is the field that focuses on the sequencing, inter- 
pretation, and comparison of genomes of different organ- 
isms. Genomic data collection and analysis involve an array 
of molecular techniques and analytical strategies that aid in 
identification and examination of the totality of the DNA in 
a cell, nucleus, or organelle (mitochondria and chloroplasts) 
carried by a species. Indeed, genomics has made critical 
contributions to many areas of biological investigation. From 
medicine to the study of hereditary variation to the study of 
evolution, genomic data are proving critically important. 

Much has changed in DNA sequencing since it began 
in the 1970s. Genome sequencing is accomplished today 
by automated high-throughput methods, so-called next- 
generation sequencing that is thousands of times faster, and 
far cheaper, than the original genome sequencing methods 
(see Chapters 7, 18, and 22 for details and applications). 

To date, thousands of genome sequences have been 
compiled. Among the smallest genomes are those of viruses, 
mitochondria, and chloroplasts, which generally contain 
tens of thousands to a few hundred thousand base pairs. In 
contrast, the largest sequenced genomes are those of some 
plant species that carry multiple sets of chromosomes from 
their progenitors and have billions of base pairs. Genome 
sizes are usually identified in terms of megabases (Mb), 
with 1 Mb equal to 1 million base pairs. 

Certain selected species known as “model organisms” 
are commonly used in genetics and genomics experiments. 
They are selected because their biology is well known, they 
are easy to work with and propagate, and they can be in- 
vestigated through multiple experiments and thus be seen 
from a more complete perspective. A reference table inside 
the book back cover provides genomic and other critical 
information about nine model organisms, including the 
bacterium E. coli, the plant Arabidopsis thaliana, the yeast 
Saccharomyces cerevisiae, the fruit fly Drosophila melano- 
gaster, and humans (Homo sapiens). 

Genomics has a seemingly limitless array of applica- 
tions. For example, genomic techniques and analyses 
can be used to identify specific genes, to identify allelic 
variants producing hereditary diseases, to map genes, to 
identify regions of genomes that increase or decrease the 
likelihood of an organism expressing a particular trait, to 
compare gene sequences within and among species, to 
trace the evolution of genes, and to identify the evolution- 
ary relationships between related organisms. 

The Human Genome Project, completed in 2003, 
was a landmark achievement that, by producing the nu- 
cleotide sequence of an entire representative human ge- 
nome, set a new course for the genetic investigation of 
humans. In so doing, it made some striking discoveries. 
For example, 45% of the human genome consists 
of transposable genetic elements. These are mobile 
DNA sequences that can move throughout the genome 


GENETIC ANALYSIS 


PROBLEM The DNA duplex identified in Genetic Analysis 1.1 is 

3’-...ACGGATCCTCCCTAGTGCGTAATACG...-5’ 
5’-...TGCCTAGGAGGGATCACGCATTATGC...-3’ 

One strand of the double-stranded DNA sequence serves as the coding strand and the other as the 
template strand that is transcribed to produce an mRNA. The mRNA is translated into a polypeptide 
containing five amino acids, the first of which is methionine (Met), encoded by the start codon AUG. 
The mRNA also contains a stop codon. 

a. Identify the DNA coding strand and the nucleotides corresponding to the start codon, the 
amino acid codons, and the stop codon. 

b. Write the sequence and polarity of the mRNA transcript, showing the codons 
for the five amino acids and the stop codon. 

c. Write the amino acid sequence of the polypeptide produced, using both the three-letter and 
one-letter codes for the sequence. (See the genetic code tables inside the front cover.) 

BREAK IT DOWN: The coding strand has the same 5’ → 3’ polarity as the mRNA and also the same base sequence 
except for the presence of uracil (U) instead of thymine (T) (p. 12). 

BREAK IT DOWN: Translation uses mRNA codons (three consecutive mRNA nucleotides) to direct the assembly 
of polypeptides (strings of amino acids) (p. 12). 

BREAK IT DOWN: The start codon is AUG, and it is followed by four more codons and then a stop 
codon (p. 12). 

BREAK IT DOWN: Messenger RNA codons are written 
and translated 5’ to 3’ using the genetic code, which contains 
three stop codons, UAA, UAG, and UGA (inside front cover). 


Solution Strategies Solution Steps 


Evaluate 

Strategy 1. Identify the topic of this problem and the kind of information the answer should contain. 
Step 1. The problem concerns identification of the coding strand of DNA and the sequence of mRNA encoding five amino acids in a polypeptide. The amino acid sequence is also required. 

Strategy 2. Identify the critical information given in the problem. 
Step 2. The double-stranded DNA sequence is given. It contains a sequence corresponding to the start codon (AUG), encodes five amino acids, and contains a stop codon. 

Deduce 

Strategy 3. Scan the double-stranded DNA sequence to identify possible DNA coding-strand triplets that might correspond to a start codon. 
Step 3. The double-stranded DNA sequence contains two possible triplets corresponding to start codons (5’-ATG-3’), one on each strand. Each is marked here in brackets; on the upper strand the triplet reads 5’ → 3’ from right to left: 

3’-ACGGATCCTCCCTAGTGC[GTA]ATACG-5’ 
5’-TGCCTAGGAGGGATCACGCATT[ATG]C-3’ 

PITFALL: Don’t simply read left to right. Instead, identify strand polarity and read 5’ → 3’. 

TIP: The start codon in mRNA is 5’-AUG-3’ (methionine). It matches the coding-strand DNA triplet 5’-ATG-3’ and is transcribed from the template-strand triplet 3’-TAC-5’. 

Strategy 4. Scan the double-stranded DNA to identify possible DNA coding-strand triplets corresponding to possible stop codons. 
Step 4. Four DNA triplets potentially correspond to a stop codon. Each is marked in brackets below; again, the triplets on the upper strand read 5’ → 3’ from right to left. 

3’-ACG[GAT]CCTCCCT[AGT]GCGT[AAT]ACG-5’ 
5’-TGCC[TAG]GAGGGATCACGCATTATGC-3’ 

TIP: There are three stop codons, UAA, UAG, and UGA, corresponding to DNA coding-strand triplets TAA, TAG, and TGA, respectively. 

Solve 

Answer a 
Strategy 5. Determine which 5’-ATG-3’ DNA triplet is followed by four additional codons (12 nucleotides) encoding amino acids and then by a stop codon; that triplet corresponds to the authentic start codon. 
Step 5. The potential start codon in the upper strand to the right (5’-ATG-3’) corresponds to the authentic start codon (AUG). The following 12 nucleotides correspond to the four remaining amino acid codons and are followed by the stop codon (5’-TAG-3’, which corresponds to the UAG stop codon of mRNA). The upper strand is therefore the coding strand. 

TIP: The total length of this region would be 18 nucleotides. 

Answer b 
Strategy 6. Determine the mRNA sequence and polarity, showing the codons. 
Step 6. The mRNA sequence is 

5’-AUG CGU GAU CCC UCC UAG-3’ 
(AUG is the start codon; UAG is the stop codon.) 

Answer c 
Strategy 7. Determine the amino acid sequence of the polypeptide encoded by this mRNA. 
Step 7. The polypeptide encoded by this mRNA is Met-Arg-Asp-Pro-Ser, or M-R-D-P-S. 

For more practice, see Problems 15, 16, and 19. 
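The search carried out in this worked example can also be expressed as a short Python sketch. The function name `find_coding_region` and the variable names are hypothetical; the sketch assumes, as the problem states, that the coding region is a start triplet followed by four more amino acid codons and then a stop triplet, and it reads both strands 5’ → 3’ before scanning.

```python
from itertools import product

# Standard genetic code ("*" = stop), codon positions ordered U, C, A, G.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
STOP_TRIPLETS = {"TAA", "TAG", "TGA"}  # coding-strand forms of UAA, UAG, UGA


def find_coding_region(strand_5to3, n_amino_acids):
    """Return the triplets ATG ... stop for a region encoding exactly
    `n_amino_acids` amino acids, or None if the strand has no such region."""
    size = 3 * (n_amino_acids + 1)          # amino acid codons plus the stop
    for i in range(len(strand_5to3) - size + 1):
        codons = [strand_5to3[j:j + 3] for j in range(i, i + size, 3)]
        sense, last = codons[:-1], codons[-1]
        if (codons[0] == "ATG" and last in STOP_TRIPLETS
                and STOP_TRIPLETS.isdisjoint(sense)):
            return codons
    return None


upper_3to5 = "ACGGATCCTCCCTAGTGCGTAATACG"  # written 3' -> 5' in the problem
lower_5to3 = "TGCCTAGGAGGGATCACGCATTATGC"  # written 5' -> 3' in the problem

# Reverse the upper strand so that both candidates are read 5' -> 3'.
upper_codons = find_coding_region(upper_3to5[::-1], 5)
lower_codons = find_coding_region(lower_5to3, 5)

coding_codons = upper_codons or lower_codons
mrna_codons = [c.replace("T", "U") for c in coding_codons]
peptide = "".join(GENETIC_CODE[c] for c in mrna_codons[:-1])

print(" ".join(mrna_codons))  # AUG CGU GAU CCC UCC UAG
print(peptide)                # MRDPS
```

Only the upper strand yields a region of the required length, so it is the coding strand; the ATG on the lower strand lies too close to the 3’ end to be followed by five more triplets.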


(see Section 13.7). It also showed that almost 26% of the 
genome consists of noncoding introns, and only 1.5% of 
the genome consists of protein-coding exons. Section 
18.1 provides additional details of the content and 
genetic annotation of the human genome. 

Genome sequencing and analysis are not limited to 
living species. Several extinct species have recently had 
their genomes sequenced for comparison to those of liv- 
ing relatives. These species include the mastodon (for 
comparison to the elephant), the quagga (for comparison 
to the zebra), and two extinct lineages of early humans, 
Neandertals and Denisovans (for comparison to the mod- 
ern human genome). We look at the interesting results of 
the Neandertal-Denisovan—Homo sapiens genome com- 
parisons in the Case Study that concludes the chapter. 

On the heels of genomic sequencing, additional are- 
nas of “-omic” investigations and analyses have developed. 
Transcriptomics, the study of the transcriptome, the 
complete set of genes that undergo transcription in a given 
cell, allows researchers to investigate and compare differ- 
ent cell types to identify differences in the genes that are 
transcribed there, to characterize changes in the levels 
of gene transcription within a single cell type, or to see 
how biological changes affect transcription. Such studies 
can make important contributions to the understanding 
of biological abnormalities in cancer by identifying the 
genes whose transcription is either increased or decreased 
in cancer cells versus normal cells (see the Case Study in 
Chapter 12). Along the same lines, metabolomics, the 
study of chemical processes involving metabolites, exam- 
ines metabolic processes and outcomes in specific cells, 
tissues, organs, and organisms. Metabolomic comparisons 
of related organisms tie directly to genomics through 
shared genetic ancestry, and they can also reveal new genetic 
adaptations that have altered metabolism in organisms. 

Proteomics, the study of the proteome, the com- 
plete set of proteins encoded in a genome, examines the 
functions of proteins, their localization, their regulation, 
and their interactions in a comprehensive way. In other 
words, rather than analyzing the structure and function 
of individual proteins and looking one by one for interact- 
ing partners, proteomics is a methodology for examining 
large numbers of proteins at once. Multiple techniques are 
used to collect and analyze the proteomes of organisms. 
Among the numerous applications for proteomics is the 
use of proteomic analysis to decipher complex networks 
of protein-protein interactions in cells and to find the number 
and types of such interactions there (see Section 18.1). 

Each of these “-omic” approaches has its own goals, but 
collectively they also share a common goal—to contribute 
to the comprehensive understanding of complex biological 
systems. Called systems biology, this comprehensive ap- 
proach to understanding biological complexity has become 
possible through the development and the incorporation 
of genomics, proteomics, transcriptomics, and metabolo- 
mics. One overarching goal of the biological sciences—to 
which genetics is a principal contributing discipline—is to 
understand the normal and abnormal biology of organisms 
in a comprehensive way through systems biology. 

Applied to humans, for example, systems biology 
aims to understand how cells work in health and disease, 
to explain the details of how a single cell develops into 
a complete organism, and even to explain phenomena 
as complex as learning, memory, personality, and the 
development of personality disorders. These enormously 
complex attributes of organisms result in part from net- 
works of interactions between genes, proteins, metabo- 
lites, and environmental influences. They are the most 
challenging aspects of modern biology, requiring both 
the understanding of genetic principles and analysis and 
the use and application of new tools and technologies for 
data collection and assessment. This is the exciting and 
dynamic world in which modern genetics operates. 


1.4 Evolution Has a Molecular Basis 


As biologists survey varieties of life, assess the genetic 
similarities and differences between species, and explore 
the relationship of modern organisms to one another and 
to their extinct ancestors, it becomes apparent that all life 
is connected through DNA. Richard Dawkins, a biologist 
and author of several books on evolution, made note of 
this molecular connection, observing that life “is a river 
of DNA, flowing and branching through geologic time.” 
Dawkins’s “river through time” connecting all organisms 
is DNA. This shared DNA is a basis for identifying and 
studying relationships between organisms and tracing 
their evolutionary histories. 

Life is not static or uniform, of course; it evolves as 
DNA diverges into separate “branches” whose metaphorical 
forking leads to new species. The Dawkins quote suggests 
that for heredity to maintain genetic continuity across gen- 
erations and for variation to develop between organisms 
and evolve new species, the biochemical processes that 
replicate DNA and express the genetic information must 
also be universal. From this perspective the universality 
of DNA as the hereditary molecule of life, the shared pro- 
cesses of DNA replication and transcription, and the use of 
the same genetic code by all life are consistent with the idea 
of a single origin of life that has evolved into the millions of 
species inhabiting Earth today as well as other millions that 
preceded them but are now extinct. 

Life on Earth originated from a single source during the 
Archaean Eon that lasted from 4 billion to 2.5 billion years 
ago. In 2011, an international group of scientists led by 
David Wacey discovered fossils of a sulphur-metabolizing 
single-celled organism in 3.49-billion-year-old rocks from 
Western Australia (Figure 1.12). At that time in Earth’s his- 
tory there was very little oxygen present, and the first living 
organisms, likely not much different from those identified 
in fossil form, metabolized sulphur-containing compounds 
for growth. Organisms with similar metabolism exist today 
around hot springs and thermal vents. 
Figure 1.12 The earliest fossils on Earth. These single-celled 
sulphur-metabolizing organisms are fossilized in 3.49-billion-year- 
old rocks in Western Australia. 


These early life-forms have given rise to a dazzling 
array of species, most now extinct. Some of those extinct 
ancestors, however, gave rise to modern species that in- 
habit every conceivable ecological niche on Earth, from 
the most temperate to the most extreme. 


Darwin’s Theory of Evolution 


Over the millennia since life originated, untold millions 
of species have come and gone, through the operation of 
shared processes that faithfully replicated their DNA and 
passed it on to the next generation while also allowing for 
the accumulation of variation that drives diversification. 
This variation, the changes life has undergone, is explained 
by the theory of evolution, which says that all organisms 
are related by common ancestry and have diversified over 
time. The four widely recognized evolutionary processes 
are described below, but first we offer some general comments on 
Charles Darwin’s theory of evolution by natural selection. 

This view of evolution was proposed separately and 
independently by both Charles Darwin and Alfred Wallace 
in the late 1850s. Both authors based their proposals on 
firsthand observations of the distribution and diversity of 
life across the globe. Each author described higher rates of 
survival and reproduction of certain forms of a species over 
alternative forms through the process of natural selection 
that favors the survival and reproduction of the most fit 
individuals in each generation. Unlike the other processes 
we describe in this overview of evolution, natural selec- 
tion works at the phenotypic level, but like all evolutionary 
processes, its effectiveness is based on underlying genetic 
variation. Natural selection operating to favor one mor- 
phological form over others increases the frequency of the 
favored form in the population and, by doing so, increases 
the frequencies of the alleles controlling the favored form. 
Over many generations, forms that produce more off- 
spring also leave more copies of the alleles that control the 


phenotype, creating the hallmark of evolutionary change— 
change in the genetic makeup of the population. 

Charles Darwin’s theory of evolution by natural selec- 
tion is now a firmly established scientific fact incorporating 
three principles of population genetics that were obvious to 
many naturalists in Darwin’s day but were not assembled 
into a coherent model until Darwin articulated their connec- 
tion in his 1859 publication On the Origin of Species by Means of 
Natural Selection. Darwin's union of observation and prin- 
ciples into an evolutionary theory had a revolutionary effect 
on biology and laid the foundation of the modern biological 
sciences. Darwin’s principles of populations are 


1. Variation exists among the individual members of 
populations with regard to the expression of traits. 


2. Hereditary transmission allows the variation in traits 
to be passed from one generation to the next. 


3. Certain variant forms of traits give the individuals 
that carry them a higher rate of survival and repro- 
duction in particular environmental conditions. 
These organisms leave more offspring and increase 
the frequency of the variant form in the population. 


Yet while Darwin laid out the general process by which 
species evolved, he never understood the underlying hered- 
itary mechanisms that allowed the process to occur. Today, 
however, more than 150 years after Darwin introduced his 
revolutionary proposal, biologists fully understand the role 
of genetics in evolution. With regard to Darwin’s evolu- 
tionary principles, biology has established that 


1. Phenotypic variation of expressed traits reflects in- 
herited genetic variation. DNA-sequence differences 
(allelic variation) must be the cause of phenotypic 
variation if evolution is to occur. 


2. Hereditary transmission of phenotypic variation re- 
quires that offspring inherit and express the alleles that 
were responsible for the variation in parental organisms. 


3. Organisms carrying alleles that are favored by natural 
selection have a reproductive advantage over organ- 
isms that do not carry favored alleles. The former 
group therefore leaves more copies of its alleles in 
the next generation, causing the population to evolve 
through a change in allele frequency. 


In other words, progressive phenotypic change in a popu- 
lation is paralleled by genetic changes. 
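The third point, that a reproductive advantage changes allele frequency over generations, can be illustrated numerically. The sketch below uses a simple one-locus model for a haploid population, in which the favored allele's frequency in the next generation equals its share of the total reproductive output. The model, the function names, and the numbers (a starting frequency of 0.01 and a 5% reproductive advantage) are assumptions chosen for illustration; they are not taken from this chapter.

```python
def next_frequency(p, w_favored, w_other):
    """One generation of selection on a haploid population.

    p         : current frequency of the favored allele
    w_favored : relative reproductive success of carriers of the favored allele
    w_other   : relative reproductive success of all other individuals
    """
    mean_fitness = p * w_favored + (1 - p) * w_other
    return p * w_favored / mean_fitness


def simulate(p0, w_favored, w_other, generations):
    """Return the allele frequency at generation 0, 1, ..., `generations`."""
    freqs = [p0]
    for _ in range(generations):
        freqs.append(next_frequency(freqs[-1], w_favored, w_other))
    return freqs


# Hypothetical values: a rare allele with a 5% reproductive advantage.
freqs = simulate(p0=0.01, w_favored=1.05, w_other=1.00, generations=200)
for generation in (0, 50, 100, 150, 200):
    print(generation, round(freqs[generation], 3))
```

Under these assumed values the favored allele rises from rare to nearly universal within 200 generations, whereas setting the two fitness values equal leaves the frequency unchanged, which corresponds to the neutral case.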

In this particular process of evolution—evolution by 
natural selection—one form reproduces in greater numbers 
than others in a population because of being better adapted 
to the conditions driving natural selection. This process, 
also known as adaptive evolution, is common; but many ex- 
amples of so-called nonadaptive evolution (or neutral evolu- 
tion), the evolution of characteristics that are reproductively 
or functionally equivalent to other forms in the population, 
are also observed. Nonadaptive traits are neutral with re- 
spect to natural selection, conferring neither a selective 
advantage nor a selective disadvantage to their bearer, yet 


their evolutionary basis is fundamentally the same as that of 
adaptive evolution, as the following paragraphs attest. 


Four Evolutionary Processes 


The foundations of evolutionary genetics (which, you will 
recall, studies and compares genetic changes in populations 
and species over time) were established in the first four 
decades of the 20th century by several notable evolution- 
ary biologists and innumerable lesser-known individuals. 
Interestingly, this work took place before DNA was identi- 
fied as the hereditary material and before the chemical struc- 
ture of genes was defined and understood. Ronald Fisher, 
Sewall Wright, J. B. S. Haldane, and many others devised 
mathematical and statistical models of gene frequency dis- 
tribution and evolution in populations and species, leading 
to evolutionary hypotheses that have been tested and verified 
countless times in laboratory and natural populations. 

Through this massive body of work, evolutionary 
biology has confirmed Darwin’s model of the evolution 
of species by natural selection and expanded the descrip- 
tion of evolution to include three additional processes. 
Thus, biologists identify four processes of evolution, each 
leading to changes in the frequencies of alleles in a popu- 
lation over time, a hallmark characteristic of evolutionary 
change. The four evolutionary processes are 


1. Natural selection—the differential survival and 
reproduction of members of a population owing to 
possession of favored traits. Population members with 
the best-adapted morphological form are best able to 
survive and reproduce, and they leave more offspring 
than those possessing less-adaptive forms. Over time, 
the frequency of the best-adapted form and the alleles 
that produce it increase in the population. 


2. Migration—the movement of individual organisms 
from one population to another. This migratory 
movement transfers alleles from one population to an- 
other, and if the allele frequencies between the popu- 
lations are different and if the number of migrating 
individuals is large enough, migration can rapidly alter 
allele frequencies. 


3. Mutation—the slow acquisition of inherited variation 
that increases the diversity of populations and serves as 
the “raw material” of evolutionary change. Mutation, 
occurring in many different ways in genomes, provides 
the genetic diversity that is essential for evolution. 


4. Genetic drift—the random change of allele frequen- 
cies due to chance in randomly mating populations. 
Genetic drift occurs in all populations, but it is most 
pronounced in very small populations, where statis- 
tically significant fluctuations in allele frequencies 
can occur from one generation to the next. 
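Two of these processes, migration and genetic drift, lend themselves to a short numerical sketch. The code below is an illustration under stated assumptions rather than a method from the text: migration is modeled as one-way replacement of a fraction m of the population, drift is modeled by random sampling of 2N gene copies each generation (the Wright-Fisher model), and all function names and parameter values are hypothetical.

```python
import random


def migrate(p_resident, p_migrant, m):
    """Allele frequency after a fraction m of the population is replaced
    by migrants (simple one-way migration model)."""
    return (1.0 - m) * p_resident + m * p_migrant


def simulate_drift(p0, population_size, generations, rng):
    """Track an allele frequency under drift alone (Wright-Fisher sampling).

    Each generation, 2N gene copies are drawn at random from the previous
    generation, so the frequency changes by chance only.
    """
    copies = 2 * population_size
    p = p0
    history = [p]
    for _ in range(generations):
        count = sum(1 for _ in range(copies) if rng.random() < p)
        p = count / copies
        history.append(p)
    return history


rng = random.Random(1)  # fixed seed so the run is repeatable
small = simulate_drift(0.5, population_size=10, generations=20, rng=rng)
large = simulate_drift(0.5, population_size=1000, generations=20, rng=rng)

print("N = 10:  ", [round(p, 2) for p in small])
print("N = 1000:", [round(p, 2) for p in large])
print("after migration:", migrate(p_resident=0.2, p_migrant=0.8, m=0.1))
```

Comparing the two printed series shows how the size of a population affects the generation-to-generation fluctuations in allele frequency.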


By the middle of the 20th century, the modern synthesis
of evolution, the name given to the merging of evolutionary
theory with the results of experimental, mathematical, and
molecular population biology, emerged as a unified view of
evolution. The modern synthesis tells the story of morpho- 
logical and molecular evolution of plant and animal species 
using experimentally verified processes and mechanisms. 
Among the best-known principal architects of the mod- 
ern synthesis are Theodosius Dobzhansky and Ernst Mayr, 
who drew together ideas from Darwin, Fisher, Wright, 
Haldane, and others to demonstrate how evolution oper- 
ates in real populations. Dobzhansky and Mayr profoundly 
influenced the thinking and research of generations of bi- 
ologists by demonstrating that evolutionary events revealed 
by laboratory investigations and in natural populations are 
consistent with the predictions made by Fisher, Wright, and 
Haldane. In simple terms, Dobzhansky and Mayr showed 
that evolution in populations and evolution in species oc- 
cur as predicted by evolutionary theory. Today, having 
been fleshed out by the work of countless researchers, the 
modern synthesis gives a clear and virtually complete pic- 
ture of the factors that produce the evolutionary changes in 
populations and of the mechanisms that produce the evolu- 
tion of species. We incorporate evolutionary examples into 
many chapters and also have a chapter devoted specifically 
to evolution in species and in populations (see Chapter 22). 


Tracing Evolutionary Relationships 


Evolutionary biologists investigate evolution by studying 
morphological (physical) and molecular (DNA, RNA, and 
protein) evolution of populations and organisms. Both 
morphological and molecular comparisons can be used 
to identify relationships between living species and to 
reveal ancestor-descendant relationships. These similarities
and differences can be depicted in a diagram called
a phylogenetic tree, a branching diagram that describes
the ancestor-descendant relationships among species or
other taxa. The tree of life shown in Figure 1.3 is one type of 
phylogenetic tree. These trees summarize the evolutionary 
histories of species by using branching points in the tree to 
represent the common ancestors of descendant organisms. 

The most commonly used approach to phylogenetic
tree construction is the cladistic approach, which depicts
species’ evolutionary relationships by sorting the species
into groups called clades, or monophyletic groups, based
on shared derived characteristics, or synapomorphies,
either morphological or molecular. Synapomorphies are
shared by organisms that are members of a clade. Such
sharing of traits is interpreted to indicate that the common
ancestor shared by clade members also possessed the trait.
Synapomorphies, whether they are of body morphology,
proteins, or nucleic acid sequence, occur through homology,
the presence of the trait or sequence in a common ancestor.
An example of morphological homology is limb structure
in vertebrates. The limbs of humans, horses, bats, and seals
have different functions, but they share the same underlying
structure in terms of the number and arrangement of
bones in the limbs. These similarities are due to the common
ancestry of vertebrates.

Figure 1.13 Morphological evolution. A phylogenetic
tree based on morphological and other characteristics shows
the apparent evolutionary relationships between 14 species of
finches inhabiting the Galapagos Islands.

In some instances, closely related taxa fail to share
a particular trait even though they have a close com- 
mon ancestor. The branch of a phylogenetic tree miss- 
ing a particular trait or sequence is identified as a 
paraphyletic group. Paraphyletic groups include some 
but not all the descendants of a single common ances- 
tor. Paraphyletic groups frequently occur when one 
lineage of a related group of taxa loses a trait that is 
retained by descendants or when one lineage develops 
a new trait not found in other descendants of the com- 
mon ancestor. 

In some apparent cases of synapomorphy, the similarities
are not a result of sharing a close common ancestor.
Instead, convergent evolution has led unrelated organisms
to display similar-looking traits. Such instances are known
as homoplasy. One example of homoplasy is the
presence of wings in birds and bats. These wings, despite
the similarities brought about by convergent evolution,
have independent origins.

Figure 1.13 shows a phylogenetic tree for 14 finch spe- 
cies that inhabit the Galapagos Islands. These finch species 
were one of the groups studied by Darwin as he formulated 
his evolutionary theory. The tree shown here is based on 
a variety of morphological and behavioral characteristics, 
including the beak shape, beak size, feeding habits, and 
habitat of each species, as well as its degree of isolation or 
separation from other species in the Galapagos Islands. 


Constructing Phylogenetic Trees Using Morphology 
and Anatomy Consider the features shared by various 
animals listed in Figure 1.14. One morphological
feature common to all these animals is the presence of a
backbone. This feature unites these animals into a clade 
we know as vertebrates that all share a common vertebrate 
ancestor. A second morphological feature, the presence 
of four legs, unites all the tetrapod animals and excludes 
salmon. Thus, all the animals except the salmon can be 
united into a clade we call tetrapods. Because fish are not 
within the clade of tetrapods, they form an outgroup to 
tetrapods. An outgroup is a taxon or group of taxa that is 
related to, but not included within, the clade in question. 
The species within the clade of interest are called the 
ingroup. In our example, each successive clade is identified 
by grouping species based on other shared characteristics. 
After a phylogenetic tree has been constructed, it
may be used to infer the characters of ancestral species.
For example, we can infer that the common ancestor
of all the taxa in Figure 1.14 had a backbone, which
would therefore be an ancestral character; but it did not
have four legs, which in this case would be a derived
character that evolved later, in the common ancestry of
tetrapods.

Figure 1.14 The identification of clades based on morphological
characters. Organisms are assessed for the presence or
absence of a series of morphological characters and those that
share derived characteristics form clades. The origins of specific
traits can be traced on the phylogenetic tree.
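The grouping of taxa into nested clades by shared characters can be sketched in a few lines of code. The character matrix below is hypothetical: the taxa and the "hair" character are illustrative stand-ins and are not a transcription of Figure 1.14.

```python
# Hypothetical character matrix: 1 = character present, 0 = absent.
# The taxa and characters are illustrative, not copied from Figure 1.14.
characters = {
    "salmon": {"backbone": 1, "four legs": 0, "hair": 0},
    "frog":   {"backbone": 1, "four legs": 1, "hair": 0},
    "lizard": {"backbone": 1, "four legs": 1, "hair": 0},
    "mouse":  {"backbone": 1, "four legs": 1, "hair": 1},
    "human":  {"backbone": 1, "four legs": 1, "hair": 1},
}


def clade_for(character):
    """Return the set of taxa that share the given character."""
    return {taxon for taxon, traits in characters.items() if traits[character]}


vertebrates = clade_for("backbone")
tetrapods = clade_for("four legs")
mammals = clade_for("hair")

# Taxa in the larger clade but outside the smaller one act as the outgroup.
outgroup_to_tetrapods = vertebrates - tetrapods

print("tetrapods:", sorted(tetrapods))
print("outgroup to tetrapods:", sorted(outgroup_to_tetrapods))
print("clades are nested:", mammals <= tetrapods <= vertebrates)
```

Each character defines a set of taxa, and the sets nest inside one another in the same way that successive clades nest on the tree.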


Constructing Phylogenetic Trees Using Molecules 
Phylogenetic trees based on molecular characteristics 
are constructed in the same manner as those based on 
morphological characteristics, except the shared features 
are DNA sequences or the amino acid sequences of 
proteins. Descendant groups have nucleic acid or amino
acid sequences that are derived from ancient sequences 
possessed by their common ancestors (i.e., homology). 
As a consequence of DNA sequence homology, the most 
closely related molecular sequences are those that have 
the smallest number of differences between them, and 
they are carried by the most closely related species. 
Figure 1.15 examines the DNA sequences containing
the first 15 nucleotides of the β-globin gene from seven
species (a to g). In the figure, the sequences have been
aligned vertically, and the number of differences between
the top sequence and each of the other sequences is noted
in the first step of the figure.

Figure 1.15 Construction of a phylogenetic tree based on molecular characters, using the
principle of homology.
The first 15 nucleotides of the β-globin gene from seven species are listed (top to bottom)
in order of the number of differences between each sequence.
Identical and very closely related sequences form a clade. Sequences a and b are both
GTGTGCTGGCCCACA.
The ancestral sequence for species a-c can be inferred by comparing sequences a-c with
that of an outgroup, species d. Sequence d, the next closest, differs at nucleotide
positions 1, 6, and 10. At position 11, d is the same as a and b; this means C is the
ancestral nucleotide at position 11. Ancestral sequence for a-c: GTGTGCTGGCCCACA.
The completed tree recapitulates the known phylogeny of vertebrates.


GENETIC ANALYSIS 1.3


PROBLEM Evolutionary biologists have searched the genomes of pigs,
whales, and cows to identify the presence or absence of six genes, labeled A to F
in the table below. A gene is marked with a plus symbol (+)
if it is found in a genome, or by a minus symbol (-) if it is not
found. Use the information in the table to construct the most
likely phylogenetic tree relating cow, whale, and pig.


BREAK IT DOWN: Correlation of the presence or absence of certain
genes is due to shared ancestry and the number of similarities
and differences between related organisms (p. 19)


Organism   Gene
           A   B   C   D   E   F
Pig        +   -   -   +   -   -
Whale      +   +   +   -   +   -
Cow        +   +   +   -   -   +


Solution Strategies and Solution Steps


Evaluate

1. Strategy: Identify the topic of this problem and the kind of information the answer
   should contain.
   Step: This problem uses genetic characteristics in order to construct a phylogenetic
   tree depicting the relationships between three mammals.

2. Strategy: Identify the critical information given in the problem.
   Step: The presence or absence of each of six genes is given for each type of mammal.


Deduce

3. Strategy: Identify genes shared by all three groups, genes shared by two of the
   groups, and genes unique to one group.
   Step: Of the six genes tested, gene A is found in all three organisms. Genes B and C
   are shared by whale and cow genomes but are not detected in the pig genome.
   Gene D is unique to pigs, E is unique to whales, and F is unique to cows.


Solve

TIP: Genes shared by organisms are likely to have been present in their common ancestor.

4. Strategy: Assign shared genes to phylogenetic branches that in the completed tree
   will be shared by the corresponding organisms.
   Step: Gene A is assigned to the base of the phylogenetic tree, which ascends from
   the common ancestor of the three organisms. Genes B and C are assigned to
   a branch shared by whale and cow. Genes D, E, and F are unique to separate
   groups and therefore are placed on separate branches.

5. Strategy: Assign genes unique to each genome to branches that are not shared by
   other organisms.
   Step: The complete phylogenetic tree containing all genes is shown below.

[Tree: gene A lies at the base. The pig branch carries gene D. The branch shared by
whale and cow carries genes B and C and then splits into the whale branch (gene E)
and the cow branch (gene F).]


For more practice, see Problem 18.


Visit the Study Area to access study tools.

A common method of constructing a phylogenetic 
tree begins with pairwise comparisons of genes or nucle- 
otide sequences, grouping the most similar sequences or 
genes closest together (on the assumption that they are 
the most closely related) and subsequently bringing in 
the more distantly related sequences to add to the tree. 
Analysis in this example begins with sequences a and b, 
since they are identical, and then successively attaches 
more distantly related sequences to the tree. Sequence
information from c, which differs from a and b at one 
nucleotide, is appended next, followed by the other se- 
quences. A completed phylogenetic tree constructed by 
following these steps recapitulates the known phylogeny 
of vertebrates. 
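The pairwise-comparison step can be sketched as a count of the aligned positions at which two sequences differ. In the example below, sequences a and b are the 15-nucleotide sequence given in Figure 1.15; sequences c and d are hypothetical stand-ins built only to match the stated pattern (c differs from a at one position, d at positions 1, 6, and 10), and the function name is an assumption.

```python
from itertools import combinations

# Sequences a and b are the 15-nucleotide sequence given in Figure 1.15.
# Sequences c and d are hypothetical stand-ins: c differs from a at one
# position (11), and d differs from a at positions 1, 6, and 10.
sequences = {
    "a": "GTGTGCTGGCCCACA",
    "b": "GTGTGCTGGCCCACA",
    "c": "GTGTGCTGGCTCACA",
    "d": "ATGTGTTGGTCCACA",
}


def count_differences(seq1, seq2):
    """Number of aligned positions at which two equal-length sequences differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(seq1, seq2) if x != y)


# Compare every pair, then list the pairs from most to least similar.
pairs = sorted(
    combinations(sequences, 2),
    key=lambda pair: count_differences(sequences[pair[0]], sequences[pair[1]]),
)
for first, second in pairs:
    diffs = count_differences(sequences[first], sequences[second])
    print(f"{first}-{second}: {diffs} difference(s)")
```

The pair with the fewest differences is joined first, and more distant sequences are attached afterward, mirroring the order of tree building described above.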

Genetic Analysis 1.3 guides you in constructing a 
simple phylogenetic tree. 
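The reasoning used in that worked example, sorting genes by the set of organisms that carry them, can also be written as a short sketch. The presence and absence values below follow the Solution Steps of the example; the variable names are hypothetical.

```python
# Presence (True) or absence (False) of genes A-F, as described in the
# Solution Steps of Genetic Analysis 1.3.
genomes = {
    "pig":   {"A": True, "B": False, "C": False, "D": True,  "E": False, "F": False},
    "whale": {"A": True, "B": True,  "C": True,  "D": False, "E": True,  "F": False},
    "cow":   {"A": True, "B": True,  "C": True,  "D": False, "E": False, "F": True},
}

# Group genes by the exact set of organisms that carry them.
branches = {}
for gene in "ABCDEF":
    carriers = frozenset(org for org, genes in genomes.items() if genes[gene])
    branches.setdefault(carriers, []).append(gene)

# Genes carried by more organisms sit closer to the base of the tree.
for carriers, genes in sorted(branches.items(), key=lambda item: -len(item[0])):
    print(f"{', '.join(genes)}: {', '.join(sorted(carriers))}")
```

Each group of genes corresponds to one branch: genes found in all three organisms belong at the base, genes shared by two organisms belong on their common branch, and unique genes belong on the tips.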

The availability of DNA sequence data and genomic 
data has revolutionized how we construct and view phy- 
logenies. Some groups that were traditionally grouped to- 
gether, such as mammals, birds, and amphibians, do prove, 


from DNA sequence and genomic data, to form mono- 
phyletic groups. However, analyses have indicated that 
reptiles and fish do not form monophyletic groups and 
are, instead, paraphyletic. For example, crocodiles are now 
known to be more closely related to birds than to other 
reptiles. Similarly, morphological and molecular analyses 
of dinosaurs (recall it is sometimes possible to obtain some 
molecular information from extinct species) suggest they 
are the sister group of birds, implying that extant birds are 
a kind of modern-day descendant of dinosaurs. 
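Whether a set of taxa is monophyletic can be checked mechanically once a tree is written down. The sketch below uses a deliberately simplified, hypothetical tree in which crocodiles and birds are each other's closest relatives; the nested-tuple representation and the function names are assumptions made for illustration.

```python
# A deliberately simplified, hypothetical tree written as nested tuples.
# Each tuple is a branching point; each string is a living taxon.
tree = ("mammals", ("lizards", ("crocodiles", "birds")))


def leaves(node):
    """Return the set of taxa descended from a node."""
    if isinstance(node, str):
        return {node}
    return set().union(*(leaves(child) for child in node))


def is_monophyletic(node, group):
    """True if some branching point (or tip) has exactly `group` as its descendants."""
    if leaves(node) == set(group):
        return True
    if isinstance(node, str):
        return False
    return any(is_monophyletic(child, group) for child in node)


print(is_monophyletic(tree, {"crocodiles", "birds"}))    # True
print(is_monophyletic(tree, {"lizards", "crocodiles"}))  # False
```

In this toy tree, a group containing lizards and crocodiles but not birds leaves out one descendant of their common ancestor, which is the pattern that makes a group paraphyletic.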


In addition to sequence changes that alter expressed
genes, molecular evolution also occurs in regulatory
sequences. These sequences are essential for gene transcription
and usually operate by binding proteins that
activate or repress transcription or by blocking the binding
of transcriptionally active proteins. Numerous evolutionary
analyses and genome sequence comparisons
have identified the important role of such evolution in
the diversification of organisms.


CASE STUDY


The Modern Human Family


Modern humans and their early ancestors, an evolutionary
group known collectively as hominins, evolved in Africa
and moved out of Africa to Europe, Asia, and beyond in an
undetermined number of successive migrations that began
nearly 2 million years ago. The original migrants were
most likely the common ancestors of Homo erectus and
other hominins. The most recent migrants, migrating out of
Africa about 80,000 to 100,000 years ago, were ourselves:
anatomically modern humans who constitute all of the
world’s populations today. The story of how the modern
human genome came to be in its present state is the subject
of deeply interesting and rapidly changing research in evolutionary
anthropology that derives much of its information
for analysis from the sequencing of the genomes of
long-extinct ancestors of modern humans.


HOMININ EVOLUTION MODELS Prior to the late 1990s,
only fossil evidence was available to model hominin
evolution. Two principal hypotheses, the Multiregional
(MRE) hypothesis and the Recent African Origin (RAO)
hypothesis, emerged to explain the evolution of modern
humans from our fossilized ancestors. The models agree
that the genus Homo evolved in Africa and that multiple
waves of early hominins had migrated out of Africa to
populate Europe and Asia. The MRE hypothesis proposes
that local development of modern humans occurred in
several locales at about the same time. Under this model,
all humans share a deep, common origin, but humans
have been in many global locations for a long time and
they have diversified locally to produce the populations
we observe today. In contrast, the RAO hypothesis proposes
that anatomically modern humans migrated out
of Africa in a single wave about 80,000 to 100,000 years
ago, supplanting the descendants of earlier hominin migrations
they encountered and establishing modern-day
human populations.

Since the late 1990s, increasingly efficient methods
have been developed to isolate and sequence DNA
derived from fossilized bones. First demonstrated on
bones from Neandertals in 1997, these methods have now
produced extensive “archaic” genomic DNA sequences from
multiple hominins that are now extinct. These data have
offered general support for the RAO hypothesis, but they have also
provided evidence that encounters with archaic hominins
took place and occurred with different consequences for the
modern human genome in different locales.


ARCHAIC GENOME SEQUENCES Genomics has undergone
amazingly rapid development of methods and applications
in recent years, and genome experts such as Svante
Pääbo have used new methods to decipher the genomes
of extinct, so-called “archaic” hominins. The archaic genomes
are derived from DNA isolated from bone fragments
that are 30,000 or more years old. Using highly specialized
techniques, Pääbo and his colleagues have assembled genomic
sequence data on two archaic hominins that rival the
genome data for modern humans in depth and accuracy of
genome coverage. One archaic genome is from Neandertals,
the hominin that was widely dispersed in Europe and
Asia from 400,000 years ago or more until about 30,000
years ago. The second archaic genome is from Denisovans, a
more recently identified hominin named for Denisova cave
in Siberia where its bones were first discovered. Denisovans
were closely related to and contemporaneous with Neandertals.
Pääbo’s group has sequenced both nuclear DNA
(the DNA from chromosomes contained in the nucleus) and
mitochondrial DNA (the DNA contained in mitochondria
that populate the cytoplasm of cells) of Neandertals and
Denisovans to compare with the modern human genome.


THE MODERN HUMAN GENOME The genomic information
analyzed to date tells us that once modern humans migrated
out of Africa, they met and mated with Neandertals
and with Denisovans in Europe and Asia. The nuclear genomic
data indicate that 2% to 4% of genomic DNA of humans
living outside Africa is of Neandertal origin. The data also
reveal that Denisovan DNA comprises about 4% of the genomes
of Australian aboriginals and of descendants of populations from
Papua New Guinea and other Pacific Islands. Figure 1.16
depicts the current view of hominin migrations.


THE GENOMIC STORY OF HOMININS While there is much
more to learn about the evolutionary history of hominins,
some basic elements are in place. Homo erectus, modern humans,
Neandertals, Denisovans, and one or more unknown
lineages all share common African ancestry. Homo erectus
migrated out of Africa nearly 2 million years ago and left descendants
in Europe and Asia that were the common ancestors
of Neandertals and Denisovans. Neandertals and Denisovans
subsequently diversified but may have maintained a very low
level of interbreeding. Once modern humans migrated out of
Africa they quickly encountered and mated with Neandertals
and with Denisovans. Both archaic groups were eventually
eliminated by modern humans, but they left genetic evidence
of their interbreeding in the modern human genome in the
form of DNA sequences and specific genes.

The exploration of the evolution and origins of the modern
human genome is a rapidly changing new arena of investigation.
We explore this topic further in Chapter 22, but stay
tuned: there is surely much more to come soon.


Figure 1.16 Human migration and evolution. MRE and RAO models of hominin migration. Genomic
evidence indicates multiple migrations with replacement of archaic hominins by modern humans
accompanied by interbreeding.

Multiregional model (MRE): Modern humans emerged
gradually and simultaneously from earlier Homo erectus
migrations on different continents.

Recent African Origin model (RAO): Modern humans
emerged from a small African population that migrated
out of Africa, displacing earlier Homo erectus migrations.


For activities, animations, and review quizzes, go to the Study Area.


1.1 Modern Genetics Is in Its Second Century


Genetic principles first outlined by Gregor Mendel in 1865
were “rediscovered” in 1900 and so made modern genetics a
20th-century scientific discipline.

Study of the transmission of morphological variation during
the first half of the 20th century established transmission
genetics as a central focus of genetic analysis.

The analysis of DNA, RNA, and protein beginning in the
second half of the 20th century established genetics as a
molecular discipline.

Life on Earth has three domains (Bacteria, Archaea, and
Eukarya) that share a common evolutionary history.


1.2 The Structure of DNA Suggests a Mechanism 
for Replication 


Deoxyribonucleic acid (DNA) is the genetic material. DNA 
is a double helix containing two strands of nucleotides that 
are composed of a five-carbon deoxyribose sugar, a phos- 
phate group, and one of four nucleotide bases: adenine (A), 
thymine (T), cytosine (C), or guanine (G). 

Nucleotides in a DNA strand are joined by covalent phos- 
phodiester bonds between the 5’ phosphate of one nucleotide 
and the 3’ OH of the adjoining nucleotide. 

DNA strands are joined by hydrogen bonds that form between
complementary base pairs. A pairs with T and C pairs with G.

Strands of the DNA duplex are antiparallel; one strand is
oriented 5’ → 3’, and the complementary strand is oriented
3’ → 5’.

DNA replicates by a semiconservative process that produces 
exact copies of the original DNA double helix. 

DNA polymerase uses one strand of DNA as a template to 
synthesize a complementary daughter strand one nucleotide 
at a time in the 5’-to-3' direction. 
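The base-pairing and antiparallel rules summarized above can be expressed as a short sketch. The function name and the example sequence are hypothetical and serve only to illustrate the rules.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complementary_strand(strand_5_to_3):
    """Return the complementary DNA strand, also written 5' to 3'.

    Pairing each base gives the partner strand in 3'-to-5' order, so the
    result is reversed to follow the usual 5'-to-3' convention.
    """
    paired = [COMPLEMENT[base] for base in strand_5_to_3]
    return "".join(reversed(paired))


print(complementary_strand("ATGGCA"))  # TGCCAT
```

Applying the function twice returns the original sequence, as expected for two complementary, antiparallel strands.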


1.3 DNA Transcription and Messenger RNA 
Translation Express Genes 


The central dogma of biology (DNA → RNA → protein)
identifies DNA as an information repository and describes
how DNA dictates protein structure through a messenger
RNA intermediary that in turn directs polypeptide synthesis.

Transcription is the process that synthesizes single-stranded
RNA from a template DNA strand.

RNA transcripts have the same 5’ → 3’ polarity and
sequence as the coding strand of DNA; they differ only
in the presence of U rather than T.


Certain DNA sequences, most commonly promoters, bind 
RNA polymerase and other transcriptional proteins. 
Translation is the process that uses messenger RNA
(mRNA) sequences to synthesize proteins. 

Messenger RNA codons base-pair with tRNA anticodons at 
the ribosome. 

Each tRNA carries a specific amino acid that is added to the 
growing polypeptide chain. 

The genetic code contains 61 codons that specify amino 
acids and 3 that are stop codons. 

Genomics, proteomics, transcriptomics, and metabolomics 
are new investigative strategies that can help decipher 
complex problems of systems biology. 
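The statements about transcription, translation, and the 64 codons can be checked with a small sketch. It assumes the standard genetic code written as a 64-letter string in UCAG order; the function names and the example sequence are hypothetical.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon in UUU, UUC, UUA, UUG, UCU, ...
# order; "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_strand):
    """mRNA has the same sequence as the DNA coding strand, with U in place of T."""
    return coding_strand.replace("T", "U")


def translate(mrna):
    """Translate codon by codon from the first position until a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


stop_codons = sorted(c for c, aa in CODON_TABLE.items() if aa == "*")
print(len(CODON_TABLE) - len(stop_codons), "sense codons; stop codons:", stop_codons)
print(translate(transcribe("ATGGCATGGTAA")))  # MAW
```

The table built this way contains 61 codons that specify amino acids and 3 stop codons, in agreement with the summary above.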


1.4 Evolution Has a Molecular Basis 


Four processes—natural selection, migration, mutation, 
and genetic drift—drive the evolution of populations and 
species. 

The evolution of adaptive morphological characters occurs 
through natural selection pressures exerted on species by 
their environments. Nonadaptive characters that are neutral 
with respect to natural selection evolve by other evolution- 
ary processes. 

The modern synthesis of evolution is the name applied to
the union of transmission genetics, molecular genetics,
Darwinian evolution, and modern evolutionary genetics.

Phylogenetic trees describe the evolutionary relationships
among modern species and trace their descent from
common ancestors to identify the most likely pattern of
evolution.


Shared derived characteristics are molecular or morphologi- 
cal attributes that evolve in descendant species from ancient 
characters found in a common ancestor. 

Molecular phylogenies trace the evolution of nucleic acid
or protein sequences from common ancestors to modern
species.
PROBLEMS


MasteringGenetics™ Visit for instructor-assigned tutorials and problems.


Chapter Concepts


For answers to selected even-numbered problems, see Appendix: Answers.


1. Genetics affects many aspects of our lives. Identify three
   ways genetics affects your life or the life of a family member
   or friend. The effects can be regularly encountered or
   can be one time only or occasional.

2. How do you think the determination that DNA is the
   hereditary material affected the direction of biological
   research?

3. A commentator once described genetics as “the queen of
   the biological sciences.” The statement was meant to imply
   that genetics is of overarching importance in the biological
   sciences. Do you agree with this statement? In what ways
   do you think the statement is accurate?

4. All life shares DNA as the hereditary material. From an
   evolutionary perspective, why do you think this is the case?

5. Define the terms allele, chromosome, and gene and
   explain how they relate to one another. Develop an
   analogy between these terms and the process of using a
   street map to locate a new apartment to live in next
   year (i.e., consider which term is analogous to a street,
   which to a type of building, and which to an apartment
   floor plan).

6. Define the terms genotype and phenotype, and relate them
   to one another.

7. Define natural selection, and describe how natural
   selection operates as a mechanism of evolutionary
   change.

8. Describe the modern synthesis of evolution, and explain
   how it connects Darwinian evolution to molecular evolution.

9. What are the four processes of evolution? Briefly describe
   each process.

10. Define each of the following terms:
    a. transcription
    b. allele
    c. central dogma of biology
    d. translation
    e. DNA replication
    f. gene
    g. chromosome
    h. antiparallel
    i. phenotype
    j. complementary base pair
    k. nucleic acid strand polarity
    l. genotype
    m. natural selection
    n. mutation
    o. modern synthesis of evolution

11. Compare and contrast the genome, the proteome, and the
    transcriptome of an organism.

12. With respect to transcription, describe the relationship
    and sequence correspondence of the RNA transcript and
    the DNA template strand. Describe the relationship and
    sequence correspondence of the mRNA transcript to the
    DNA coding strand.


Application and Integration


For answers to selected even-numbered problems, see Appendix: Answers.


13. If thymine makes up 21% of the DNA nucleotides in the
    genome of a plant species, what are the percentages of the
    other nucleotides in the genome?

14. What reactive chemical groups are found at the 5’ and
    3’ carbons of nucleotides? What is the name of the bond
    formed when nucleotides are joined in a single strand? Is
    this bond covalent or noncovalent?


15. Identify two differences in chemical composition that
    distinguish DNA from RNA.

16. What is the central dogma of biology? Identify and describe
    the molecular processes that accomplish the flow of
    genetic information described in the central dogma.

17. A portion of a polypeptide contains the amino acids
    Trp-Lys-Met-Ala-Val. Write the possible mRNA and
    template-strand DNA sequences. (Hint: Use A/G and
    T/C to indicate that either adenine/guanine or thymine/
    cytosine could occur in a particular position, and use N to
    indicate that any DNA nucleotide could appear.)

18. The following segment of DNA is the template strand
    transcribed into mRNA:

    5’-...GACATGGAA...-3’

    a. What is the sequence of mRNA created from this
       sequence?
    b. What is the amino acid sequence produced by
       translation?

19. Consider the following segment of DNA:

    5’-...ATGCCAGTCACTGACTTG...-3’
    3’-...TACGGTCAGTGACTGAAC...-5’

    a. How many phosphodiester bonds are required to form
       this segment of double-stranded DNA?
    b. How many hydrogen bonds are present in this DNA
       segment?
    c. If the lower strand of DNA serves as the template
       transcribed into mRNA, how many peptide bonds are
       present in the polypeptide fragment into which the mRNA
       is translated?

20. Examine Figure 1.14 and answer the following questions.

    a. How many clades are shown in the figure?
    b. What characteristic is shared by all clades in the figure?
    c. What characteristics are shared by the mammalian
       clade and the human clade? What characteristics
       distinguish these two clades?

21. Fill in the missing nucleotides so there are three per block
    and the missing amino acid abbreviations in the graphic
    shown.

    [Graphic with partially filled rows: DNA coding strand (5’ to 3’),
    DNA template strand (3’ to 5’), mRNA codons (5’ to 3’),
    tRNA anticodons (3’ to 5’), and amino acids (3-letter and
    1-letter abbreviations).]

22. Four nucleic-acid samples are analyzed to determine
    the percentages of the nucleotides they contain. Survey
    the data in the table below, determine which samples
    are DNA and which are RNA, and specify whether each
    sample is double-stranded or single-stranded. Justify
    each answer.

                A     G     T     U     C
    Sample 1   22%   28%   22%    0    28%
    Sample 2   30%   30%    0    20%   20%
    Sample 3   18%   32%    0    18%   32%
    Sample 4   29%   29%   21%    0    21%

23. Are seed-eating finches among Darwin’s finches monophyletic
    or paraphyletic? What about cactus flower-eating
    finches?

24. If one is constructing a phylogeny of reptiles using DNA
    sequence data, which taxon (birds, mammals, amphibians,
    or fish) might be suitable to use as an outgroup?

25. Using the following amino acid sequences obtained from
    different species of apes, construct a phylogenetic tree of
    the apes.

    Pongo pygmaeus     GGPHYRLIAVED
    Pongo abelii       GGPHYRLIAVED
    Pan paniscus       GAPHFRLLAVEE
    Pan troglodytes    GAPHFRLLAVEE
    Gorilla gorilla    GAPHFRLIAVEE
    Gorilla beringei   GAPHFRLIAVEE
    Homo sapiens       GAPHFNLLAVEE
    Hylobates lar      GGPHYRLISVED
    Hoolock hoolock    GGPHYRLISVDD
    Common ancestor    GGPHYRLISVDD


This statue of Gregor Mendel stands in the garden of the St. Thomas 
monastery in Brno, Czech Republic just a few feet from where his 
greenhouse once stood. You can take a virtual tour of the museum 
and see additional interactive features at www.mendel-museum.com. 


When Gregor Mendel identified and described two
fundamental laws of hereditary transmission, he ushered
in a new era of understanding in biology. The terms
Mendelian genetics and Mendelism were coined to recog- 
nize this contribution, and they are used as synonyms for 
transmission genetics, the field that describes and investi- 
gates the patterns of transmission of genes and traits from 
parents to offspring. Like his contemporary Charles Darwin, 
who elegantly described the process of evolution by natural
selection, Mendel articulated a new way to view the world.
Mendel was by no stretch of the imagination the first 
person to examine the transmission of hereditary traits in plants. 
Many amateur botanists of the 18th and early 19th
centuries conducted what were then called studies of 
“plant hybridization” on many species, including the 
edible pea plant (Pisum sativum) that was the subject 
of Mendel’s experiments. Others before him had even 
carried out crosses similar to Mendel’s, some made 
observations like those on which Mendel based his 
two principles of heredity, and some even came close 
to articulating a description of the hereditary princi- 
ples Mendel described. But no one described heredi- 
tary transmission as precisely as Mendel did. Mendel 
succeeded because of his superior experimental de- 
sign and his quantification of results. His approach al- 
lowed him to formulate and test genetic hypotheses 
with a level of rigor that no one had achieved before 
him or would achieve for another 35 years. 

In this chapter, we examine how Mendel used 
experimental designs and results to identify two 
pivotal principles of hereditary transmission. We 
see (1) how Mendel’s unprecedented experimental 
designs enabled him to detect genetic phenomena 
that escaped identification by his predecessors and 
(2) how the transmission of traits can be predicted 
using random probability theory. The chapter con- 
cludes with a description of the molecular genet- 
ics of four of the genes controlling traits described 
by Mendel. To date, the other three genes remain 
unidentified, although their effects on phenotypic 
variation are well known. We begin, however, with a 
short biography of Gregor Mendel that reveals how 
his educational experiences profoundly influenced 
his approach to scientific exploration. 


2.1 Gregor Mendel Discovered the 
Basic Principles of Genetic Transmission 


Born in 1822 to a farming family of modest means in 
the village of Hyncice, which is now part of the Czech
Republic, Johann (later known by his clerical name, 
Gregor) Mendel completed the equivalent of high school 
at age 18 with a certificate attesting to exceptional 
academic abilities. He began his higher education at the 
Olomouc Philosophical Institute in 1840, but these stud- 
ies took a severe toll on his mental and physical health, 
and he gave them up after the first year. In 1843, after 
attempting unsuccessfully to restart his education at
Olomouc, he decided to pursue higher learning by entering
the priesthood instead. Based on its strong reputation
in teacher training and a recommendation from
a former teacher at Olomouc, he selected St. Thomas 
monastery in the Czech city of Brno. Mendel’s duties 
at St. Thomas included temporary teaching of natural 
science at a middle school in Brno. His keen interest in 
teaching science and his desire to become a permanent 
teacher led monastery administrators to send Mendel to 
the University of Vienna in 1851 to study natural science 
as preparation for a teaching examination. 

In Vienna, Mendel studied plant physiology and plant 
biology with Professor Franz Unger and physics with 
Professor Christian Doppler as well as Doppler’s succes- 
sor, Professor Andreas von Ettinghausen. From Professor 
Unger, Mendel learned to think critically about prevail- 
ing theories of plant reproduction and hybridization. 
Doppler, an experimental physicist famous for describing 
the Doppler effect, espoused a “particulate” view of physics 
and taught Mendel how to study individual characteristics 
separately in experiments. Professor Ettinghausen taught 
Mendel the mathematics of combinatorial analysis. Mendel 
would apply these lessons to his later research. In 1853, 
Mendel returned to Brno, where he took and passed the 
written portion of the permanent teachers’ examination 
but apparently never completed the oral portion, remain- 
ing a “temporary” teacher at the school in Brno until he 
became abbot of the monastery in 1868. 

In the summer of 1856, after a 3-year period during 
which he pondered how he might pursue his interest in 
natural science, Mendel began his work on trait hered- 
ity in the edible pea plant Pisum sativum. Mendel began 
his studies by gathering 34 different varieties of peas 
collected from local suppliers. Over the next 2 years, he 
tested each variety for its ability to uniformly reproduce 
identical characteristics from one generation to the next. 
Ultimately, he settled on 14 strains of Pisum representing 
seven individual traits, each of which had two easily distin- 
guished forms of expression in a seed or plant (Figure 2.1). 
Mendel worked with these 14 strains for the next 5 years, 
concluding his experiments in 1863. 

On February 8 and March 8, 1865, Mendel dis- 
cussed his work on peas at two meetings of the Natural 
History Society of Brünn (Brno). The society published
his report in its Proceedings the following year, 1866. 
After publication of his work, Mendel corresponded with 
several prominent botanists in Europe, most notably Karl 
Naegeli. Mendel’s letters to Naegeli have scientific sig- 
nificance because they clearly lay out his experiments, 
his results, and his conclusions. Unfortunately, neither 
Naegeli nor any of his contemporaries seemed to grasp 
the importance of Mendel’s work. 

After becoming abbot of the monastery in 1868,
Mendel gave up his work in genetics but continued to
pursue his interests in beekeeping and meteorology. As
abbot, he became involved in business activities such
as holding a seat on the board of directors of a local
bank and running a brewery that generated income for
St. Thomas. He faithfully served the monastery until his
death in 1884. Mendel died in scientific obscurity, never
having had the importance of his experiments understood
or appreciated. Sixteen years after his death, in 1900,
biologists would rediscover and replicate his experiments
and launch a revolution in biology.

Figure 2.1 The seven dichotomous traits of Pisum sativum
studied by Mendel (panels: Seed, Pod, Flower, Plant).


Mendel’s Modern Experimental Approach 


Mendel successfully identified principles of hereditary 
transmission that eluded investigators who preceded him 
and continued to elude investigators for many years after 
his death. Was Mendel more insightful? Did he make 
fortuitous choices by selecting Pisum sativum as his ex- 
perimental organism and in selecting his seven char- 
acteristics? Did he have a superior approach to genetic 
experimentation and analysis? The answer to each of these 
questions is yes. 

Mendel’s superior insight came principally from his 
familiarity with quantitative thinking and his understanding 
of the particulate nature of matter, learned through the study 
of physics with Doppler. Central to Mendel’s experimental 
success was counting the number of progeny with specific 
phenotypes. This logical and now routine component of data 
gathering was the key to Mendel’s ability to formulate the 
hypotheses that explained his results. Under Doppler and 
Ettinghausen, Mendel had learned to isolate individual prop- 
erties of matter he wished to study and to think in quantita- 
tive terms about combinations of outcomes. 

Mendel made a fortuitous choice in selecting the pea 
plant as his experimental organism. Peas were commonly 
used for hybridization studies in Mendel’s time, so a large 
number of strains displaying different phenotypic character- 
istics were available. The pea plant is hardy and was easy for 
a skilled botanist like Mendel to manipulate and crossbreed. 

In choosing to study individual traits of the pea plant, 
Mendel designed his experiments to test the blending 
theory of heredity that was the predominant hereditary 
theory at the time. The blending theory viewed the traits 
of progeny as a mixture of the characteristics possessed
by the two parental forms. Under this theory, progeny
were believed to display characteristics that were approx- 
imately intermediate between those of the parents. For 
example, the blending theory would predict that crossing 
a black cat and a white cat would produce gray kittens, 
and that the original black or white colors would never 
reappear if the gray kittens were bred to one another. 
Mendel reasoned that if the blending theory were true, 
he would see evidence of it in each trait. If no blending 
were seen in individual traits, the blending theory would 
be disproved. 

As crucial as his quantitative approach and choice 
of Pisum were to his ultimate success, Mendel’s radi- 
cally new experimental design was his most important 
innovation. Mendel was ahead of his time in that his 
scientific experiments were hypothesis driven. In other 
words, following an initial observation, he devised a hy- 
pothesis to explain the observation and then carried out 
an independent experiment to test the hypothesis. It is for
his experimental innovations and his analysis that Mendelian
genetics is the term used to identify this field of genetics.
An experimenter employing this approach, known today 
as the scientific method, will follow these steps: 


1. Make initial observations about a phenomenon or 
process. 


2. Formulate a testable hypothesis to explain 
observations. 

3. Design a controlled experiment to test the hypothesis.

4. Collect data from the controlled experiment.


5. Interpret experimental results, comparing the 
observed results to those expected under assump- 
tions of the hypothesis. 


6. Draw reasonable conclusions, reformulating or retest- 
ing the hypothesis if necessary. 


Mendel followed these steps to collect data on individ- 
ual traits of the pea plant, formulate hypotheses to explain 
his phenotypic observations, and conduct independent ex- 
periments to test his predictions. 
Five Critical Experimental Innovations


Five features of Mendel’s breeding experiments distin- 
guish them from those of his contemporaries and were 
critical to his success: (1) controlled crosses between 
plants; (2) use of pure-breeding strains to begin the exper- 
imental controlled crosses; (3) selection of dichotomous 
traits; (4) quantification of results; and (5) use of replicate, 
reciprocal, and test crosses. 


Controlled Crosses between Plants In nature, pea plant 
flowers contain both a pollen-producing anther and an 
egg-containing ovule and usually self-fertilize (Figure 2.2). 
Self-fertilization occurs when sperm-containing pollen 
from the anther fertilizes an egg within the ovule. Fertilized 
ovules develop in the ovary, which matures into fruit (seed 
pod) as seeds (peas) develop inside. A mature seed pod 
usually contains five to seven peas, each of which results 
from a different fertilization event. In genetic experiments, 
peas can be collected and scored for their phenotypes or 
can be planted to produce pea plants that are scored for 
their traits. 

Pea plants are also capable of cross-pollination, if 
pollen from one plant is used to fertilize the ovules of 
another. In nature, plants are cross-pollinated by insects, 
birds, mammals, and wind. Mendel used his familiar- 
ity with plants to carry out artificial cross-fertilization 
(Figure 2.3). First, he emasculated developing pea flow- 
ers by cutting off the nascent anthers. This modification 
made the plants incapable of self-pollination, but the 
ovules could still be fertilized by cross-fertilization with
pollen from another plant. Mendel carried out artificial
cross-fertilization by using a small paintbrush to lift mature
pollen from a non-emasculated flower and brush
it onto an emasculated flower. With this manipulation
Mendel restricted reproduction to those plants he identified
beforehand as likely to yield informative results, thus
performing what is now known as a controlled genetic
cross between selected organisms.

Figure 2.2 Life cycle of Pisum sativum. Seeds (peas) are
planted and germinate, growing into mature flowering plants.
Eggs in the flower ovule are fertilized by pollen produced from
anthers. Immature seeds arise from individual fertilized eggs in
the pod that forms as seeds develop. After seeds mature, they
are dispersed to renew the cycle.

Figure 2.3 Artificial cross-fertilization of pea plants.


Pure-Breeding Strains to Begin Experimental Crosses 
During the 2 years before beginning his hereditary 
experiments, Mendel performed numerous controlled 
genetic crosses to obtain strains that consistently 
produced a single phenotype without variation. Strains of 
this kind that consistently produce the same phenotype 
are called pure-breeding strains, also known as true- 
breeding strains. The self-fertilization of a pure-breeding 
purple-flowered plant will yield only purple flowers 
among progeny plants. Two plants from a pure-breeding 
line can be crossed to one another and will produce 
progeny with the same phenotype. 

Mendel’s work generated 14 pure-breeding strains 
for his 7 traits, and he used two different pure-breeding 
strains to begin each of his hereditary experiments. For 
example, Mendel crossed pure-breeding purple-flowered 
plants with pure-breeding white-flowered plants. By ar- 
tificial cross-fertilization of these parental generation 
(P generation) plants, Mendel produced seeds that were 
grown into the first filial generation (F1 generation) of
plants (Figure 2.4). The F1 plants were then used as the
sources of pollen and egg to produce the seeds that were
grown into the second filial generation (F2 generation).
The third filial generation (F3 generation) was produced
by crossing plants from the F2 generation, and so
on for as many generations as needed.


Selection of Single Traits with Dichotomous Pheno- 
types Each of the seven traits that Mendel chose is found 
in just two dichotomous forms. The two phenotypes are 
readily distinguished from one another, so there can be no 
ambiguity of assignment, and there are no intermediate 
phenotypes. For example, one trait was seed color; every 
seed was either yellow or green. 

The alternative forms of the seven traits Mendel 
studied are illustrated in Figure 2.1. The 14 pure-breeding 
strains were bred for (1) seed color (yellow or green), (2) 
seed shape (round or wrinkled), (3) pod color (green or 
yellow), (4) pod shape (inflated or constricted), (5) flower 
color (purple or white), (6) flower position (axial or termi- 
nal), and (7) plant height (tall or short). 

It is interesting to note that Mendel initially had 
selected an eighth trait producing either gray or white ex- 
terior seed coats. Early in his analysis, however, he found 
that plants with purple flowers always had gray seed coats 
and that those with white flowers always had white seed
coats. He correctly speculated that flower color and seed-coat
color were determined by the same genetic mechanism.
The pigment anthocyanin is responsible for purple flower
color and gray seed coats, but a mutation eliminates
anthocyanin production in plants with white flowers and
white seed coats.

Figure 2.4 Production of three generations of pea plants.
Plants of the P generation are artificially cross-fertilized to produce
the F1 generation. Self-fertilization or crossing of F1-generation
plants produces the F2 generation. F2 plants either self-fertilize
or are crossed to one another to produce the F3 generation.


Quantification of Results Each time Mendel made a 
controlled cross, he carefully counted the number of 
progeny plants of each phenotype. This seemingly simple 
act—now standard in scientific data gathering—was 
revolutionary in Mendel’s day. By obtaining large numbers 
of offspring from each cross, as was possible when using 
peas, and by expressing his results numerically, Mendel 
could more easily analyze them for revealing patterns 
such as the occurrence of consistent ratios between 
phenotypes. These ratios were critically important to 
Mendel’s discovery of the rules by which he could predict 
transmission of alleles during reproduction, and they are 
the foundation of Mendel’s two laws of heredity. 


Replicate-, Reciprocal-, and Test-Cross Analysis The 
final features that distinguished Mendel’s experiments 
are his use of three genetic-cross strategies that have 
become tried-and-true approaches to genetic analysis. 
Rather than simply counting the results of a single cross, 
for example, Mendel made many replicate crosses, 
producing hundreds of F1 plants and several thousand F2
plants by repeating the same cross several times.


Mendel also performed reciprocal crosses, in which 
the same genotypes are crossed but the sexes of the do- 
nating parents are switched. The plant providing the egg 
in the first cross is used as a source of pollen in the recip- 
rocal cross. An example of a reciprocal cross is shown in 
Figure 2.5a. First, pollen from a strain producing yellow 
peas (GG) is used to fertilize the egg of plants from a 
strain producing green peas (gg). Then a reciprocal cross 
is performed using pollen from the green-pea–producing
plant to fertilize eggs of the yellow-pea–producing plant.
Note that both these reciprocal crosses produce F1 with
yellow peas. We discuss the importance of this result in
the following section. 

Finally, Mendel performed test crosses (Figure 2.5b).
Here, R and r represent alleles of the rugose gene, a name
meaning “full of wrinkles.” We examine the results and
significance of this kind of controlled genetic cross below.
In Figure 2.5b, we introduce a bit of genotype shorthand
with the designation R_ to identify the round-seeded
plant in the test cross that is either RR or Rr. Spoken “are
blank,” the designation means either that the second allele
is unknown (that’s the case here) or that it is not relevant
(as shown in Figure 2.6).

Figure 2.5 Reciprocal crosses and test cross. (a) Two
reciprocal crosses between different pure-breeding yellow (GG)
and green (gg) parents produce F1 plants with yellow seeds
(Gg). (b) A test cross is made between an F1 with the dominant
phenotype that is possibly heterozygous (as indicated by R_)
and a pure-breeding (rr) plant with the recessive phenotype.
See Section 2.2 for definitions of these terms.
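The reasoning behind the R_ shorthand can be illustrated with a minimal Python sketch. It assumes simple dominance of R over r and a pure-breeding rr tester, as in Figure 2.5b; the function name `predict_test_cross` is hypothetical and chosen only for this illustration.

```python
from itertools import product


def predict_test_cross(tested_genotype, tester="rr"):
    """List the phenotype of each equally likely offspring of tested x tester."""
    progeny = []
    # Pair every allele of the tested plant with every allele of the tester.
    for allele_1, allele_2 in product(tested_genotype, tester):
        # One copy of the dominant allele R is enough for round seeds.
        progeny.append("round" if "R" in (allele_1, allele_2) else "wrinkled")
    return progeny


for genotype in ("RR", "Rr"):
    print(genotype, "x rr ->", predict_test_cross(genotype))
```

Under these assumptions, an RR plant yields only round progeny, whereas an Rr plant yields round and wrinkled progeny in equal proportions, so the progeny reveal which genotype the R_ plant carries.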


2.2 Monohybrid Crosses Reveal 
the Segregation of Alleles 


In this section we illustrate the results and interpretation 
of Mendel’s crosses by studying the transmission of two 
of Mendel’s traits, pea color (yellow or green) and, in 
separate crosses, the transmission of pea shape (round 
or wrinkled). The results and interpretations we describe 
apply equally well to the five other traits Mendel examined.
The uniformity of the experimental results and
interpretations is due to Mendel’s decision to conduct
experiments on each trait in the same way. He began
hereditary experiments on each trait by artificial cross-fertilization
of pure-breeding parental plants to produce
an F1 generation, and he then self-fertilized or intercrossed
F1 plants to produce the F2 generation.


Identifying Dominant and Recessive Traits 


By crossing pure-breeding yellow-pea–producing plants
and pure-breeding green-pea producers in replicate and
reciprocal crosses, Mendel consistently found that all of
the F1 plants produced yellow peas and none produced
green peas (Figure 2.6). Mendel identified yellow as the
dominant phenotype on the basis of its presence in the
F1, and he identified green as the recessive phenotype
since it is not seen among F1 progeny. Mendel next
crossed F1 yellow plants to produce the F2 and observed
reemergence of the recessive green phenotype. Among
the F2, Mendel found that approximately three-fourths
(75%) of the peas were yellow and the remaining one-fourth
(25%) were green. The yellow:green ratio in the
F2 is 3/4:1/4, or roughly 3:1. Mendel correctly interpreted
these results to indicate that F2 offspring with the
dominant trait were a mixture of two genotypes (GG
and Gg, in this case) and that plants with the recessive
trait were homozygous recessive (gg in this instance).
In general terms, the dominant F2 can be classified as
being G_ (“G blank”). In this context, the second allele,
whether G or g, is not important in determining the
phenotype; thus G_ is a kind of shorthand for indicating
that the genotype is either GG or Gg.

Mendel made similar observations for his experiments
testing inheritance of pea shape. Replicate and reciprocal
crosses of pure-breeding round-pea–producing plants with
pure-breeding wrinkled-pea–producing plants produced F1
plants bearing exclusively round peas. This result identifies
round as the dominant phenotype and wrinkled as
the recessive phenotype. His F1 cross produced F2 peas
in the ratio 75% round to 25% wrinkled, once again a
roughly 3:1 ratio.
Figure 2.6 Segregation of alleles for seed color. In the
cross between yellow-seeded and green-seeded pure-breeding
parental plants, F1 progeny display the dominant yellow phenotype.
Note that the 3:1 phenotypic ratio and 1:2:1 genotypic
ratio displayed in the F2 generation result from crossing the F1.




Tabulating results over several growing seasons
for all seven traits, Mendel counted nearly 20,000
F2 peas or plants. Table 2.1 displays Mendel’s results,
revealing three consistent features: (1) dominance of
one phenotype over the other in the F1 generation, (2)
reemergence of the recessive phenotype in the F2 generation,
and (3) a ratio of approximately 3:1 (dominant:
recessive) among F2 phenotypes. Mendel determined
that yellow is dominant to green and round is dominant
to wrinkled based on F1 results. Green pea color and
wrinkled pea shape reemerge in the F2, which displays
a consistent 3:1 ratio between the dominant and recessive
phenotypes. For example, Mendel classified 8023
F2 peas by their color and 7324 F2 peas by their shape.
Among the F2 peas classified by color, he found 6022 yellow
seeds and 2001 green seeds, a ratio of almost exactly
three to one. Of the F2 seeds classified for pea shape,
5474 were round and 1850 were wrinkled, again a ratio
of very nearly three to one. Data for each of the other five
characteristics revealed the same 3:1 ratio of dominant
to recessive in the F2.
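The ratios quoted above can be checked with a few lines of Python. This is only an arithmetic sketch: the counts are copied from Table 2.1, while the dictionary layout and variable names are assumptions made for the illustration.

```python
# F2 counts (dominant, recessive) copied from Table 2.1.
f2_counts = {
    "seed shape (round : wrinkled)": (5474, 1850),
    "seed color (yellow : green)": (6022, 2001),
    "flower color (purple : white)": (705, 224),
    "flower position (axial : terminal)": (651, 207),
    "pod color (green : yellow)": (428, 152),
    "pod shape (inflated : constricted)": (882, 299),
    "plant height (tall : short)": (787, 277),
}

# Express each result as "x:1" by dividing dominant by recessive counts.
for trait, (dominant, recessive) in f2_counts.items():
    print(f"{trait}: {dominant / recessive:.2f}:1")

# Pool all seven traits to get the overall ratio.
total_dominant = sum(dominant for dominant, _ in f2_counts.values())
total_recessive = sum(recessive for _, recessive in f2_counts.values())
print(f"total: {total_dominant} : {total_recessive} = "
      f"{total_dominant / total_recessive:.2f}:1")
```

Each trait comes out close to 3:1, and pooling the seven traits gives 14,949 dominant to 5010 recessive, or about 2.98:1.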


Evidence of Particulate Inheritance 
and Rejection of the Blending Theory 


Mendel’s F1 experimental results reject the blending theory
of heredity. Specifically, the observation that all F1
progeny have the same phenotype (i.e., the dominant
phenotype) that is indistinguishable from the phenotype
of one of the pure-breeding parents contradicts the
blending theory prediction that the F1 would display a
phenotype that is a blend of the two parental phenotypes.


Table 2.1 Mendel’s Observations for Seven Monohybrid Traits in the F1 and F2 Generations

Crosses between Pure-Breeding                             F2 Phenotypes                   F2 Phenotype
Parental Phenotypes            F1 Phenotype               Dominant       Recessive         Ratio

Round × wrinkled seeds (a)     All round seeds            5474 round     1850 wrinkled     2.96:1
Yellow × green seeds           All yellow seeds           6022 yellow    2001 green        3.01:1
  (interior seed color)
Purple × white flowers (b)     All purple flowers         705 purple     224 white         3.15:1
  (gray × white seed coat,       (gray seed coat)
  or exterior seed color)
Axial × terminal flowers       All axial flowers          651 axial      207 terminal      3.14:1
Green × yellow pods            All green pods             428 green      152 yellow        2.82:1
Inflated × constricted pods    All inflated pods          882 inflated   299 constricted   2.95:1
Tall × short plants            All tall plants            787 tall       277 short         2.84:1
TOTAL                                                     14,949         5010              2.98:1

(a) The dominant phenotype is written first and always appears as the F1 phenotype.

(b) A single gene controls both flower color and seed-coat color. Mendel discussed both traits but recognized they were controlled by the same gene.


The persistence of the dominant phenotype and the
reemergence of the recessive phenotype in the F2 also run
counter to the predictions of the blending theory.

Having rejected the blending theory, however, 
Mendel went on to propose a new hereditary hypothesis. 
Taking advantage of the analytical superiority of his quan- 
titative approach to data analysis, Mendel proposed that 
each trait is determined by two “particles of heredity.” 
Mendel used the German word elemente, a term meaning 
“unit or element,” to describe the two discrete units of he- 
reditary information for each trait. This idea is the basis of 
Mendel’s theory of particulate inheritance, which pro- 
poses that each plant carries two particles of heredity for 
each trait. A plant receives one unit of heredity in the egg 
and the second unit in pollen. Each parental plant passes 
one of its two particles to offspring during reproduction. 

The hereditary particles that are passed from one gen- 
eration to the next are called alleles in modern terminology. 
This term had not been invented in Mendel’s time (nor had 
the term gene, for that matter), but he correctly surmised 
that two elementen (alleles) were present for each trait in a 
plant and together determined the phenotype of the trait. 
Mendel used letters as symbols to represent the alleles for 
each trait, and he proposed a pattern of allele transmission 
from parents to offspring that explained his phenotypic
observations in the F1 and the F2. Mendel proposed that
pure-breeding lines contain two identical copies of the same allele.

Pure-breeding organisms have a homozygous genotype,
a term meaning that the two alleles (i.e., the two
copies of the gene) carried by an organism are identical. If
a homozygous plant is self-fertilized or if two organisms 
pure-breeding for the same trait are crossed, the progeny 
receive identical alleles from each parent and have the same 
homozygous genotype as the parents as well as the same 
phenotype. In contrast, if a genetic cross is made between 
pure-breeding parents with different traits, each parent is 
homozygous for a different allele. The progeny receive a 
distinct allele from each parent and have a heterozygous 
genotype, a term meaning that two different alleles make 
up the genotype. Heterozygous organisms can have a domi- 
nant phenotype if they carry a copy of the dominant allele. 

Geneticists now know that inheritance of the seven 
traits Mendel described is controlled by pairs of alleles 
of seven different genes. Thus, while Mendel did not use 
the words gene or allele, he understood the concept em- 
bodied by each term. Contemporary genetics describes 
inheritance of Mendel’s traits in terms of genes and alleles 
and continues to use letters to represent alleles. Different 
notational schemes and gene-naming conventions have 
been adopted for different species. (A table describing 
gene naming, gene nomenclature, and other information 
about the genes and genomes of model genetic organisms 
is located inside the book back cover.) 

Central to understanding the inheritance of the seven 
traits Mendel studied is the concept that pure-breeding 
organisms have homozygous genotypes. In Figure 2.6, for 
example, the pure-breeding yellow parent has the GG
homozygous genotype, and the pure-breeding green parent
has the gg homozygous genotype. Crosses of pure-breeding
parents of different homozygous genotypes produce
heterozygous (Gg) F1 progeny that all have the dominant
yellow phenotype. According to Mendel’s hypothesis, each
pure-breeding parent passes one allele to the F1, making it
heterozygous. One allele, G in this case, is dominant and
produces the dominant phenotype in all the F1.

The heterozygous F1 are then crossed with one another
or are self-fertilized in a monohybrid cross, a term
referring to a cross between two organisms that have the
same heterozygous genotype for one gene. With a dominant
and a recessive allele in the heterozygous genotype of
plants undergoing a monohybrid cross, a 3:1 phenotypic
ratio is predicted for the F2. At the same time, F2 organisms
are predicted to have three genotypes: The two
homozygous genotypes (the same genotypes present in the
original pure-breeding parents) are each expected to occur
in one-fourth of the F2 progeny, and the heterozygous
genotype is predicted in the remaining one-half of the F2
progeny. Therefore, among the F2, a 1:2:1 genotypic ratio
is predicted. The one-fourth of the F2 that are homozygous
GG plus the one-half of F2 progeny that are heterozygous
Gg are the three-fourths of the F2 with the dominant
(yellow) phenotype. The remaining one-fourth of the F2
contain the homozygous gg genotype and have the recessive
(green) phenotype. The same inheritance pattern
occurs for all the other traits studied by Mendel.


Segregation of Alleles 


Figure 2.6 uses letters as symbols to represent alleles and
genotypes in parental, F1, and F2 organisms and introduces
a simple and functional tool of genetic analysis: the
Punnett square. The Punnett square method of diagram- 
ming the genetic content of gametes and their union to 
form offspring is named in honor of Sir Reginald Punnett, 
a famous geneticist of the early 20th century. The Punnett 
square separates the two alleles carried by each reproduc- 
ing organism, placing those from one parent along the 
vertical margin of the square and those from the other 
parent along the horizontal margin. These separated al- 
leles represent the gametes of reproducing organisms, 
the sperm (or pollen) and egg cells, each of which carries 
only one copy of each gene. The squares in the body of the 
Punnett diagram show the results expected from random 
uniting of the gametes, identifying the genotype of off- 
spring produced by each possible combination of parental 
gametes. In Figure 2.6, the gametes of the F; parents are 
placed at the margins of the Punnett square, and gamete 
union produces the F, generation in the genotype propor- 
tions shown in the body of the Punnett square. 
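
The Punnett square logic described above can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `punnett_square` and the convention of writing a genotype as a two-letter string are assumptions made for this example, not notation from the text.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def punnett_square(parent1, parent2):
    """Genotype frequencies among progeny of a one-gene cross.

    Each parent is written as a two-allele string such as "Gg".
    Every allele of a parent is equally likely (1/2) to be in a gamete,
    so each of the four cells of the square has frequency 1/4.
    """
    cells = Counter()
    for allele1, allele2 in product(parent1, parent2):
        # Sorting puts the uppercase (dominant) allele first, so the
        # two heterozygous cells are both recorded as "Gg".
        cells["".join(sorted(allele1 + allele2))] += 1
    total = sum(cells.values())
    return {genotype: Fraction(n, total) for genotype, n in cells.items()}


# Monohybrid cross of two heterozygous F1 plants.
f2_genotypes = punnett_square("Gg", "Gg")
# Any genotype with at least one G shows the dominant (yellow) phenotype.
f2_dominant = sum(f for g, f in f2_genotypes.items() if "G" in g)

print(f2_genotypes)
print(f2_dominant)
```

The genotype frequencies correspond to the 1:2:1 genotypic ratio, and summing the two genotypes that carry G gives the three-fourths of the F2 expected to show the dominant phenotype.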

Mendel used the concept of particulate inheritance to 
analyze his experiments and to formulate a hypothesis to 
explain his results. Mendel’s first hypothesis is known as 
the law of segregation, sometimes also known as Mendel’s 
first law. This hypothesis describes the particulate nature of 
inheritance, identifies the segregation (separation) of alleles 
during gamete formation, and proposes the random union 
of gametes to produce progeny in predictable proportions: 


The law of segregation The two alleles for each trait 
will separate (segregate) from one another during gamete 
formation, and each allele will have an equal probability 
(1/2) of inclusion in a gamete. Random union of 
gametes at fertilization will unite one gamete from each 
parent to produce progeny in ratios that are determined 
by chance. 


The law of segregation applies to each of the seven 
traits Mendel examined, and each experiment produces 
similar results. We can take flower color as an example 
and use the law of segregation to explain the events shown 
in Figure 2.4, from the parental cross through the production 
of F2 progeny. Gametes formed by pure-breeding 
purple (PP) parents all contain P. Similarly, gametes from 
pure-breeding white (pp) parents all contain p. The F1 all 
have the dominant purple phenotype and have a heterozygous 
(Pp) genotype. Segregation of alleles is more easily 
visualized among gametes produced by the heterozygous 
F1 plants: One-half of the gametes from those plants are 
expected to contain P and one-half to contain p. The random 
union of gametes from the heterozygous F1 plants 
leads to the combinations and frequencies shown in the 
Punnett square of Figure 2.6, leading to the 1:2:1 genotypic 
ratio and the 3:1 phenotypic ratio. 


Hypothesis Testing by Test-Cross Analysis 


Mendel proposed the law of segregation to explain the 
phenotype proportions he observed in the F1 and F2 generations 
of his breeding experiments. Consistent with good 
scientific method, he considered the law of segregation 
to be a hypothesis that made testable predictions about 
cross progeny. Mendel’s proposal that F1 progeny are heterozygous 
is critical to the proposal that the gametes that 
produce the F2 will have an equal chance of containing 
one or the other of the alleles. Based on his segregation 
hypothesis, Mendel expected one-half of the gametes derived 
from the heterozygous F1 to carry the dominant allele 
and the remaining one-half to carry the recessive allele. 

To test this prediction, Mendel performed test-cross 
analysis by mating a suspected heterozygous F1 plant with 
a pure-breeding recessive plant (Figure 2.7). Based on the 
segregation hypothesis, Mendel predicted that test-cross 
progeny phenotypes would be 50% dominant and 50% 
recessive. The test cross diagrammed in Figure 2.7 is performed 
between a plant grown from a round F1 seed 
and a pure-breeding wrinkled-seed plant. In this test, the 
wrinkled-seed plant is homozygous rr and produces only 
r-containing gametes. Therefore, if the F1 plant is heterozygous, 
it should produce R gametes and r gametes 


[Figure 2.7 diagram: pure-breeding round (RR) and wrinkled (rr) 
parents are cross-fertilized to produce heterozygous F1 (Rr) 
plants. A dominant F1 plant is test-crossed to a pure-breeding 
recessive (rr) plant to determine if the F1 is heterozygous; 
if it is, the ratio of its gametes will be 1:1. The Punnett 
square unites R and r gametes from the F1 with r gametes from 
the recessive plant to give Rr and rr progeny. In Mendel’s 
test-cross experiment, he found 193 round and 192 wrinkled 
test-cross progeny, a 1.01:1 ratio.] 


Figure 2.7 Test-cross analysis of F1 plants. A test cross 
between an F1 plant and one that is homozygous recessive 
produces progeny with a 1:1 ratio of the dominant to the 
recessive phenotype if the F1 plant is heterozygous. 


at a frequency of 1/2 each. Consequently, the progeny of 
the cross would be 1/2 Rr and 1/2 rr, resulting in a 1:1 ratio of 
round:wrinkled. As the figure indicates, Mendel performed 
this cross and observed 193 round peas and 192 wrinkled 
peas in test-cross progeny. Mendel performed this kind of 
test-cross analysis for several of his traits and consistently 
observed a 1:1 ratio in test-cross progeny (Table 2.2). 


Table 2.2 Test-Cross Results from Mendel’s Experiments 

Test Cross                                 Test-Cross Progeny                     Ratio 
                                           Dominant           Recessive 
Round seed (Rr) × wrinkled seed (rr)       193 round (Rr)     192 wrinkled (rr)   1.01:1 
Yellow seed (Gg) × green seed (gg)         196 yellow (Gg)    189 green (gg)      1.04:1 
Purple flower (Pp) × white flower (pp)     85 purple (Pp)     81 white (pp)       1.05:1 
Tall plants (Tt) × short plants (tt)       87 tall (Tt)       79 short (tt)       1.10:1 
TOTAL                                      561                541                 1.04:1 
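
As a quick arithmetic check, the ratios in Table 2.2 can be recomputed from the progeny counts. The sketch below is only illustrative; the dictionary keys and the helper name `dominant_to_recessive` are made up for this example.

```python
# Observed test-cross progeny counts (dominant, recessive) from Table 2.2.
test_crosses = {
    "round x wrinkled": (193, 192),
    "yellow x green": (196, 189),
    "purple x white": (85, 81),
    "tall x short": (87, 79),
}


def dominant_to_recessive(dominant, recessive):
    """Express two counts as the x in an 'x:1' ratio, to two decimals."""
    return round(dominant / recessive, 2)


ratios = {name: dominant_to_recessive(d, r) for name, (d, r) in test_crosses.items()}

# Pool all four experiments, as in the TOTAL row of the table.
total_dominant = sum(d for d, _ in test_crosses.values())
total_recessive = sum(r for _, r in test_crosses.values())
pooled_ratio = dominant_to_recessive(total_dominant, total_recessive)

for name, value in ratios.items():
    print(f"{name}: {value:.2f}:1")
print(f"total: {total_dominant}:{total_recessive} = {pooled_ratio:.2f}:1")
```

Each ratio is close to the 1:1 expectation for a test cross of a heterozygote, with the small deviations expected from finite samples.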


Mendel’s test-cross results validate two components 
of his segregation hypothesis. First, the results 
show that F1 plants with the dominant phenotype have 
a heterozygous genotype. Second, the results validate 
the proposal that chance determines the frequency of 
gametes containing each allele. Had Mendel been incorrect 
about the heterozygous genotype of the F1, or 
incorrect about the role of chance in producing the frequency 
of alleles in gametes, the result of the test cross 
would have been different. If the round-seed plant were 
homozygous RR rather than Rr, all of the progeny of the cross 
would have the Rr genotype and would produce round 
peas. If the placement of alleles into gametes were not 
random, the phenotypes of test-cross progeny would 
not display a 1:1 ratio. 


Hypothesis Testing by F2 Self-Fertilization 


A second pivotal component of Mendel’s segregation 
hypothesis concerns the genotypes of F2 progeny. 
Specifically, Mendel’s hypothesis predicts that F2 plants 
with the dominant phenotype can be either homozygous 
or heterozygous. His hypothesis further predicts that the 
plants are twice as likely to be heterozygous as homozygous. 
Look at Figure 2.6, for example, and notice that 
one-half of the F2 progeny are heterozygous, whereas 
one-quarter of the F2 progeny are homozygous for the dominant 
allele. Thus, among F2 plants with the dominant 
phenotype (i.e., excluding F2 plants with the recessive 
phenotype), two-thirds of the plants are heterozygous and 
one-third are homozygous for the dominant allele. 
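
The step from the 1:2:1 genotypic ratio to the two-thirds and one-third figures is a conditional probability: the recessive class is excluded and the remaining frequencies are rescaled. A small Python sketch of that arithmetic follows; the variable names are chosen for this example only.

```python
from fractions import Fraction

# Expected F2 genotype frequencies from the monohybrid cross Rr x Rr.
f2_genotypes = {
    "RR": Fraction(1, 4),
    "Rr": Fraction(1, 2),
    "rr": Fraction(1, 4),
}

# Only RR and Rr plants show the dominant phenotype.
dominant_total = f2_genotypes["RR"] + f2_genotypes["Rr"]

# Rescale within the dominant class (condition on the dominant phenotype).
heterozygous_given_dominant = f2_genotypes["Rr"] / dominant_total
homozygous_given_dominant = f2_genotypes["RR"] / dominant_total

print(dominant_total)                # 3/4
print(heterozygous_given_dominant)   # 2/3
print(homozygous_given_dominant)     # 1/3
```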

Mendel used a self-fertilization experiment to test 
the validity of his proposal that heterozygotes and homozygotes 
occur at a 2:1 ratio among dominant F2 plants 
(Figure 2.8). He reasoned that self-fertilized F2 plants 
could be identified as homozygous if they produced only 
progeny with the same phenotype. In contrast, self-fertilization 
of heterozygous F2 plants with the dominant 
phenotype would produce some progeny with the dominant 
phenotype and a smaller number with the recessive 
phenotype, in a 3:1 ratio. 

Mendel tested his segregation hypothesis by self-fertilizing 
F2 plants of the dominant phenotype, examining 
the progeny of each of these self-fertilizations to 
determine whether they exhibited the dominant phenotype 
only or both phenotypes. The results of his seven 
F2 dominant self-fertilization experiments are shown in 
Table 2.3. Mendel’s largest sample was for seed shape; he 
self-fertilized 565 round-seeded F2 plants. In this experiment 
he found that 193 of the plants (34.2%) produced 
only round peas in progeny, demonstrating that these 
plants are homozygous for the dominant allele (RR). 
Self-fertilization of the other 372 round-pea-producing F2 
plants (65.8%) produced both round peas and wrinkled 
peas in progeny plants. The ratio 372:193 is very close to 
the 2:1 ratio of heterozygous to homozygous genotypes 

[Figure 2.8 annotation: Among the F2 plants with the dominant 
phenotype, 1/3 had F3 progeny with only the 
dominant phenotype, and 2/3 had both 
dominant and recessive phenotypes.] 


Figure 2.8 Determination of the genotype of F2 plants 
by the production of F3 progeny. F2 plants are self-fertilized 
and their seeds are scored. Among the dominant (round) F2, 
approximately one-third are expected to be homozygous for 
the dominant allele (RR). These plants produce progeny that 
have only round peas. The remaining two-thirds of the dominant 
F2 are expected to be heterozygous, and produce both 
round and wrinkled peas in progeny. All F2 wrinkled peas are 
homozygous recessive (rr) and produce only wrinkled peas 
as progeny. 


that Mendel predicted would constitute the dominant, 
round-pea-producing F2 plants. 

Mendel’s self-fertilization results consistently show 
a 2:1 ratio among dominant F2 plants for each of 
the seven traits examined. These results validate the 
proposal that gametes unite at random to produce 
progeny. Taken together, the test-cross experiments 
and the dominant F2 self-fertilization experiments represent 
successfully designed and executed independent 
experiments for testing components of Mendel’s 
segregation hypothesis. In these tests, Mendel made 
predictions about the experimental outcomes and then 

Table 2.3 Results of Mendel’s Experiments to Identify F2-Plant Genotypes by Their F3 Progeny 

Trait (a)          Heterozygous F2 Plants (b)   Homozygous F2 Plants (c)   Ratio (d) 
Seed shape         372                          193                        1.93:1 
Seed color         353                          166                        2.13:1 
Flower color       64                           36                         1.78:1 
Pod shape          71                           29                         2.45:1 
Pod color          125                          75                         1.67:1 
Flower position    67                           33                         2.03:1 
Plant height       72                           28                         2.57:1 
TOTAL              1124                         560                        2.01:1 

(a) Mendel self-fertilized only F2 plants with the dominant phenotype in this experiment. 
(b) F2 plants were heterozygous if the F3 progeny they produced by self-fertilization had both dominant and recessive phenotypes. 
(c) F2 plants were homozygous if the F3 progeny they produced by self-fertilization had only the dominant phenotype. 
(d) The expected ratio of heterozygous to homozygous F2 plants was 2.00:1. 


verified the results by counting the progeny produced. 
The resulting data supported his segregation hypoth- 
esis and illustrate how Mendel anticipated modern 
scientific methods, using approaches that would not be 
consistently applied to genetic experiments for several 
decades (Genetic Analysis 2.1). 
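
The per-trait ratios and the pooled total for the F2 self-fertilization experiments can be recomputed from the plant counts reported in Table 2.3. The Python sketch below does only that arithmetic; the variable names are illustrative assumptions.

```python
# (heterozygous, homozygous) counts of dominant F2 plants, by trait,
# as reported for Mendel's F2 self-fertilization experiments.
f2_plants = {
    "seed shape": (372, 193),
    "seed color": (353, 166),
    "flower color": (64, 36),
    "pod shape": (71, 29),
    "pod color": (125, 75),
    "flower position": (67, 33),
    "plant height": (72, 28),
}

# Ratio of heterozygous to homozygous plants for each trait.
ratios = {trait: round(het / hom, 2) for trait, (het, hom) in f2_plants.items()}

# Pool all seven traits.
total_het = sum(het for het, _ in f2_plants.values())
total_hom = sum(hom for _, hom in f2_plants.values())
pooled_ratio = round(total_het / total_hom, 2)

for trait, value in ratios.items():
    print(f"{trait}: {value:.2f}:1")
print(f"total: {total_het}:{total_hom} = {pooled_ratio:.2f}:1 (expected 2.00:1)")
```

Individual traits scatter around 2:1 because the samples are small, while the pooled counts come very close to the predicted ratio.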


2.3 Dihybrid and Trihybrid Crosses 
Reveal the Independent Assortment 
of Alleles 


Each of the seven traits investigated by Mendel showed the 
same pattern of hereditary transmission that is explained 
by the law of segregation. The uniformity of phenotype 
proportions in F1, F2, test-cross, and self-fertilization 
progeny suggests that the same mechanism is responsible 
for allelic segregation in each one of the selected traits, but 
what about the inheritance of two or more traits simultaneously? 
Is there a pattern or ratio of phenotypes that 
allowed Mendel to propose a transmission mechanism 
when two or more genes are examined at the same time? 


Dihybrid-Cross Analysis of Two Genes 


To test the simultaneous transmission of two traits in the 
pea plant, Mendel performed a series of dihybrid crosses, 
crosses between organisms that differ for two traits. These 
tests followed an experimental strategy that paralleled his 
investigation of allelic segregation of single traits. 

As Figure 2.9 illustrates, Mendel began each dihybrid 
cross with pure-breeding lines. Having determined, for 
example, that round pea shape is dominant to wrinkled 
shape and that yellow pea color is dominant to green 
color, Mendel proposed that pure-breeding plants producing 
round, yellow peas have the genotype RRGG and that 

[Figure 2.9 diagram: pure-breeding round, yellow (RRGG) × 
pure-breeding wrinkled, green (rrgg).] 

Figure 2.9 Dihybrid-cross analysis. Parental plants that 
are pure-breeding for two traits are cross-fertilized to produce 
F1 progeny that are dihybrid and display the two dominant 
phenotypes round and yellow. 


pure-breeding plants for the recessive phenotypes wrinkled 
and green have the genotype rrgg. Gametes produced by 
the round, yellow plant contain one allele for each type of 
gene and are RG. In contrast, gametes from the wrinkled, 
green plant are rg. Mendel’s model predicts that all of the 
F1 progeny will therefore have the dihybrid genotype RrGg. 
These F1 are heterozygous for two traits and display the 
dominant parental phenotypes round and yellow. 

Heterozygous F1 dihybrids (RrGg) have received alleles 
R and G from the round, yellow pure-breeding parent and 
alleles r and g from the pure-breeding wrinkled, green parent. 
If the assortment of alleles for each type of gene is 
independent, gametes produced by these F1 plants are equally 
likely to contain any combination of one allele for seed 
shape and one allele for seed color. Probabilities of each 
combination of alleles for each type of gene are predicted by 
recognizing that four combinations of alleles will be found 
in the gametes (RG, Rg, rG, and rg) and that each combination 
is expected to occur with a frequency of 1/4. 


GENETIC ANALYSIS 2.1 


PROBLEM The presence of short hairs on the leaves of tomato plants is 
a dominant trait controlled by the allele H. The corresponding recessive 
trait, smooth leaf, is found in plants with the genotype hh. The table 
below shows the progeny of three independent crosses of parental plants 
with genotypes and phenotypes that are unknown. 
Examine the distributions of phenotypes in the progeny of each 
cross, and determine the parental genotypes for each cross. Use a 
Punnett square to diagram Cross 1. 

         Number of Progeny 
Cross    Hairy leaf    Smooth leaf 
1        32            11 
2        42            45 
3        0             24 

BREAK IT DOWN: Dominant and recessive alleles dictate that hairy-leaf plants are 
HH or Hh; smooth-leaf plants are hh (p. 31) 

BREAK IT DOWN: Phenotype ratios among progeny identify the genotypes of 
parents in a cross (p. 33) 

BREAK IT DOWN: Use a Punnett square to accurately organize gamete 
production and gamete union (p. 34) 


Solution Strategies and Solution Steps 


Evaluate 

1. Strategy: Identify the topic this problem addresses and the kind of information 
   the answer should contain. 
   Step: The problem presents the leaf-form phenotypes of progeny produced by three 
   separate crosses of parental plants with unknown genotypes and phenotypes. 
   The answer must identify parental genotypes and phenotypes for each cross 
   and use a Punnett square to diagram Cross 1. 

2. Strategy: Identify the critical information given in the problem. 
   Step: The information given for each cross is the number of progeny with hairy 
   (dominant) and smooth (recessive) leaves. Interpretation of the phenotype ratio of 
   progeny is required to determine parental genotypes and phenotypes. 

TIP: The numbers of progeny with each phenotype can be expressed as a ratio. 

Deduce 

3. Strategy: Examine the progeny of Cross 1, and determine the approximate 
   ratio of progeny phenotypes. 
   Step: Ratio of phenotypes in Cross 1 progeny: 32/11 = 2.91:1. 
   This is an approximate 3:1 ratio. The recessive phenotype appears in about 1/4 of 
   the progeny (11/43), and the remaining 3/4 (32/43) have the dominant phenotype. 

PITFALL: Genetics experiments produce finite numbers of progeny, so phenotypes 
may vary from expected ratios. Don’t expect to see precise ratios in real data. 

4. Strategy: Examine the progeny of Cross 2, and determine the approximate ratio of 
   progeny phenotypes. 
   Step: Ratio of phenotypes for Cross 2: 42/45 = 0.93:1. 
   This is an approximate 1:1 ratio in which the dominant phenotype is seen in 
   about one-half of the progeny (42/87) and the recessive phenotype is seen in the 
   other half of the progeny (45/87). 

5. Strategy: Examine the progeny of Cross 3, and determine the approximate ratio of 
   progeny phenotypes. 
   Step: Cross 3 produced only the recessive phenotype, so the ratio is 0:1. 

Solve 

6. Strategy: Based on the results of Cross 1, identify the genotypes and phenotypes 
   of the parental plants in the cross. Construct a Punnett square to illustrate 
   this cross. 
   Step: The recessive progeny in this cross have the genotype hh, so each parent in 
   Cross 1 must carry a copy of h. The dominant progeny are either HH or Hh. 
   The 3:1 progeny phenotype ratio is consistent with a parental cross Hh × Hh. 
   The Punnett square for this cross is consistent with the observed 3:1 ratio: 

          H      h 
     H    HH     Hh 
     h    Hh     hh 

TIP: There are two alleles for this gene, and three genotypes are possible. 
The recessive phenotype is found in plants with the hh genotype, whereas 
the dominant phenotype will be found in plants that are Hh and HH. 

7. Strategy: Based on the results of Cross 2, identify the genotypes and phenotypes of 
   the parents. 
   Step: Both parental plants in Cross 2 carry at least one copy of h. The 1:1 progeny ratio 
   is consistent with the ratio expected for a test cross of a heterozygous organism 
   to one that is homozygous recessive. This cross is Hh × hh. 

8. Strategy: Based on the results of Cross 3, identify the parental genotypes and 
   phenotypes. 
   Step: Cross 3 produces only hh progeny. This is expected for a pure-breeding cross 
   between two homozygous organisms. This cross is hh × hh. 


For more practice, see Problems 10, 14, and 29. 

Figure 2.10 The forked-line method for determining 
gamete genotype frequency. Chance is responsible for the 
independent assortment of alleles included in four genetically 
different gametes. 


Figure 2.10 shows a diagrammatic aid called the forked-line 
diagram that is used to determine gamete genotypes 
and frequencies. The forked-line diagram illustrates that 
one-half of all gametes produced by an RrGg plant will contain 
R and one-half will contain r. If the segregation of G and 
g is independent of the R and r alleles, then one-half of the 
gametes containing R will also carry G and the other half will 
carry g. The same is true for r-bearing gametes; one-half will 
carry G and the remaining half will carry g. The frequency of 
each of the four gamete genotypes is (1/2)(1/2) = 1/4. 
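
The forked-line reasoning above amounts to choosing one allele per gene and multiplying the probabilities across genes. The following Python sketch enumerates the gametes that way; the function name `gamete_frequencies` and the list-of-strings genotype format are assumptions made for this illustration.

```python
from fractions import Fraction
from itertools import product


def gamete_frequencies(genotype_by_gene):
    """Gamete genotype frequencies under independent assortment.

    genotype_by_gene is a list of two-allele strings, one per gene,
    for example ["Rr", "Gg"] for a dihybrid plant.
    """
    frequencies = {}
    # Each branch of the forked line picks one of the two alleles of a
    # gene with probability 1/2; branches multiply across genes.
    branch_probability = Fraction(1, 2) ** len(genotype_by_gene)
    for alleles in product(*genotype_by_gene):
        gamete = "".join(alleles)
        # Homozygous genes yield the same gamete twice, so add up.
        frequencies[gamete] = frequencies.get(gamete, 0) + branch_probability
    return frequencies


dihybrid_gametes = gamete_frequencies(["Rr", "Gg"])
print(dihybrid_gametes)
```

For a plant that is homozygous at one gene, such as `["RR", "Gg"]`, the same function returns two gamete genotypes at 1/2 each, because both branches for the homozygous gene carry the same allele.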

A Punnett square can be used to illustrate the random 
union of these four different gametes to produce 
F2 progeny (Figure 2.11). Each gamete has a predicted 
frequency of 1/4, and each cell of the Punnett square has a 
predicted frequency of (1/4)(1/4) = 1/16. Among F2 progeny, 
four phenotypes are observed, displaying either (1) both 
dominant phenotypes, (2) the dominant phenotype for 
one trait and the recessive phenotype for the other (there 
are two versions of this), or (3) both recessive phenotypes. 


By examining the F2 phenotype proportions, we can 
see the relationship between the 3:1 ratio for each trait 


[Figure 2.11 panels: Punnett square; Summary.] 

Figure 2.11 Independent assortment of alleles at two loci. 
Self-fertilization or crossing of dihybrid F1 (RrGg) to one another 
produces nine genotypes distributed in a 9:3:3:1 phenotypic 
ratio among F2 progeny. 


and the 9:3:3:1 ratio when the two traits are considered 
simultaneously. When pea shape and pea color are considered 
individually, monohybrid crosses produce F2 that are 
3/4 dominant and 1/4 recessive. The cross of two dihybrids also 
yields proportions of 3/4 dominant to 1/4 recessive for each trait, 
making the prediction of phenotypic ratios among the F2 for 
both traits combined a problem of combinatorial arithmetic. 
Figure 2.11 reminds us that genotypes falling into the R- and 
the G- classes each occur in 3/4 of the progeny, while rr and 
gg genotype classes each occur in 1/4 of the progeny. The dash 
in the genotypes R- and G- is a “blank” that could be filled 
by either a second copy of the dominant allele or a copy of 
the recessive allele. In either case, the resulting genotype 
(for example, RR or Rr) produces the dominant phenotype. 
The co-occurrence of the two dominant phenotypes 
(round, yellow) is therefore expected to have a frequency 
of (3/4)(3/4) = 9/16; the two recessive phenotypes (wrinkled, 
green) will occur with a frequency of (1/4)(1/4) = 1/16, and the 
two phenotypic classes that display one dominant and one 
recessive trait (round, green and wrinkled, yellow) will each 
be found in a frequency of (3/4)(1/4) = 3/16. 
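
The combinatorial arithmetic described above is a direct application of the product rule: the frequency of a two-trait phenotype is the product of the single-trait frequencies. A short Python sketch, with variable names invented for this example:

```python
from fractions import Fraction
from itertools import product

# Single-trait F2 phenotype frequencies from a monohybrid cross.
shape = {"round": Fraction(3, 4), "wrinkled": Fraction(1, 4)}
color = {"yellow": Fraction(3, 4), "green": Fraction(1, 4)}

# With independent assortment, multiply the frequencies of the
# shape phenotype and the color phenotype for every combination.
dihybrid_f2 = {
    (s, c): p_shape * p_color
    for (s, p_shape), (c, p_color) in product(shape.items(), color.items())
}

for phenotype, frequency in dihybrid_f2.items():
    print(phenotype, frequency)

# Express the four classes in sixteenths to read off the ratio.
ratio_in_sixteenths = [int(f * 16) for f in dihybrid_f2.values()]
print(ratio_in_sixteenths)
```

Reading the frequencies in sixteenths gives the 9:3:3:1 phenotypic ratio for the four classes.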

This outcome illustrates Mendel’s law of indepen- 
dent assortment, also known as Mendel’s second law. 


The law of independent assortment During gamete 
formation, the segregation of alleles at one gene is inde- 
pendent of the segregation of alleles at another gene. 


Mendel reached his conclusions regarding independent 
assortment on the basis of numerous dihybrid crosses. 
The cross of pure-breeding round, yellow plants with 
pure-breeding wrinkled, green plants was an instrumental 
one. After crossing the pure-breeding parents and allowing 
self-fertilization of the F1, Mendel counted the phenotypes 
among the F2 and found that both of the parental phenotypes 
(round, yellow and wrinkled, green) were present 
along with two nonparental phenotypes: round, green and 
wrinkled, yellow. Among the F2 produced in his experiment, 
Mendel found 315 round, yellow plants; 108 round, 
green plants; 101 wrinkled, yellow plants; and 32 wrinkled, 
green plants (Figure 2.12a). 

This F2 observation contains two features of pivotal 
importance to Mendel’s hypothesis. First, parental 
and nonparental phenotypes are seen at frequencies that 
differ from one another. The most numerous class of 
F2 progeny display the dominant parental phenotypes 
for each trait, round and yellow. The smallest class of 
F2 progeny have the two recessive parental phenotypes, 
wrinkled and green; and the two nonparental F2 classes 
(round, green and wrinkled, yellow) are intermediate and 
approximately equal in number. From these numbers, 
Mendel recognized that the ratios between the dominant 
and recessive forms of each trait followed the familiar 
3:1 pattern. In looking at pea shape, for example, Mendel 
found that 423 (315 + 108) plants were round and that 
133 (101 + 32) plants were wrinkled. The ratio 423:133 
reduces to 3.18:1. Similarly, for pea color he found a ratio 
of 416 (315 + 101) yellow to 140 (108 + 32) green, a 

[Figure 2.12 data] 

(a) F2 phenotype ratio, both traits together: 
    315:108:101:32 = 9.84:3.38:3.16:1 or 9:3:3:1 

(b) F2 phenotype ratio by trait: 
    Round     315 + 108 = 423 
    Wrinkled  101 +  32 = 133 
    423:133 = 3.18:1 or 3:1 
    Yellow    315 + 101 = 416 
    Green     108 +  32 = 140 
    416:140 = 2.97:1 or 3:1 

    For each trait, there is a 3:1 F2 phenotype ratio. 


Figure 2.12 Phenotype proportions in the progeny of a dihybrid 
cross performed by Mendel. (a) When the two traits are 
considered simultaneously, a phenotypic ratio of 9:3:3:1 is expected. 
(b) For each trait considered individually, progeny display an 
approximate 3:1 ratio of the dominant to the recessive phenotype. 


ratio of 2.97:1 (Figure 2.12b). Considering each trait individually, 
the cross of heterozygous F1 plants has produced 
an F2 generation in which 3/4 of the progeny have the dominant 
phenotype and 1/4 have the recessive phenotype. 

Second, Mendel predicted that if alleles at each gene 
unite at random to produce the F2, then the expected 
F2-plant phenotypes will occur in predictable frequencies. 
He hypothesized that F2 progeny displaying the two 
dominant traits (round and yellow) will occur at a frequency 
of (3/4)(3/4) = 9/16. Similarly, progeny carrying the 
two recessive traits (wrinkled and green) are expected at 
a frequency of (1/4)(1/4) = 1/16, and each of the nonparental 
phenotypes is expected at a frequency of (3/4)(1/4) = 3/16. 
Independent assortment of alleles at the two genes therefore 
leads to an expected distribution among the F2 of 

round, yellow       R-G-    9/16 
round, green        R-gg    3/16 
wrinkled, yellow    rrG-    3/16 
wrinkled, green     rrgg    1/16 


Mendel’s count of 315 round, yellow; 108 round, 
green; 101 wrinkled, yellow; and 32 wrinkled, green 
(see Figure 2.12) can be converted to a ratio by divid- 
ing each number by 32, the value of the smallest class. 
The division by 32 reduces Mendel’s observed ratio to 
9.84:3.38:3.16:1, which is a close fit to the 9:3:3:1 ratio 


predicted by his model. From this result, Mendel hypothesized 
that independent assortment in a dihybrid organism 
produces four different gamete genotypes at equal 
frequencies. Random union of the gametes then produces 
four phenotypic classes as a result of dominance relationships 
at each locus, and the ratio of these F2 phenotypic 
classes is expected to be 9:3:3:1 (Genetic Analysis 2.2). 
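
The conversion of Mendel’s F2 counts to a ratio, and the comparison with the 9:3:3:1 prediction, can be reproduced with a few lines of Python. This sketch only repeats the arithmetic described above; the variable names are illustrative.

```python
from fractions import Fraction

# Mendel's observed F2 counts from the dihybrid cross.
observed = {
    "round, yellow": 315,
    "round, green": 108,
    "wrinkled, yellow": 101,
    "wrinkled, green": 32,
}

# Divide every class by the smallest class to express counts as a ratio.
smallest = min(observed.values())
observed_ratio = {name: count / smallest for name, count in observed.items()}

# Expected counts under 9:3:3:1 for the same total number of plants.
expected_fraction = {
    "round, yellow": Fraction(9, 16),
    "round, green": Fraction(3, 16),
    "wrinkled, yellow": Fraction(3, 16),
    "wrinkled, green": Fraction(1, 16),
}
total = sum(observed.values())
expected_count = {name: float(f * total) for name, f in expected_fraction.items()}

for name in observed:
    print(f"{name}: observed {observed[name]} "
          f"(ratio {observed_ratio[name]:.2f}), expected {expected_count[name]:.2f}")
```

The expected counts are not whole numbers, which is one reason observed data can only approximate the predicted ratio.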


Testing Independent Assortment by 
Test-Cross Analysis 


To test his hypothesis that combinations of pea shape and 
color are determined by the independent assortment of 
alleles, Mendel once again turned to test-cross analysis. 
Having proposed that the F1 plants with round, yellow 
seeds were dihybrid and had the genotype RrGg, he predicted 
that the test cross of a dihybrid (RrGg) to a 
pure-breeding wrinkled, green plant (rrgg) would produce four 
offspring phenotypes at a frequency of 1/4 each. Figure 2.13 
shows that the dihybrid F1 plant was expected to produce 


[Figure 2.13 diagram: pure-breeding RRGG and rrgg parents are 
cross-fertilized to produce heterozygous F1 (RrGg) plants, which 
are then test-crossed to pure-breeding rrgg plants.] 

Test-cross Punnett square and frequency among Mendel’s 207 plants: 

F1 gamete   Progeny (× rg)   Phenotype           Expected   Observed 
1/4 RG      1/4 RrGg         round, yellow       0.25       55 (0.266) 
1/4 Rg      1/4 Rrgg         round, green        0.25       51 (0.246) 
1/4 rG      1/4 rrGg         wrinkled, yellow    0.25       49 (0.237) 
1/4 rg      1/4 rrgg         wrinkled, green     0.25       52 (0.251) 
Total                                            1.00       207 (1.000) 

Test-cross progeny are observed to display 
four phenotypes in equal frequencies as 
expected by application of Mendel’s laws. 


Figure 2.13 Mendel’s test cross to verify independent 
assortment. Mendel predicted and observed an approximate 
1:1:1:1 ratio among progeny, supporting his hypothesis of 
independent assortment. 


GENETIC ANALYSIS 2.2 


PROBLEM In a certain mammalian species, long fur and the appearance of 
white spots are produced by dominant alleles F and S, respectively, which assort 
independently. The genotype ff produces short fur, and the genotype ss produces 
solid fur color. Given the parental genotypes for each of the following crosses, 
determine the expected proportions of all progeny phenotypes. 

           Male      Female 
Cross 1:   FFSs   ×  Ffss 
Cross 2:   ffSs   ×  FfSs 
Cross 3:   FfSs   ×  FfSs 

BREAK IT DOWN: If genes assort independently, fur length 
will be independent of the presence or absence of spots (p. 38). 

BREAK IT DOWN: Use a Punnett square or a forked-line 
diagram to accurately predict cross outcomes (p. 39). 


Solution Strategies and Solution Steps 


Evaluate 

1. Strategy: Identify the topic of this problem and the kind of information the 
   answer should contain. 
   Step: This is a transmission genetic problem in which parental genotypes are given. 
   Answers must predict the phenotypes of progeny and their expected proportions. 
   These are predicted by determining the parental gametes and their proportions. 

2. Strategy: Identify the critical information given in the problem. 
   Step: Genotypes of parents are given for each cross. The genotypes are used to 
   predict the genotypes of parental gametes and the gamete proportions. 

Deduce 

3. Strategy: For Cross 1, identify the genetically different gametes that can be 
   produced by each parent and calculate the predicted proportion of each gamete. 
   Step: Each of the parents can produce two genetically different gametes at 
   predicted frequencies of 1/2 each. 

   Cross 1 
   Male (FFSs):     FS  (1)(1/2) = 1/2      Fs  (1)(1/2) = 1/2 
   Female (Ffss):   Fs  (1/2)(1) = 1/2      fs  (1/2)(1) = 1/2 

TIP: A forked-line diagram is a useful tool for predicting 
the alleles in gametes and gamete frequencies. 

4. Strategy: Identify the content and frequency of the genetically different gametes 
   produced by the parents in Cross 2. 
   Step: The male produces two types of gametes at a predicted frequency of 
   1/2 each. The female produces four genetically different gametes at 
   frequencies of 1/4 each. 

   Cross 2 
   Male (ffSs):     fS  (1)(1/2) = 1/2      fs  (1)(1/2) = 1/2 
   Female (FfSs):   FS  (1/2)(1/2) = 1/4    Fs  (1/2)(1/2) = 1/4 
                    fS  (1/2)(1/2) = 1/4    fs  (1/2)(1/2) = 1/4 

PITFALL: Carefully identify the genotype of each parent to avoid errors. 

5. Strategy: Predict the gamete content and frequencies for the parents in Cross 3. 
   Step: Both parents are dihybrids that produce four genetically different 
   gametes at frequencies of 1/4 each. 

   Cross 3 
   Male (FfSs):     FS  (1/2)(1/2) = 1/4    Fs  (1/2)(1/2) = 1/4 
                    fS  (1/2)(1/2) = 1/4    fs  (1/2)(1/2) = 1/4 
   Female (FfSs):   FS  (1/2)(1/2) = 1/4    Fs  (1/2)(1/2) = 1/4 
                    fS  (1/2)(1/2) = 1/4    fs  (1/2)(1/2) = 1/4 

Solve 

6. Strategy: Construct a Punnett square for Cross 1 and predict the progeny 
   phenotypes and proportions. 
   Step: The predicted Cross 1 progeny are 1/2 long, spotted and 1/2 long, solid. 

   (male gametes across, female gametes down) 
          FS      Fs 
     Fs   FFSs    FFss 
     fs   FfSs    Ffss 

7. Strategy: Construct a Punnett square for Cross 2 and predict the progeny 
   phenotypes and proportions. 
   Step: The progeny predicted from Cross 2 are 3/8 long, spotted; 
   1/8 long, solid; 3/8 short, spotted; and 1/8 short, solid. 

          fS      fs 
     FS   FfSS    FfSs 
     Fs   FfSs    Ffss 
     fS   ffSS    ffSs 
     fs   ffSs    ffss 

8. Strategy: Construct a Punnett square for Cross 3 and predict the progeny 
   phenotypes and proportions. 
   Step: The progeny produced by Cross 3 are predicted to be 9/16 long, 
   spotted; 3/16 long, solid; 3/16 short, spotted; and 1/16 short, solid. 

          FS      Fs      fS      fs 
     FS   FFSS    FFSs    FfSS    FfSs 
     Fs   FFSs    FFss    FfSs    Ffss 
     fS   FfSS    FfSs    ffSS    ffSs 
     fs   FfSs    Ffss    ffSs    ffss 


For more practice, see Problems 6, 12, and 27 
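
The three crosses in this worked problem can also be checked programmatically. The Python sketch below is an illustration under stated assumptions: genotypes are written as four-letter strings (fur gene first, spotting gene second), both Cross 3 parents are taken to be dihybrid (FfSs), as the solution steps state, and the function names `gametes`, `phenotype`, and `cross` are invented for this example.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product


def gametes(genotype):
    """Gamete frequencies for a two-gene genotype such as "FfSs"."""
    fur_alleles, spot_alleles = genotype[:2], genotype[2:]
    frequencies = defaultdict(Fraction)
    for fur, spot in product(fur_alleles, spot_alleles):
        # One of two alleles per gene, genes assorting independently.
        frequencies[fur + spot] += Fraction(1, 4)
    return frequencies


def phenotype(alleles):
    """Phenotype from the four alleles an offspring receives."""
    fur = "long" if "F" in alleles else "short"
    coat = "spotted" if "S" in alleles else "solid"
    return f"{fur}, {coat}"


def cross(male, female):
    """Expected progeny phenotype proportions for male x female."""
    progeny = defaultdict(Fraction)
    for (g1, p1), (g2, p2) in product(gametes(male).items(), gametes(female).items()):
        progeny[phenotype(g1 + g2)] += p1 * p2
    return dict(progeny)


print(cross("FFSs", "Ffss"))  # Cross 1
print(cross("ffSs", "FfSs"))  # Cross 2
print(cross("FfSs", "FfSs"))  # Cross 3
```

Working with gamete frequencies and multiplying them cell by cell is the same bookkeeping a Punnett square performs by hand.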


four different gamete genotypes. Recalling the logic of the 
forked-line diagram, remember that one-half of the gametes 
are expected to contain R and one-half to contain r. 
Gametes carry G and g independently of R or r, meaning 
that four different combinations of these alleles are 
possible in gametes: RG, rG, Rg, and rg, each occurring at 
an expected frequency of (1/2)(1/2) = 1/4. In contrast, the 
homozygous recessive green, wrinkled (rrgg) plant can 
produce only an rg gamete. In the figure, we see that the 
test-cross progeny are expected to have four genotypes, 
each corresponding to a different phenotype. The predicted 
progeny are expected to be 1/4 RrGg (round, yellow), 
1/4 Rrgg (round, green), 1/4 rrGg (wrinkled, yellow), and 1/4 rrgg 
(wrinkled, green). 

Mendel performed this cross, and his results almost 
exactly matched expectation. He found that the 207 
test-cross progeny were composed of 55 round, yellow; 51 
round, green; 49 wrinkled, yellow; and 52 wrinkled, green 
plants. This result confirmed the dihybrid genotype of the 
F1 plant and supported the hypothesis that alleles for pea 
shape assort independently of those for pea color during 
gamete formation and that gametes unite at random to 
form offspring. 
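
The comparison between Mendel’s observed test-cross counts and the 1:1:1:1 expectation can be laid out numerically. This is a small Python sketch of the arithmetic only; the names used are illustrative.

```python
# Mendel's 207 dihybrid test-cross progeny, by phenotype.
observed = {
    "round, yellow": 55,
    "round, green": 51,
    "wrinkled, yellow": 49,
    "wrinkled, green": 52,
}

total = sum(observed.values())

# Under independent assortment each class is expected at 1/4.
expected_frequency = 0.25
expected_count = total * expected_frequency

observed_frequency = {name: round(count / total, 3) for name, count in observed.items()}

for name, count in observed.items():
    print(f"{name}: observed {count} ({observed_frequency[name]:.3f}), "
          f"expected {expected_count:.2f} ({expected_frequency:.2f})")
```

Every observed frequency lies within about 0.02 of the expected 0.25.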


Testing Independent Assortment by 
Trihybrid-Cross Analysis 


Mendel further tested the hypothesis of independent as- 
sortment by examining the results of a trihybrid cross, 
a cross involving three traits—in this case, seed shape, 
seed color, and flower color. He began this experiment by 
crossing a pure-breeding round, yellow, purple-flowered 
parental plant (RRGGWW) to a pure-breeding wrinkled, 
green, white-flowered plant (rrggww). Figure 2.14 illustrates
the cross of pure-breeding parental strains and the
resulting F1 progeny, which display the dominant phenotypes
round, yellow, and purple. The F1 are presumed
to be trihybrid (RrGgWw). The presumptive trihybrid F1
plants were then crossed to produce F2 plants, and the
results were compared to expectations.

The forked-line diagram in Figure 2.14 shows the
number and expected frequency of gamete genotypes. In
the general case, assuming there are two alleles
for each gene, the number of different gamete genotypes
is expressed as 2^n, where n = the number of genes
involved. In this example, there are three genes (n = 3),
and 2^3 = 8 different combinations of alleles are possible for
the three traits in gametes from the trihybrid plant. The
frequency of each gamete genotype is determined as (1/2)^n,
or (1/2)^3 = 1/8. To predict the number of genetically different
gametes and their frequencies, the exponent 3 is
used because there are three genes being examined in the
experiment. In arithmetic computations like these, the
exponent value usually indicates the number of genes.

[Figure 2.14: Pure-breeding parents RRGGWW and rrggww form gametes
RGW and rgw, which unite at fertilization to produce the trihybrid
F1 (RrGgWw). A forked-line diagram for seed shape, seed color, and
flower color gives the expected frequency of each F2 phenotype class,
alongside the expected and observed numbers among Mendel's 639 plants.
The expected numbers are 269.6, 89.9, and 29.9 for the 27/64, 9/64,
and 3/64 classes, respectively; 269 round, yellow, purple plants
were observed.]

Genotype class   F2 phenotype               Expected frequency
R-G-W-           round, yellow, purple      (3/4)(3/4)(3/4) = 27/64
R-G-ww           round, yellow, white       (3/4)(3/4)(1/4) = 9/64
R-ggW-           round, green, purple       (3/4)(1/4)(3/4) = 9/64
R-ggww           round, green, white        (3/4)(1/4)(1/4) = 3/64
rrG-W-           wrinkled, yellow, purple   (1/4)(3/4)(3/4) = 9/64
rrG-ww           wrinkled, yellow, white    (1/4)(3/4)(1/4) = 3/64
rrggW-           wrinkled, green, purple    (1/4)(1/4)(3/4) = 3/64
rrggww           wrinkled, green, white     (1/4)(1/4)(1/4) = 1/64

Figure 2.14 Trihybrid cross to verify independent assortment. The forked-line method can be
used to determine the expected phenotype frequencies produced by a trihybrid cross. Expected and
observed results for the F2 generation of Mendel's trihybrid-cross experiment supported his hypothesis
of independent assortment.

Figure 2.14 illustrates a way of using the forked-line 
method to predict the expected frequency of the eight 
phenotypic classes of this trihybrid cross. For the general
case where there are two phenotypes (dominant and recessive)
for each trait, there are 2^n phenotypes in the F2. Once
again, n = the number of genes. In this example, there
are 2^3 = 8 phenotypes in the F2 progeny. Computation
of each expected phenotype frequency is based on the expected
frequencies of 3/4 dominant and 1/4 recessive for each
trait. The expected frequency of each trihybrid class is
the product of three fractions representing the predicted
probabilities of the dominant or recessive form for each
trait. For the eight F2 phenotypes from a trihybrid cross,
the expected phenotype ratio is 27/64 : 9/64 : 9/64 : 9/64 : 3/64 : 3/64 : 3/64 : 1/64.
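
The eight phenotype classes and their frequencies can also be generated mechanically. The sketch below is an illustration only, assuming complete dominance and independent assortment for all three traits; the names `traits`, `classes`, and `ratio` are hypothetical. It multiplies the per-trait probabilities of 3/4 (dominant) and 1/4 (recessive) for every combination of phenotypes.

```python
from fractions import Fraction
from itertools import product

# Per-trait F2 phenotype probabilities from a heterozygote x heterozygote cross.
traits = [
    {"round": Fraction(3, 4), "wrinkled": Fraction(1, 4)},
    {"yellow": Fraction(3, 4), "green": Fraction(1, 4)},
    {"purple": Fraction(3, 4), "white": Fraction(1, 4)},
]

classes = {}
for combo in product(*(t.items() for t in traits)):
    names = ", ".join(name for name, _ in combo)
    prob = Fraction(1)
    for _, p in combo:
        prob *= p  # multiply probabilities of independent traits
    classes[names] = prob

for names, prob in classes.items():
    print(f"{names}: {prob}")

# Express the class frequencies as whole numbers out of 64.
ratio = sorted((int(p * 64) for p in classes.values()), reverse=True)
print(ratio)
```

With n traits in the list, the loop produces 2^n classes, so the same sketch extends to crosses involving more genes.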

Mendel used this combinatorial thinking to predict 
the outcome of an experimental trihybrid cross. His experimental
results for this test are given in Figure 2.14 for
639 F2 progeny from the cross of round, yellow, purple-flowered
F1 plants. Mendel predicted the number of
progeny expected in each phenotype class by multiplying 
the expected proportion times the sample size, 639. His 
results were remarkably close to expectation. The close 
match of these observed and expected values provides a 
second piece of independent evidence supporting the law 
of independent assortment. 

Taken together, Mendel’s analyses of the transmis- 
sion of single traits and the joint transmission of two or 
three independent traits represented a major advance in 
the scientific understanding of hereditary transmission. 
The law of segregation and the law of independent as- 
sortment are the most fundamental principles of genetic 
transmission in diploid organisms, and they form the 
foundation of our understanding of transmission, mo- 
lecular, and population and evolutionary genetics. 


Probability Calculations in Genetics 
Problem Solving 


The predicted F2 phenotype ratio from a trihybrid cross
seems complicated, and at first you might not see clearly 
why that is the expected distribution. The key to under- 
standing the calculation demonstrated in Figure 2.14 is to 
realize that each independently assorting locus truly can 
be treated independently of others. 

Let’s look at the progeny-phenotype distribution 
for a dihybrid cross. We expect that for each trait individually,
3/4 of the progeny will display the dominant
phenotype and 1/4 the recessive phenotype. We could use a
Punnett square to determine the phenotypic distribution 


of the two traits in combination as we did in Figure 2.11, 
but the independence of each gene gives us a quicker 
way to calculate the distribution of phenotypes: by their 
probability. In this case, the expected progeny phenotype
proportions can be obtained by multiplying the
two ratios, (3/4 : 1/4)(3/4 : 1/4), to yield the expected ratio of
9/16 : 3/16 : 3/16 : 1/16, or 9:3:3:1. We can use the same approach to
predict the ratio among F2 progeny of a trihybrid cross
as well. Taking an example from Figure 2.14, notice
that the expected proportion of any F2 phenotype class
can be predicted by the probability method. For the
round, yellow, purple class, the predicted proportion
is 3/4 × 3/4 × 3/4 = 27/64, and for round, yellow, white it is
3/4 × 3/4 × 1/4 = 9/64. Using the probability method can save
time and reduce the chance of an error in predicting outcomes
for more complex crosses.

Another advantage to using probability for solving 
genetic problems is its easy adaptability to different sorts 
of questions. For example, what proportion of progeny 
produced by self-fertilization of a trihybrid yellow, round, 
purple plant (GgRrWw) will have the same genotype as
the parental plant? To determine the answer, we identify 
the probability of the genotype for each individual trait 
and then multiply those three probabilities together. At 
each locus the cross is heterozygous by heterozygous, 
so one-half of the progeny are expected to be heterozygous.
The probability that offspring of a trihybrid self-fertilization
will be trihybrid is therefore (1/2)(1/2)(1/2) = 1/8.
If we wanted to determine the proportion of progeny
from the trihybrid cross that are rrGGWw, we again treat
the loci independently and calculate the probability as
(1/4)(1/4)(1/2) = 1/32.
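
The two genotype calculations above follow one pattern: look up the probability at each locus and multiply. A minimal sketch of that pattern, assuming a heterozygote-by-heterozygote cross at every locus (the table `MONOHYBRID` and the function `genotype_probability` are hypothetical names used only for illustration):

```python
from fractions import Fraction

# Genotype probabilities at one locus among progeny of a
# heterozygote x heterozygote cross (for example, Rr x Rr).
MONOHYBRID = {
    "dominant_homozygote": Fraction(1, 4),
    "heterozygote": Fraction(1, 2),
    "recessive_homozygote": Fraction(1, 4),
}

def genotype_probability(per_locus_classes):
    """Multiply the single-locus probabilities of independently assorting loci."""
    prob = Fraction(1)
    for genotype_class in per_locus_classes:
        prob *= MONOHYBRID[genotype_class]
    return prob

# Progeny that are heterozygous at all three loci, like the trihybrid parent.
p_trihybrid = genotype_probability(["heterozygote"] * 3)

# Progeny that are rrGGWw.
p_rrGGWw = genotype_probability(
    ["recessive_homozygote", "dominant_homozygote", "heterozygote"]
)

print(p_trihybrid, p_rrGGWw)
```

Using exact fractions instead of floating-point numbers keeps the results in the same form as the hand calculation.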

The problems at the end of this chapter, as well as 
Genetic Analysis 2.3, provide a number of opportunities 
for you to practice using the principles of transmission 
genetics. As Experimental Insight 2.1 points out, however, 
opportunities to collect evidence of Mendel’s laws of he- 
redity may be as close as the produce aisle of your local 
grocery store. 


The Rediscovery of Mendel’s Work 


In 1900, after remaining virtually unknown for 34 years, 
Mendel’s experimental results and interpretations were re- 
discovered almost simultaneously by three botanists work- 
ing independently of one another. Carl Correns and Erich 
von Tschermak both worked on Pisum sativum, the same 
plant Mendel had used, and Hugo de Vries worked on a 
different plant species. Each of the three identified the 
hereditary principles Mendel had first described in 1865. 
With support from the contemporaneous discoveries of 
the behavior of chromosomes during meiotic cell division, 
followed quickly by confirming evidence from other spe- 
cies of plants and animals, the basic principles of segrega- 
tion and independent assortment were widely and rapidly 
disseminated in the first decade of the 20th century. 


GENETIC ANALYSIS 2.3


PROBLEM For the same mammalian species and the same traits described in Genetic
Analysis 2.2, which described dominance relationships between the alleles of each gene, a
cross between a male that has long, solid-colored fur and a female that has short, spotted fur
produces eight offspring. The offspring are 2 long, spotted; 2 short, solid; 2 long, solid; and 2
short, spotted. Given the phenotypes of the parents and the distribution of offspring phenotypes,
determine the genotypes of parents and offspring.

BREAK IT DOWN: Review the dominance relationships between alleles of a gene (p. 39).

BREAK IT DOWN: The phenotype ratio among the progeny of a cross identifies parental genotypes (p. 39).


Solution Strategies Solution Steps 


Evaluate 


1. Identify the topic of this problem and 
the kind of information the answer 
should contain. 

2. Identify the critical information 
given in the problem. 


Deduce 


3. Record what is known about the 
parental genotypes by writing homozygous
recessive alleles for the
recessive trait and writing a dominant 
allele and a “blank” as a placeholder 
for the dominant trait. 


4. Infer what is known about progeny 
genotypes by writing homozygous 
recessive alleles or dominant alleles 
with a “blank” placeholder. 


5. Determine the phenotype ratio 
of long fur to short fur among the 
progeny of the cross. 


6. Determine the phenotype ratio of
spotted fur to solid fur among the
progeny.


TIP: Use the known and placeholder genotypes for parents and progeny 
phenotype ratios to completely identify parental genotypes. 


1. The problem requires the determination of parental genotypes and progeny 
genotypes based on the phenotypes of parents and the proportions of progeny 
with different phenotypes. 

2. In this mammalian species, long fur is dominant to short fur and spotted fur 
color is dominant to solid fur color. Each parent is homozygous recessive for one 
trait and is dominant for the other trait. The progeny display a 1:1:1:1 ratio of 
phenotypes. 


PITFALL: You cannot presume to know the genotype of an organism with the domi- 
nant phenotype without segregation information. Use general genotype forms F- and 
S- as placeholders for the homozygous dominant or heterozygous genotypes. 


3. The long, solid parent is F-ss, carrying at least one dominant (F-) allele for long 
fur and homozygous recessive alleles (ss) for solid coat. The short, spotted par- 
ent is ffS-, carrying homozygous recessive alleles (ff) for fur length and at least 
one dominant allele (S-) for spotted coat. 


4. The inferred progeny genotypes are 
F-S- long, spotted 
ffss short, solid 
F-ss long, solid 
ffS- short, spotted 
5. Four progeny have long fur and four have short fur, a 1:1 ratio of dominant to recessive phenotypes.


TIP: Traits assorting independently can be analyzed individually. 
Assess segregation based on progeny phenotype ratios for one trait 
at a time.


6. Four progeny have spotted fur and four have solid fur, a 1:1 ratio of phenotypes. 


Solve 


7. Determine the parental genotypes 
necessary to produce progeny with the 
observed ratio of long to short fur. 


8. Determine the parental genotypes 
necessary to produce the observed 
ratio of spotted to solid coat. 


9. Verify the parental genotypes in 
this cross by using a Punnett square 
analysis and the forked-line method 
to predict phenotype probabilities. 


PITFALL: To avoid errors, use a Punnett
square or a forked-line diagram to verify
that the parental genotypes you assign will
produce progeny in the observed ratio.


For more practice, see Problems 3, 16, and 40. 


7. To produce the recessive short fur phenotype, each parent
must contribute a recessive (f) allele. The female parent
with short fur is ff, and the male parent with long fur must
be heterozygous (Ff) for this gene. The genotype of the
male parent with long, solid-colored fur is Ffss.

8. The male parent with the recessive phenotype solid coat
contributes a recessive (s) allele. The female parent with
spotted coat must be heterozygous (Ss). The short, spotted
female has the genotype ffSs.

9. For the cross Ffss × ffSs, each parent produces two genetically different gametes
at frequencies of 1/2 each. For each gene, a heterozygous genotype is crossed with
a homozygous recessive genotype, resulting in
a 1:1 ratio of dominant to recessive phenotype
for each trait. The Punnett square predicts four
different progeny genotypes and phenotypes
in a 1:1:1:1 ratio, and the forked-line method
gives the same result.

Punnett square (male gametes across, female gametes down):

        Fs                      fs
fS      FfSs  Long, spotted     ffSs  Short, spotted
fs      Ffss  Long, solid       ffss  Short, solid

Forked-line diagram:

Female gamete   Male gamete   Progeny
fS (1/2)        Fs (1/2)      1/4 FfSs  Long, spotted
                fs (1/2)      1/4 ffSs  Short, spotted
fs (1/2)        Fs (1/2)      1/4 Ffss  Long, solid
                fs (1/2)      1/4 ffss  Short, solid
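
The verification in step 9 can also be done by brute-force enumeration. The following Python sketch is illustrative only and is not part of the worked example; the helper names `gametes` and `phenotype` are assumptions, and genotypes are written as four-letter strings such as "Ffss". It lists every pairing of male and female gametes and tallies the resulting phenotypes.

```python
from collections import Counter
from itertools import product

def gametes(genotype):
    """List the equally likely gametes of a two-gene genotype such as 'Ffss'."""
    gene1, gene2 = genotype[:2], genotype[2:]
    return [a + b for a, b in product(gene1, gene2)]

def phenotype(gamete1, gamete2):
    """Phenotype of the offspring formed by two gametes (F and S are dominant)."""
    length = "long" if "F" in (gamete1[0], gamete2[0]) else "short"
    coat = "spotted" if "S" in (gamete1[1], gamete2[1]) else "solid"
    return f"{length}, {coat}"

male, female = "Ffss", "ffSs"
counts = Counter(
    phenotype(m, f) for m in gametes(male) for f in gametes(female)
)
print(counts)
```

Equal counts in all four phenotype classes correspond to the 1:1:1:1 ratio predicted by the Punnett square and the forked-line diagram.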
Experimental Insight 2.1


Mendelism in the Produce Aisle 


Many of the appealing characteristics of fruits and vegetables 
available in grocery stores and at farmers' markets are the
result of intensive selective breeding, a form of artificial selection
generated by breeders, who select which organisms are to
reproduce and determine the crosses that will occur. For exam- 
ple, in recent years many new vegetable varieties have been 
introduced into the marketplace. Among these is a variety of 
corn that goes by several names, including “bicolor,” “peaches 
and cream,” and “yellow and white.” Most of the kernels on 
a cob of bicolored corn are yellow, but a sizable number are 
white. With close inspection and a little quantitative analysis, 
you should be able to identify the genetic mechanism that 
produces this variation in color. 

An ear of corn is a mini-genetic experiment: Each kernel on 
the ear, like each pea in a pod, is a separate seed, produced by 
a fertilization event independent of the events that produced 


The approach to genetic analysis we describe in this 
chapter is often dubbed Mendelian genetics for the obvious 
reason that Gregor Mendel was the first scientist to offer 
a mechanism to explain the hereditary patterns he observed.
However, Mendel was not the first person to make
these observations. Experimental Insight 2.2 shows why, but 
for a failure to quantify the results of his own crosses of 
pea plants, Charles Naudin, a contemporary of Mendel’s, 
might have been the first scientist to succeed at explaining 
heredity. And, you can be an experimental geneticist too! 
Experimental Insight 2.3 describes a genetics breeding pro- 
gram you could start right in your own community. 


2.4 Probability Theory Predicts 
Mendelian Ratios 


Mendel recognized that chance (or random probability, 
the same process that determines the outcome of coin 
flips and rolls of the dice) is the arithmetic principle un- 
derlying the segregation of alleles for a given gene and 


adjacent kernels. This means that each mature ear of corn car- 
ries hundreds of progeny for analysis. 

Bicolor corn originates with the cross of two pure-breeding 
corn lines, one producing yellow kernels and the other pro- 
ducing white kernels. The yellow plant is WW, and the white 
plant is ww. When seed company geneticists cross these parental
stocks, the kernels on the F1 plants are yellow and have
the heterozygous Ww genotype. This F1 seed is allowed to mature
and is packaged for sale to farmers and home gardeners,
who plant it to produce a crop. The seed is commonly labeled
“hybrid,” meaning “monohybrid,” to reflect the heterozygosity
at the kernel-color locus. Owing to segregation of alleles at the
kernel-color locus, the plants that grow from this F1 seed produce
both yellow (W-) and white (ww) kernels on each ear.

If you saw some of this corn in your grocery store, how 
would you verify that the genetic basis of its yellow and white 
kernels is the segregation of two alleles at a single locus? The 
answer is that you would count the number of yellow kernels 
and the number of white kernels on ears of bicolor corn with 
the expectation of a ratio of approximately 3:1 between the 
yellow and white kernels. 

Recent genetics classes of one of the authors have exam- 
ined several dozen ears of bicolor corn and counted 9304 
yellow kernels and 3052 white kernels. Among the total of 
12,356 kernels there are 75.3% yellow and 24.7% white, a ratio 
of 3.05:1. You will use these data in Problem 20 at the end of 
the chapter to do a statistical test to see if the observed data 
fit the hypothesis that this trait is the product of the segrega- 
tion of alleles at a single gene. The next time you shop for 
fruits and vegetables, keep in mind that you are looking at 
Mendelian genetics in action! 
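
The percentages and the ratio quoted for the classroom kernel counts are simple to reproduce. The sketch below is a small illustration, with variable names chosen here for convenience; it also computes the counts expected under a 3:1 hypothesis for the same total.

```python
# Kernel counts reported in the text.
yellow, white = 9304, 3052
total = yellow + white

pct_yellow = 100 * yellow / total
pct_white = 100 * white / total
ratio = yellow / white

# Counts expected if yellow and white kernels occur in a 3:1 ratio.
expected_yellow = total * 3 / 4
expected_white = total / 4

print(total, round(pct_yellow, 1), round(pct_white, 1), round(ratio, 2))
print(expected_yellow, expected_white)
```

The gap between observed and expected counts is what a statistical test would evaluate.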


governing the independent assortment of alleles for differ- 
ent genes. The preceding discussions have demonstrated 
that the basic rules of Mendelian inheritance are actually 
those of random probability theory. The Mendelian prob- 
abilities we have discussed to this point are most clearly 
expressed by four rules of probability theory—the product 
rule, the sum rule, conditional probability, and binomial 
probability. In this section, we look more closely at these 
rules that describe and predict the outcome of genetic 
events governed by the rules of chance. 


The Product Rule 


If two or more events are independent of one another, 
their joint probability, the likelihood of their simultaneous 
or consecutive occurrence, is the product of the probabili- 
ties of each one individually. The product rule, also called 
the multiplication rule, describes these circumstances. 
You have already used the product rule several times 
in determining the outcomes of genetic crosses. For ex- 
ample, in Figures 2.6 and 2.7 the product rule is used to 


Experimental Insight 2.2 


Naudinian Genetics, Anyone? 


Before Mendel, many “plant hybridists” experimented with 
pea plants and other plants, attempting to discern the mecha- 
nisms of plant reproduction and the process of hereditary 
transmission of traits. Mendel cited the work of several early 
hybridists in his 1866 paper. 

Several of these plant hybridists came close to discovering 
the hereditary principles that today bear Mendel’s name; none 
succeeded fully. For example, in 1823, Thomas Andrew Knight 
determined that gray seed coat is dominant to white and that 
self-fertilization of certain gray-seeded plants produces both 
gray and white seed in progeny plants. In 1822, John Goss, 
working with a pea variety that had blue and white seeds, re- 
ported that crossing a pure-breeding white-seeded plant with 
a pure-breeding blue-seeded plant produced only blue seeds in 
first-generation plants, and that self-fertilization then produced 
a second generation with a mixture of white and blue seeds in 
plants. Carl Friedrich Gaertner came tantalizingly close to ex- 
plaining segregation in 1827 when he reported results of a cross 
between pure-breeding gold-kernel maize and pure-breeding 
red-striped maize. All the F1 had gold kernels, and among the
F2, 328 plants had only gold kernels and 103 had red-striped kernels.
If Gaertner had been able to correctly interpret his data, he
would have identified a 3.18:1 ratio in the F2. Alas, he never did
and missed his “golden” opportunity to explain simple heredity. 

Similar fates befell other plant hybridists, but arguably the 
one who came closest to explaining heredity prior to Mendel 
was Charles Naudin, who in 1863 seemed poised to beat 
Mendel to the punch by 2 years. In that year, Naudin reported 
the following: 


determine the chance of producing an F2 plant with
the recessive phenotype by the cross of heterozygous
F1 plants that are Gg or Rr. The probability of producing
the recessive phenotype is (1/2)(1/2) = 1/4 in each case.
Similarly, in Figure 2.10, the probability of a dihybrid
organism producing gametes with each of the four different
genotypes is predicted by applying the product rule
in the forked-line diagram. Likewise, in Figure 2.11, the
probability that F2 offspring will be homozygous recessive
for both traits from a cross of F1 dihybrid plants with the
genotype RrGg is predicted by applying the product rule.


The Sum Rule 


The sum rule, also called the addition rule, defines the joint 
probability of occurrence of any of two or more mutually 
exclusive events by summing the probabilities of each event. 
This rule is applied when more than one outcome satisfies 
the conditions of the probability question. Mutually exclu- 
sive events in this context are alternative outcomes, only one 
of which can occur to the exclusion of the other outcomes. 
You applied the sum rule to several genetic cal- 
culations in the preceding section. For example, in 
1. The results of reciprocal crosses are identical.
(Similar observations by Mendel were important in his
identification of the particulate nature of hereditary
factors.)

2. F1 progeny display a single phenotype (as Mendel
reported 2 years later).

3. F2 progeny display two phenotypes. (These observations
are the result of the segregation of alleles.)

4. The hereditary units for traits are separated in pollen and
egg formation. (This concept was fundamental to the
segregation observation of Mendel.)

5. Nonparental combinations of phenotypes appear in the F2
generation. (This is identical to Mendel’s independent
assortment observation.)


After making these observations, why wasn’t Naudin able 
to propose a hereditary mechanism to explain them? The 
answer is that Naudin, like his predecessors and others who 
would follow, failed to quantify his results. Naudin did not 
report the number of plants falling into different phenotypic 
categories, and he was therefore unable to recognize the 
ratios between phenotypic classes that are the key to inter- 
preting hereditary transmission. Without quantitative data, 
Naudin was unable to formulate a testable hypothesis. 

Alas, poor Naudin! Were it not for his failure to see the 
necessity of quantifying experimental results, we might well 
be discussing Naudinian genetics in this chapter instead of 
Mendelian genetics! 


Figure 2.6 the probability that F2 progeny of the cross
Gg × Gg will have the dominant phenotype is determined by adding
the probabilities of the three mutually exclusive ways of
obtaining offspring with the dominant phenotype:
(1/4) + (1/4) + (1/4) = 3/4. Similarly, in Figure 2.11, the
probability that the F2 progeny of the cross of dihybrid
heterozygotes (RrGg) have the two dominant phenotypes
is obtained by applying the sum rule. This probability is

(1/16) + (2/16) + (2/16) + (4/16) = (9/16).
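
The sum rule amounts to adding the probabilities of every Punnett-square cell that satisfies the question. The following Python sketch illustrates this under the assumption that uppercase letters denote dominant alleles; the functions `cross` and `p_dominant_all` are hypothetical helpers written for this example.

```python
from fractions import Fraction
from itertools import product

def cross(parent1, parent2):
    """Yield (genotype, probability) for each cell of a Punnett square.

    Parents are lists of allele pairs, e.g. [("R", "r"), ("G", "g")].
    """
    def gametes(parent):
        return list(product(*parent))

    g1, g2 = gametes(parent1), gametes(parent2)
    cell = Fraction(1, len(g1) * len(g2))  # every cell is equally likely
    for a, b in product(g1, g2):
        yield tuple(zip(a, b)), cell

def p_dominant_all(parent1, parent2):
    """Add the probabilities of all mutually exclusive cells in which every
    gene carries at least one dominant (uppercase) allele."""
    return sum(
        (p for genotype, p in cross(parent1, parent2)
         if all(any(allele.isupper() for allele in gene) for gene in genotype)),
        Fraction(0),
    )

monohybrid = [("G", "g")]
dihybrid = [("R", "r"), ("G", "g")]

print(p_dominant_all(monohybrid, monohybrid))
print(p_dominant_all(dihybrid, dihybrid))
```

The first call adds three cells of 1/4 each and the second adds nine cells of 1/16 each, matching the sums shown above.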


Conditional Probability 


Probability questions in genetic experiments can be asked 
before a cross is made, as when the product rule and the 
sum rule are used to predict the likelihood of obtaining a 
certain genotype or phenotype from a cross. Certain other 
probability questions are asked after a cross has been 
made, such as questions about the probability that an 
organism has a particular genotype given that the organ- 
ism has a particular phenotype. This kind of probability 
is called conditional probability, and it is applied when 
specific information about the outcome modifies, or “conditions,”
the probability calculation.

Experimental Insight 2.3


Genetics and Evolution at a Library near You? 


The Central Rocky Mountain Permaculture Institute (CRMPI) 
(www.crmpi.org) in cooperation with the Basalt Regional 
Library (www.basaltlibrary.org) in Basalt, Colorado, estab- 
lished an unusual vegetable-seed-lending program in 
early 2013. The vegetable seeds available to library patrons 
through the Basalt Seed Lending Library were collected by 
CRMPI Director Stephanie Syson through donations from seed 
companies across the United States. The seeds are all from 
“heirloom” or “open pollinating” vegetable varieties, pure- 
breeding plants that only produce progeny with the specific 
traits characteristic of the vegetable variety. If, for example, 
the seeds are for a bean plant that has bush (short) growth 
and green bean pods containing white seeds, then the plants 
resulting from those seeds and from seeds harvested for 
planting in successive years will all have bush growth, green 
pods, and white seeds. 

Seeds for beets, broccoli, melons, squashes, peas, toma- 
toes, various greens, and other vegetables available in the 
lending library offer a potentially bountiful harvest, but the 
Basalt Seed Lending Library is about more than just providing 


A genetic example of conditional probability would
be to ask of the F2 progeny of a cross like Gg × Gg, “What
is the probability that yellow-seeded progeny plants are
heterozygous Gg like the parents?” (Mendel asked this
question in seeking to test his hypothesis of segregation;
see Figure 2.6.) Yellow seed is present in 3/4 of the progeny,
but this phenotypic class contains two genotypes, GG and
Gg, that are not equally frequent. In this case, the genotype
Gg is found in 2/3 of the yellow F2 progeny. The other yellow
F2 are GG. Under the conditional criterion that the only
progeny phenotype considered is yellow seeds, the answer
to the question posed earlier is that the yellow-seeded
progeny of the cross have a 2/3 probability of being Gg.

Another application of conditional probability is the
question, “If the yellow-seeded F2 are allowed to self-fertilize,
what proportion of them are expected to breed
true?” This question is similar to the one Mendel asked as
he devised an independent test of his segregation hypothesis
(see Table 2.3). True-breeding F2 progeny must be
homozygous, and in his seed-color experiment, only those
progeny with the genotype GG meet this conditional
contingency. Since the genotype GG is found in one-third
of the yellow-seeded F2, the same proportion of true-breeding
plants is expected as a result of self-fertilization.
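
Both conditional questions above reduce to dividing the probability of the genotype of interest by the total probability of the conditioning phenotype. A minimal sketch of that calculation follows; the dictionary `f2` and the function `conditional` are illustrative names, not terms from the text.

```python
from fractions import Fraction

# Genotype probabilities among the F2 progeny of Gg x Gg.
f2 = {"GG": Fraction(1, 4), "Gg": Fraction(1, 2), "gg": Fraction(1, 4)}

def conditional(genotype, condition):
    """P(genotype | condition), where condition is a test applied to genotypes."""
    p_condition = sum(p for g, p in f2.items() if condition(g))
    p_joint = f2[genotype] if condition(genotype) else Fraction(0)
    return p_joint / p_condition

def is_yellow(genotype):
    # Yellow is dominant, so one G allele is enough.
    return "G" in genotype

print(conditional("Gg", is_yellow))  # heterozygous, given yellow seed
print(conditional("GG", is_yellow))  # true-breeding, given yellow seed
```

Restricting the denominator to yellow-seeded progeny (3/4 of the total) is what turns the unconditional 1/2 and 1/4 into 2/3 and 1/3.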


Binomial Probability 


In determining the outcomes of certain genetic events, 
just one event need be predicted. An example is the ques- 
tion, “What is the chance a couple produces a daughter?” 
The answer is obtained by assuming that the father has a 
1/2 chance of donating an X chromosome and producing a


free seeds. Library patrons who use the seeds are also asked 
to save seeds from plants that grow and produce well. Good 
vegetable-plant growth and production can be a challenge in 
the Basalt area. Located at approximately 6300 feet of eleva- 
tion in the shadows of the Rocky Mountains, Basalt has poor 
soil and a short growing season. CRMPI Director Syson and 
Basalt Regional Library Director Barbara Milnor run workshops 
to teach patrons how to properly save seeds for use the fol- 
lowing year. According to Syson and Milnor, the ultimate 
goals of the lending program are (1) to identify vegetable 
varieties that grow well in the Basalt area and (2) to produce 
strains of vegetables that are better adapted to conditions in 
the Basalt area by collecting and replanting seeds from the 
best-growing and best-producing plants each year. 

Only a few dozen libraries around the country have seed- 
lending programs like this one. Maybe a library near you will 
start one soon, or maybe you can help set one up. These pro- 
grams operate on a sound genetic and evolutionary basis, as 
you will discover in the process of answering Problem 49 at 
the end of this chapter. 


daughter and an equal 1/2 chance of donating a Y chromosome
to produce a son, and that male and female offspring
are equally likely. In contrast, other questions concerning 
genetic outcomes require that we assess the probability 
of a combination or sequence of such events (events for 
which there are two or more possible outcomes each 
time). For example, determining the probabilities of dif- 
ferent combinations of boys and girls in sets of siblings 
or the risk of the recessive phenotype in one or more 
children of a couple who are each heterozygous carriers 
of a recessive disease requires computation of a particular 
combination of events that each have two alternative out- 
comes. To make these determinations, we use binomial 
probability calculations, expanding the binomial expres- 
sion to reflect the number of outcome combinations and 
the probability of each combination. 


Construction of a Binomial Expansion Formula A 
binomial expression contains two variables, each repre- 
senting the frequency of one of two alternative outcomes. 
We can express the likelihood of one outcome as having 
a frequency p and the alternative outcome as having a 
frequency q. Since the events p and q are the only outcomes 
possible, the sum of the two frequencies is (p + q) = 1. 
If we are examining the probabilities of the outcomes for 
a series of two alternative events, such as multiple flips 
of a coin or several successive children born to a couple, 
we can expand the binomial to the power of the number 
of successive events (n) to calculate the probabilities. The 
binomial expansion formula is written as (p + q)^n.

In some kinds of probability problems, the values
of the binomial variables p and q will be equal; that is,
p = q = 1/2, as in the probability of producing a boy or a
girl. In other cases, the two binomial values will not be
equal, as in the probability that heterozygous parents will
produce a child with a recessive trait (1/4) versus a child
with the dominant trait (3/4). Let’s use combinatorial
probability to predict the likelihood of different numbers
of boys and girls produced when a couple has three children.
A combinatorial approach allows us to list all the
possible birth orders of boys and girls and to group them
according to the total numbers of boys and girls in each
set of three siblings. The following table shows that there
are 2^3 or eight different birth orders of boys and girls. This
conclusion is determined based on two possible outcomes
(a boy or a girl) for three successive events. Assuming the
probabilities of having a boy or having a girl are 1/2, each
different order has a probability of (1/2)^3 = 1/8. The outcomes
can be grouped into four sets that each contain a
different total number of boys and girls.


               0 Boys    1 Boy     2 Boys    3 Boys
               3 Girls   2 Girls   1 Girl    0 Girls
               GGG       GGB       GBB       BBB
                         GBG       BGB
                         BGG       BBG
Probability:   1/8       3/8       3/8       1/8


We can see that there is only one order in which to get
either three boys (BBB) or three girls (GGG), and each has
a probability of 1/8. Notice that we use the product rule to obtain
each probability. But what about the cases of 2 boys and
1 girl or 2 girls and 1 boy, where there are three different
birth orders (the orders of boys and girls among the siblings)
for each outcome? Here we recognize that each birth order
has a probability of (1/2)^3 = 1/8 and that we must sum up all
similar outcomes to determine the probability of 1 or 2 boys
or girls in three consecutive siblings. In each of these cases,
using the sum rule, the probability is (1/8) + (1/8) + (1/8) = 3/8.
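
The grouping of birth orders described above can be reproduced by listing every order and counting the boys in each. This is a short Python sketch for illustration, assuming boys and girls are equally likely; `orders` and `by_boys` are names introduced here.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

n_children = 3
# Product rule: each specific birth order has probability (1/2)^3.
p_each_order = Fraction(1, 2) ** n_children

orders = ["".join(order) for order in product("BG", repeat=n_children)]
by_boys = Counter(order.count("B") for order in orders)

for boys in range(n_children + 1):
    # Sum rule: add the probabilities of all orders with this many boys.
    prob = by_boys[boys] * p_each_order
    print(boys, "boys,", n_children - boys, "girls:", by_boys[boys], "orders,", prob)
```

Changing `n_children` extends the same enumeration to larger sets of siblings, although the number of birth orders doubles with each additional child.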

Arithmetically, we use the binomial expansion to the
third power [(p + q)^3] to represent the three successive
siblings. Assuming that the probability of one outcome is
p and the probability of the other outcome is q, then the
general case for the binomial expands as follows:

(p + q)^3 = p^3 + 3p^2q + 3pq^2 + q^3

The values being added on the right side of the equality are
the frequencies of the four sets of outcomes p and q.

n (number of events)   Binomial coefficients

 0    1
 1    1   1
 2    1   2   1
 3    1   3   3   1
 4    1   4   6   4   1
 5    1   5  10  10   5   1
 6    1   6  15  20  15   6   1
 7    1   7  21  35  35  21   7   1
 8    1   8  28  56  70  56  28   8   1
 9    1   9  36  84 126 126  84  36   9   1
10    1  10  45 120 210 252 210 120  45  10   1
11    1  11  55 165 330 462 462 330 165  55  11   1
12    1  12  66 220 495 792 924 792 495 220  66  12   1


Application of Binomial Probability to Progeny 
Phenotypes Binomial probability and the binomial 
expansion can be used whenever a probability question 
addresses a repeating series of events that have two 
alternative outcomes. Let’s look at the production of 
yellow and green peas in pods with six peas each. In this 
example, the dominant allele G determines yellow color 
and the recessive allele g determines green color. The 
cross producing progeny peas is a self-fertilization of a 
yellow-seeded heterozygous (Gg) plant. The probability
that a seed is yellow is $\frac{3}{4}$, since the genotype would be
either GG or Gg, and the probability that the seed is green,
and therefore has the gg genotype, is $\frac{1}{4}$. We will use the
variable p to represent the probability of yellow seeds and 
the variable q to represent the probability of green seeds. 

In our example of pea pods with six seeds that are 
produced by crossing heterozygous (Gg) parental plants, 
there are two possible color outcomes for each pea and 
six peas per pod, for a total of $2^6$ or 64 combinations.
Counting the total number of yellow and green peas in 
each pod, there are seven categories that each have a dif- 
ferent number of yellow and green peas per pod, as we 
discuss momentarily. 

The application of binomial expansion to complex 
genetic calculations requires repetition and precision in 
the use of the product rule and the sum rule. However, a 
convenient shortcut called Pascal’s triangle eliminates 
the repetitive calculations required for multiple expan- 
sions of the binomial probability equation and can be 
used for any number of expansions between 0 and the nth 
power to yield the size of each possible class and the total 
number of classes possible (Figure 2.15). Let’s return to 
our pea pod example of binomial probability to see how 
Pascal’s triangle is used. 
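As a minimal sketch of how the lines of Pascal's triangle arise, the following builds each row from the one above it. The function name `pascal_rows` is a label chosen here, not something defined in the text.

```python
def pascal_rows(n_max):
    """Yield (n, coefficients) for n = 0..n_max, building each row from the previous one."""
    row = [1]
    for n in range(n_max + 1):
        yield n, row
        # Each interior entry is the sum of the two entries directly above it.
        row = [1] + [a + b for a, b in zip(row, row[1:])] + [1]


for n, coeffs in pascal_rows(6):
    print(n, coeffs, "total:", sum(coeffs))
```

The sum of each row equals $2^n$, which is the total number of combinations for n events with two alternative outcomes.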

Figure 2.16 makes use of the values taken from the 
n = 6 line of Pascal’s triangle (highlighted in Figure 2.15). 


Figure 2.15 Pascal's triangle of binomial coefficients
(p + q) raised to the nth power. Each line of the table shows
the distribution of the total number of combinations for a
given value of n (number of events). For example, for
$(p + q)^2$, use the n = 2 line, which predicts a total of four
outcome combinations distributed in a 1:2:1 or
$\frac{1}{4}:\frac{1}{2}:\frac{1}{4}$ ratio. The total number of
combinations for each line is $2^n$ (1, 2, 4, 8, ..., 4096 for
n = 0 to 12). Applications using the highlighted lines, n = 4
and n = 6, are discussed in the text.


Figure 2.16 Binomial-probability calculation of seed-color
phenotype in six-seeded pods. Pascal's triangle has been
used to find the coefficients for the binomial equation
expanded to n = 6. The 64 different outcomes are displayed in
seven classes, and the equation is used to compute the
expected frequency of each class.

Seed-color            Number of      Probability of    Frequency of occurrence
outcome class         combinations   occurrence        (p = 3/4, q = 1/4)

6 yellow, 0 green      1             p^6               0.178
5 yellow, 1 green      6             6p^5q             0.356
4 yellow, 2 green     15             15p^4q^2          0.297
3 yellow, 3 green     20             20p^3q^3          0.132
2 yellow, 4 green     15             15p^2q^4          0.033
1 yellow, 5 green      6             6pq^5             0.004
0 yellow, 6 green      1             q^6               0.0002
Total                 64                               1.00


These coefficients of the binomial expansion for n = 6
give the proportions of each of the seven outcome classes
for this example. The coefficients are 1, 6, 15, 20, 15, 6,
and 1, and they add up to a total of 64 different
combinations. The coefficients are used to multiply the
binomial probability of each outcome class. For this case
where $p = \frac{3}{4}$ and $q = \frac{1}{4}$, the expected
frequency of obtaining six yellow peas in a pod, for example,
is calculated as $1(p^6)$, or $(\frac{3}{4})^6 = 0.178$; for pods
containing 3 yellow and 3 green peas, the frequency is
$20[(\frac{3}{4})^3(\frac{1}{4})^3] = 0.132$; the proportion of
pods containing 2 yellow and 4 green peas is
$15[(\frac{3}{4})^2(\frac{1}{4})^4] = 0.033$; and so on. The
complete set of expected frequencies for different
combinations of seed color is shown at the bottom of
Figure 2.16. Notice that the sum of category probabilities
and the sum of category frequencies are each 1.00. This
correspondence verifies that all possible outcomes have been
taken into account.
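The frequencies quoted for the six-seeded pods can be recomputed with a few lines. This is a sketch under the text's assumptions (independent seeds, p = 3/4 for yellow, q = 1/4 for green); the helper name `class_frequencies` is hypothetical.

```python
from math import comb


def class_frequencies(n, p):
    """Expected frequency of each class with k 'p-type' outcomes, listed from k = n down to 0."""
    q = 1 - p
    # comb(n, k) is the binomial coefficient from line n of Pascal's triangle.
    return {k: comb(n, k) * p ** k * q ** (n - k) for k in range(n, -1, -1)}


freqs = class_frequencies(6, 0.75)
for yellow, f in freqs.items():
    print(f"{yellow} yellow, {6 - yellow} green: {f:.4f}")
```

Summing the seven values gives 1, which is the same completeness check the text describes.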


2.5 Chi-Square Analysis Tests the 
Fit between Observed Values and 
Expected Outcomes 


Sections 2.1 through 2.4 contain numerous examples of
how the principles of probability can be used to predict
the likelihood of different outcomes of genetic crosses.
Genetic experiments like the ones described, and like
the ones Mendel conducted, make predictions based on
the hypothesis that chance (i.e., probability) determines
the transmission of traits. To assess the validity of this
hypothesis, however, geneticists must be able to compare
the outcomes they obtain in their experiments to the
outcomes that might be expected to occur. For example,
are Mendel’s F₂ results in Table 2.1 compatible with his
segregation hypothesis predicting a 3:1 phenotype ratio?

Scientists must be able to make objective compari- 
sons of observed and expected results to test genetic 
hypotheses. Qualitative statements such as “the observed 
results seem to be close to the results we expected” are 
unacceptable for scientific work. Instead, a quantitative 
approach, or in this case a statistical approach, is neces- 
sary to objectively compare the results obtained from a 
cross with the results that are predicted by probability. 
Mendel did not have appropriate statistical tools avail- 
able to him. But in the early 1900s, the chi-square test was 
derived as a statistical test for comparison of observed 
experimental results with the results that may be expected 
when chance is generating the outcome. This section 
describes the chi-square test and its application to the 
analysis of genetic data, including some of Mendel’s F₂
results. We begin, however, with a brief discussion of a 
normal, or Gaussian, distribution, on which chi-square 
analysis is based. 


The Normal Distribution 


In large samples, outcomes that are predicted by chance
have a normal (Gaussian) distribution. A normal
distribution, which a binomial distribution approaches as the
sample size grows, is often called a “bell-shaped curve”
because of the general shape of the curve the data form when
they are graphed (Figure 2.17). A normal distribution
contains all the possible experimental outcomes. The mean (μ)
is the average outcome, and other outcomes are distributed
around the mean. The tall central segment of the curve
nearest the mean represents the outcomes with the highest
probability of occurrence. The probability of experimental
outcomes gets smaller toward the farthest left and right
portions of the curve. The scatter of outcomes around the
mean is quantified by a measurement called the standard
deviation (σ). In a normal distribution, approximately
68.3% of all outcome values fall within one standard
deviation of the mean, 95.4% of outcomes fall within two
standard deviations of the mean, and 99.7% of outcomes
fall within three standard deviations of the mean
(Figure 2.17). The observed result of a particular experiment
can be compared to the normal distribution to determine the
probability of that particular experimental observation
compared to all possible outcomes in the distribution, using
σ, the standard deviation, as a guide.

Figure 2.17 Graphing the distribution of chance outcomes
produces a normal distribution. The standard deviation (σ) is
used to characterize the scatter of possible outcomes around
the mean (μ). [Idealized normal distribution; horizontal
axis: experimental outcomes; vertical axis: number of
observations.]
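The fractions of a normal distribution that fall within one, two, and three standard deviations of the mean can be checked numerically. The sketch below uses the standard relationship between the normal distribution and the error function; `fraction_within` is a name introduced here for illustration.

```python
from math import erf, sqrt


def fraction_within(k):
    """Fraction of a normal distribution lying within k standard deviations of the mean."""
    return erf(k / sqrt(2))


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {fraction_within(k):.4f}")
```

Because the result depends only on k, the same fractions apply to any normal distribution regardless of its mean or standard deviation.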

By convention, observed experimental outcomes that
have a probability of less than 5% (<0.05), that is,
outcomes that lie more than about two standard deviations
away from the mean, are often considered to show a
statistically significant difference between the observed
outcome and the expected outcome. Chi-square analysis
tests for statistically significant deviation in genetic
experimental results.


Chi-Square Analysis 


The chi-square (χ²) test is the most common statistical
method used in genetics experiments for comparing
observed experimental outcomes to the results expected
based on the probability hypothesis. Chi-square testing
quantifies how closely an experimental observation
matches the expected outcome by determining the
probability of the observed outcome. The chi-square test is
appropriate for this task when the experimental hypothesis
used to predict the outcome depends on chance, as
Mendelian ratios do. Thus, when a chi-square test is
conducted, the test is measuring how well the experimental
observations match experimental predictions. The
chi-square test has proven flexible and accurate in measuring
the fit between observed and expected experimental
results across a wide range of experiments.

The chi-square value for the analysis of a given
experiment is obtained in two steps. First, the difference
between the number observed and number expected in each
outcome category is squared and divided by the number
expected in the category; and second, the values obtained
for each outcome class are summed. The χ² formula is

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

where O is the observed number of offspring in each
outcome class, E is the number expected for each class,
and the summation (Σ) is taken over all possible outcome
classes.
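The two steps of the formula translate directly into code. The following is a minimal sketch; the function name `chi_square` and the coin-flip counts (54 heads and 46 tails against an expectation of 50 each) are used purely as illustrative input.

```python
def chi_square(observed, expected):
    """Sum (O - E)^2 / E over all outcome classes."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected need one value per outcome class")
    # Step 1: squared difference divided by the expected number, per class.
    # Step 2: sum the per-class values.
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Illustrative input: 100 coin flips, 54 heads and 46 tails, 50 of each expected.
print(chi_square([54, 46], [50, 50]))
```

Note that the function takes counts, not proportions; feeding it frequencies would change the size of the result.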

The size of the chi-square value for an experiment
depends on three parameters: the experimental sample
size, the number of outcome classes, and the number of
observations in each outcome class. It stands to reason
that experiments with many outcome classes, or with more
observations recorded for each outcome class, tend to have
larger chi-square values than experiments with lower
numbers in each class. Simply stated, adding more or larger
values to obtain a chi-square value leads to greater sums.
Consequently, chi-square values are not directly comparable
from one experiment to the next. Instead, each experimental
chi-square value is interpreted in terms of the normal
distribution of expected results for an experiment of that
size.

The interpretation is done by means of a probability
value (P value), which is a quantitative expression of the
probability that the results of another experiment of the
same size and structure will deviate as much or more from
expected results by chance. P values in chi-square analysis
are directly related to the probability of experimental
outcomes in a normal distribution. High values for P (values
close to 1) are associated with low χ² values. Low
chi-square values occur when the observed and expected
results are very similar. A high P value indicates that chance
alone is likely to explain the deviations of experimental
observations from expected values. Thus, a P value of 0.90
means that observed and expected results are close together
and that 90% of all possible χ² values are equal to or
greater than the value obtained in the experiment. On the
other hand, low P values correspond to high chi-square
values. They indicate substantial difference between observed
and expected outcomes. The greater the difference between
observed and expected results of an experiment, the greater
the χ² value and the lower the P value.

The statistical interpretation of a chi-square value
is obtained by identifying the P value for each experiment,
and the P value is dependent on the number of
degrees of freedom (df) in the experiment being examined.
For each experiment, the df value is most often equal
to the number of outcome classes (n) minus 1, or $(n - 1)$.
In a statistical sense, df is equal to the number of
independent variables in an experiment. For example, suppose
we were conducting a chi-square test of 100 coin flips. There
are two outcome classes, heads and tails, each of which
we expect to see 50 times. However, once we record the
number of events in one class, say 54 heads, the number
of events in the second class becomes dependent on that
first number. In our coin flip example, if we flip a coin 100
times and there are 54 heads recorded, the other 46 flips
must be tails. Here the number of degrees of freedom is
one because, while there are two possible outcomes, the
value of one is always dependent on the value of the other.
Table 2.4 is a chi-square table, containing chi-square
values in the body of the table and the different degrees of
freedom along the left-hand margin of the table. The
corresponding P values are listed along the top margin. To
determine the P value for the chi-square value from an
experiment, the first step is to determine the number of
degrees of freedom. The second step is to locate the
chi-square value on the line corresponding to the degrees of
freedom. The P value for the result of the experiment in
question is then found at the top of the column containing
the chi-square value; when the chi-square value falls between
two columns, the P value lies between the two corresponding
P values.
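The lookup procedure just described can be sketched as a small function. The three rows below (df = 1, 3, and 7) are standard chi-square critical values of the kind listed in Table 2.4; the names `P_COLUMNS`, `CHI_TABLE`, and `p_value_bracket` are hypothetical, and the dictionary would need extending to cover other degrees of freedom.

```python
# P values along the top margin of a chi-square table.
P_COLUMNS = [0.95, 0.90, 0.70, 0.50, 0.30, 0.20, 0.10, 0.05, 0.01, 0.001]

# Critical chi-square values for a few degrees of freedom (assumed subset).
CHI_TABLE = {
    1: [0.004, 0.016, 0.15, 0.46, 1.07, 1.64, 2.71, 3.84, 6.64, 10.83],
    3: [0.35, 0.58, 1.42, 2.37, 3.67, 4.64, 6.25, 7.82, 11.35, 16.27],
    7: [2.17, 2.83, 4.67, 6.35, 8.38, 9.80, 12.02, 14.07, 18.48, 24.32],
}


def p_value_bracket(chi2, df):
    """Return (lower, upper) bounds on P read from the table row for df."""
    upper = 1.0  # P can never exceed 1
    for p, critical in zip(P_COLUMNS, CHI_TABLE[df]):
        if chi2 < critical:
            # chi2 sits to the left of this column, so P lies between
            # this column's P value and the previous column's P value.
            return (p, upper)
        upper = p
    return (0.0, upper)  # chi2 exceeds every tabulated value


print(p_value_bracket(0.64, 1))
```

A table lookup only brackets the P value; an exact value requires evaluating the chi-square distribution itself, which is outside the scope of this sketch.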
Interpretation of chi-square results is based on the
corresponding P value. A statistically significant result from
chi-square analysis is defined as one for which the P value
is less than 0.05. This means that there is less than a 5%
chance (<0.05) of obtaining a deviation at least as large as
the one observed by chance alone. By convention, when an
experimental result has less than a 5% probability, the
hypothesis of chance is rejected. In other words, if the
P value is less than 0.05, the difference between the observed
and expected results is considered statistically significant,
and the experimental hypothesis is rejected. Conversely,
P values greater than 0.05 indicate a nonsignificant deviation
between observed and expected values. These values result in
failure to reject the chance hypothesis.


Chi-Square Analysis of Mendel’s Data 


Modern statistical methods allow us to do something
Mendel could not do: test his experimental data for its
compatibility with the predictions of the laws of
segregation and independent assortment. Table 2.1 contains
data from Mendel for F₂ segregation of the seven
traits he tested. In the first row of the table, we see that
Mendel scored 7324 F₂ seeds for round or wrinkled
phenotypes. Among these, he counted 5474 round and
1850 wrinkled. Based on the predictions of his segregation
hypothesis, Mendel expected that 75% of the F₂
would be round and the remaining 25% wrinkled. That
means he expected (7324)(0.75) = 5493 round seeds
and (7324)(0.25) = 1831 wrinkled seeds. There is 1
degree of freedom in the experiment, and the chi-square
is calculated as

$$\chi^2 = \frac{(5474 - 5493)^2}{5493} + \frac{(1850 - 1831)^2}{1831} = 0.066 + 0.197 = 0.263$$

For df = 1, the P value falls between 0.50 and 0.70 (see
Table 2.4). This is well above the cutoff value of 0.05 and
consequently represents a nonsignificant deviation between
the observed outcome and the values expected for an
experiment of this size. We fail to reject the hypothesis that
chance is responsible for the observed outcome, and we can
say, therefore, that Mendel’s F₂ data for seed shape are
consistent with the predictions of the law of segregation.


Table 2.4 The Chi-Square Table

                              Probability (P) Value

df    0.95   0.90   0.70   0.50   0.30   0.20   0.10   0.05   0.01   0.001
 1    0.004  0.016  0.15   0.46   1.07   1.64   2.71   3.84   6.64   10.83
 2    0.10   0.21   0.71   1.39   2.41   3.22   4.61   5.99   9.21   13.82
 3    0.35   0.58   1.42   2.37   3.67   4.64   6.25   7.82   11.35  16.27
 4    0.71   1.06   2.20   3.36   4.88   5.99   7.78   9.49   13.28  18.47
 5    1.15   1.61   3.00   4.35   6.06   7.29   9.24   11.07  15.09  20.52
 6    1.64   2.20   3.83   5.35   7.23   8.56   10.65  12.59  16.81  22.46
 7    2.17   2.83   4.67   6.35   8.38   9.80   12.02  14.07  18.48  24.32
 8    2.73   3.49   5.53   7.34   9.52   11.03  13.36  15.51  20.09  26.13
 9    3.33   4.17   6.39   8.34   10.66  12.24  14.68  16.92  21.67  27.88
10    3.94   4.87   7.27   9.34   11.78  13.44  15.99  18.31  23.21  29.59
11    4.58   5.58   8.15   10.34  12.90  14.63  17.28  19.68  24.73  31.26
12    5.23   6.30   9.03   11.34  14.01  15.81  18.55  21.03  26.22  32.91
13    5.89   7.04   9.93   12.34  15.12  16.99  19.81  22.36  27.69  34.53
14    6.57   7.79   10.82  13.34  16.22  18.15  21.06  23.69  29.14  36.12
15    7.26   8.55   11.72  14.34  17.32  19.31  22.31  25.00  30.58  37.70

Note: Chi-square values are in the body of the table, degrees of freedom are at the far-left side, and probability values are at the top of each column of chi-square values. For P values above 0.05 (columns 0.95 through 0.10), we fail to reject the chance hypothesis; for P values of 0.05 or less (columns 0.05 through 0.001), we reject the chance hypothesis.
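The seed-shape calculation can be verified with a short script. The observed counts and the 3:1 expectation come from the text; the exact P value is an addition made here, and it relies on the fact that with exactly 1 degree of freedom a chi-square value is the square of a standard normal value, so its upper-tail probability can be written with the complementary error function.

```python
from math import erfc, sqrt

observed = {"round": 5474, "wrinkled": 1850}
ratio = {"round": 0.75, "wrinkled": 0.25}  # segregation hypothesis, 3:1

total = sum(observed.values())
expected = {k: total * r for k, r in ratio.items()}
chi2 = sum((observed[k] - expected[k]) ** 2 / expected[k] for k in observed)

# Valid only for df = 1: P(chi-square >= x) = erfc(sqrt(x / 2)).
p_value = erfc(sqrt(chi2 / 2))

print(expected)
print(round(chi2, 3), round(p_value, 2))
```

The computed P value falls inside the 0.50 to 0.70 bracket read from the table, so both routes lead to the same conclusion.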

Figure 2.12 provides data Mendel collected on seed
shape and seed color that we can use to test whether
his results were consistent with his predictions of
independent assortment. Based on the predicted
$\frac{9}{16}:\frac{3}{16}:\frac{3}{16}:\frac{1}{16}$
or 9:3:3:1 ratio, the 556 F₂ produced by Mendel would
be expected to have the following distribution, where
$\frac{9}{16} = 0.5625$, $\frac{3}{16} = 0.1875$, and
$\frac{1}{16} = 0.0625$.


Round, yellow       (556)(0.5625) = 312.75
Round, green        (556)(0.1875) = 104.25
Wrinkled, yellow    (556)(0.1875) = 104.25
Wrinkled, green     (556)(0.0625) =  34.75
                                    556.00


The chi-square value is calculated as

$$\chi^2 = \frac{(315 - 312.75)^2}{312.75} + \frac{(108 - 104.25)^2}{104.25} + \frac{(101 - 104.25)^2}{104.25} + \frac{(32 - 34.75)^2}{34.75}$$

$$= 0.016 + 0.135 + 0.101 + 0.218 = 0.470$$

In this case, df = 3, and the P value falls between 0.90 and
0.95. This indicates a nonsignificant deviation because the
P value is above the 0.05 cutoff value. Mendel’s F₂ data for
seed color and seed shape are therefore also consistent with
the predictions of independent assortment. A third example
of chi-square analysis, using trihybrid-cross results from
one of Mendel’s experiments, is shown in Table 2.5. From
statistical analysis of these data we conclude that Mendel’s
results are consistent with the predictions of segregation
and independent assortment.
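The dihybrid calculation follows the same pattern, with the expected counts derived from the 9:3:3:1 ratio. This is a sketch using the observed counts quoted above; `expected_counts` is a helper name chosen here.

```python
def expected_counts(total, ratio):
    """Split a total into expected class counts according to an integer ratio."""
    parts = sum(ratio)
    return [total * r / parts for r in ratio]


# Round yellow, round green, wrinkled yellow, wrinkled green.
observed = [315, 108, 101, 32]
expected = expected_counts(sum(observed), [9, 3, 3, 1])

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1  # number of outcome classes minus 1

print(expected, round(chi2, 3), df)
```

Passing a different ratio, such as 27:9:9:3:9:3:3:1 with eight observed classes, would apply the same steps to trihybrid-cross data.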


2.6 Autosomal Inheritance and 
Molecular Genetics Parallel the 
Predictions of Mendel’s Hereditary 
Principles 


During the first decade of the 20th century, immediately
after the rediscovery of Mendel’s rules of hereditary
transmission, biologists began to extend Mendel’s findings
to species other than pea plants. They also identified
exceptions to Mendelian hereditary principles (Chapter 4).
In this final section, we apply Mendelian principles to the
transmission of certain traits in humans. In addition, we
consider the correspondence of molecular genetics findings
to Mendelian inheritance and explore the underlying
causes of four of the traits that Mendel studied.


Table 2.5 Chi-Square Analysis of Mendel’s
Trihybrid-Cross Data

                              Mendel’s Observation(a)   Number
Phenotype                     Number                    Expected

Round, yellow, purple         269                       269.58
Round, yellow, white           98                        89.86
Round, green, purple           86                        89.86
Round, green, white            27                        29.95
Wrinkled, yellow, purple       88                        89.86
Wrinkled, yellow, white        34                        29.95
Wrinkled, green, purple        30                        29.95
Wrinkled, green, white          7                         9.98
Total                         639                       638.99

Chi-square calculation [(O − E)²/E]

$$\chi^2 = \frac{(269 - 269.58)^2}{269.58} + \frac{(98 - 89.86)^2}{89.86} + \frac{(86 - 89.86)^2}{89.86} + \frac{(27 - 29.95)^2}{29.95}$$

$$+ \frac{(88 - 89.86)^2}{89.86} + \frac{(34 - 29.95)^2}{29.95} + \frac{(30 - 29.95)^2}{29.95} + \frac{(7 - 9.98)^2}{9.98} = 2.67$$

df = 7
P value > 0.90

(a) Data are taken from Figure 2.14.


Autosomal inheritance refers to the transmission of 
genes that are carried on autosomes, the chromosomes 
(22 pairs in humans) that are not sex chromosomes (X 
and Y chromosomes). Autosomal pairs of chromosomes 
are found in both males and females. Because of the two 
copies of each autosome in our genome, we, like all dip- 
loid organisms, carry two copies (alleles) of each autoso- 
mal gene. The alleles on homologous chromosomes can 
be identical, in which case a person has a homozygous 
genotype; or the alleles can be different, producing a het- 
erozygous genotype. Autosomal inheritance allows us to 
see Mendel’s law of segregation and law of independent 
assortment in action. Autosomes are distinct from the 
sex chromosomes and autosomal inheritance follows dif- 
ferent patterns than does the inheritance of genes on sex 
chromosomes (see Chapter 3). 

Pedigrees, or family trees, are a kind of symbolic
shorthand used to trace the inheritance of traits in humans
and in animals such as horses, dogs, cats, cattle, and
others. In standard pedigree notation, males are represented
by squares and females by circles (Figure 2.18). A filled
circle or square indicates that the phenotype of interest is
present. A line through a symbol indicates the person is
deceased. Parents are connected to each other by a
horizontal line from which a vertical line descends to their
progeny. Individuals in a pedigree are numbered by a
Roman numeral (I, II, III, etc.) to indicate their generation
combined with an Arabic numeral (1, 2, 3, etc.) that
identifies each organism in a generation. Identifying an
individual by a Roman numeral followed by an Arabic numeral,
as in I-2 or III-6, is an efficient way to ensure clarity in
referring to particular organisms and, in the case of humans,
allows protection of privacy by not requiring the use of names.

Figure 2.18 Common pedigree symbols. [Legend entries: female
and male symbols; individuals who do not express the trait;
individuals who express the trait; heterozygous carriers of a
recessive allele; deceased (d. 0000 = date of death);
unspecified sex; generation; parents; parents closely related
by blood; adoption; siblings; identical twins; fraternal
twins. Roman numerals (I, II, III, etc.) = generations;
Arabic numerals (1, 2, 3, etc.) = individuals in a
generation.]


Autosomal Dominant Inheritance 


The pedigree in Figure 2.19 shows characteristics
commonly observed for autosomal dominant inheritance of
a disease. Notice the following six characteristics:


1. Each individual who has the disease has at least
one affected parent. Anyone carrying at least one
copy of a dominant allele will display the dominant
phenotype. Therefore, any disease or disorder caused
by a dominant allele is seen in successive generations
(this characteristic is described as a vertical pattern of
transmission). In Figure 2.19, all 13 affected children
in generations II, III, and IV have at least one affected
parent. The only exceptions to this general rule are
(1) the occurrence of a new mutation in a child and
(2) a person with the dominant mutation entering the
family through marriage. The pedigree shows no
evidence of a new mutation, but individual III-16 marries
into the family and has the dominant mutation.


2. Males and females are affected in equal numbers.
Mutations carried on an autosome are equally likely
to occur in either sex. Among the total of 15 affected
individuals in the figure, 7 are male and 8 are female.


3. Either sex can transmit the disease allele. Seven
parents in Figure 2.19 with the mutant phenotype
have transmitted the disease to one or more children.
Three of the transmitting parents are male and four
are female.


4. In crosses in which one parent is affected and
the other is not, approximately half the offspring
express the disease. Diseases caused by dominant
mutations are usually rare in populations, and most
affected individuals are heterozygous. A cross
between one affected parent and one unaffected parent
can most often be genetically interpreted as a
heterozygous-by-homozygous cross, expected to produce a
1:1 ratio between phenotypes. In this family, there are
six crosses between an affected person who is
heterozygous and an unaffected person who is homozygous
for the recessive allele. Among the 19 children
produced by these crosses, 9 of the children are affected
and 10 are unaffected. The children of the cross
between III-14 and II-15 are excluded from this count
because both parents have the dominant mutant
phenotype.


5. Two unaffected parents will not have any children
with the disease. Dominant phenotypes require
the presence of at least one copy of the dominant
allele. If each parent has the recessive (“normal”)
phenotype, they must each be homozygous for the
recessive allele, and all their offspring should also be
homozygous. Three crosses of this kind are shown
in the pedigree, and all seven resulting children have
the normal phenotype. New mutation is an exception
to this rule, but it is not seen in this family.


6. Two affected parents may produce unaffected
children. If each parent is heterozygous, the expected
ratio between affected and unaffected children is 3:1.
The mating between III-15 and III-16 produces four
children of whom three are affected. The mating of
two heterozygous affected parents presents a one-in-four
chance of producing a child homozygous for the
mutant allele and a one-in-four chance of producing a
child homozygous for the recessive allele. The
homozygous recessive child (IV-10) is unaffected.


Figure 2.19 Autosomal dominant inheritance.
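The 1:1 and 3:1 expectations in the list above come from pairing one allele from each parent. The sketch below enumerates those pairings; the allele symbols `A` (dominant) and `a` (recessive) and both function names are assumptions made for illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def cross(parent1, parent2):
    """Offspring genotype probabilities for a one-gene cross, e.g. cross('Aa', 'aa')."""
    # Each parent contributes one of its two alleles with equal probability,
    # giving four equally likely allele pairings.
    counts = Counter("".join(sorted(a + b)) for a, b in product(parent1, parent2))
    return {genotype: Fraction(n, 4) for genotype, n in counts.items()}


def dominant_phenotype_fraction(parent1, parent2, dominant="A"):
    """Probability that a child carries at least one copy of the dominant allele."""
    return sum(p for g, p in cross(parent1, parent2).items() if dominant in g)


print(cross("Aa", "aa"))   # affected heterozygote x unaffected homozygote
print(cross("Aa", "Aa"))   # two affected heterozygotes
print(dominant_phenotype_fraction("aa", "aa"))  # two unaffected parents
```

The same `cross` function describes autosomal recessive inheritance if the phenotype of interest is read from the homozygous recessive genotype instead.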


Autosomal Recessive Inheritance 


Figure 2.20 shows a human pedigree displaying the
characteristics commonly observed for autosomal recessive
inheritance of a disease. There are six key features to notice:


1. Individuals who have the disease are often born to
parents who do not. A child with the disease (the
recessive phenotype) must have inherited one copy
of the recessive allele from each parent. Moreover,
it is common for children with the disease to have
been produced by parents with the dominant (normal)
phenotype who are heterozygous. Four affected
family members, IV-5, IV-6, IV-10, and V-3, are the
children of heterozygous carrier parents. That is,
III-2 and III-3 are heterozygous carriers, as are III-4
and III-5 and IV-1 and IV-2.


2. If only one parent has the disorder, the risk that a
child has the disorder depends on the genotype of
the other parent. The affected parent is homozygous
recessive and must pass a copy of the recessive
allele to each child. If the unaffected parent is
heterozygous, the risk that a given child will be affected
is $\frac{1}{2}$. If the unaffected parent is homozygous for
the dominant allele, all children will be unaffected
heterozygotes.


3. If both parents have the disorder, all children will
have the disorder. If both parents are homozygous
recessive, all their offspring will have the same
homozygous genotype. The four affected siblings in
the last generation of the idealized pedigree inherit
their disorder in this way.


4. The sex ratio of affected offspring is expected to
be equal. Males and females are equally likely to be
homozygous for the recessive allele. The sex of a child
is independent of the likelihood that the homozygous
recessive genotype occurs at the autosomal gene.
In the example pedigree there are a total of eight
affected individuals: four males and four females.


5. The disease is usually not seen in each generation;
but if an affected child is produced by unaffected
parents, the risk to subsequent children of the couple
is $\frac{1}{4}$. If both parents have the dominant phenotype,
they can produce a child with the recessive phenotype
only if they are each heterozygous. This is usually rare in
a population, so production of affected children is rare.
If an affected child is born to a healthy couple, however,
each parent is a heterozygous carrier of the recessive
disease allele, and the disease risk to each additional
child is $\frac{1}{4}$. In the example pedigree, the recessive
condition is confined to the fourth and fifth generations.


6. If the disease or disorder is rare in the population,
unaffected parents of an affected child are more
likely to be related to one another. Individuals
who are related to one another can carry identical
alleles as a result of their shared ancestry. If the
recessive allele is present in the family, the sharing of
alleles through common ancestry increases the
probability that related individuals might both be carriers
of the recessive allele in comparison to the population
at large. In Figure 2.20, the two affected parents of
the four affected siblings are related to one another.
When a disease is rare, the assumption is that a person
who married into the family does not carry the
disease allele (i.e., is homozygous dominant) unless
there is contradicting evidence from the pedigree
(i.e., one of the offspring has the recessive phenotype).


Figure 2.20 Autosomal recessive inheritance.


Molecular Genetics of Mendel’s Traits


The discovery of the basis of Mendel’s traits continues to 
the present day using methods of molecular genetics to 
identify the genes responsible for the phenotypic varia- 
tion Mendel studied. These molecular analyses, the first 
of which was published in 1990, describe the nucleic acid 
(DNA and RNA) variation and the polypeptide (protein 
and enzyme) variation responsible for Mendel’s traits. 
A cornerstone of modern genetics is the seamless
integration of the principles of transmission genetics with
those of molecular genetic analysis, and the molecular genetic
analysis of Mendel’s traits reveals that the molecular
genetic and the transmission genetic analyses are two
sides of the same coin. The pattern of transmission of
morphologic variants is traceable through examination of
the hereditary molecules DNA, RNA, and protein.

Identifying these genes and determining how mo- 
lecular variation in them produces morphologic variation 
in pea plants requires the demonstration that (1) allelic 
variation coincides with morphologic variation, (2) DNA 
variation in the alleles produces different protein prod- 
ucts, (3) the protein products from each allele have differ- 
ent structures that lead to different functional capabilities, 
and (4) the functional differences between the protein 
products of different alleles account for the observed 
morphological variation in pea plants. The molecular dif- 
ferences between the alleles also usually clarify why the 
alleles are dominant or recessive relative to one another. 

Mendel did not leave any neatly labeled packets of 
seeds for later researchers to analyze, so the process of 
pinpointing the exact traits he examined and the genes 
and proteins responsible for them has been complicated. 
Table 2.6 identifies the researchers and the genes respon- 
sible for four of Mendel’s seven traits. For each gene, 
the wild-type DNA, RNA, and protein sequences have 
been identified, and the specific mutations producing the 
mutant alleles have been determined. In each case, the
mutations significantly reduce or entirely eliminate
production or function of the wild-type polypeptide; thus, each
of the mutations is recessive. These mutations are
discussed briefly here, and in further detail in Experimental
Insight 12.1 and in Experimental Insight 13.2.


Seed Shape (Round and Wrinkled, Gene Sbe1) In
1990, research published by Madan Bhattacharyya and 
colleagues described the identification and molecular 
analysis of a gene responsible for round and wrinkled 
seed shape. The Sbe1 gene produces the starch-branching 
enzyme that helps convert a linear form of starch called 
amylose into a complex branched form of starch called 
amylopectin. As a consequence of the action of fully 
functional starch-branching enzyme, round seeds have a 
much higher percentage of amylopectin and a much lower 
percentage of amylose than do wrinkled seeds, which do 
not have functional starch-branching enzyme. Amylose 
readily loses sugar molecules, and in developing wrinkled 


seeds, the high concentration of free sugar leads the seeds 
to excessive water uptake that swells the developing seeds. 
As seeds mature they naturally dehydrate. The maturing 
wrinkled seeds lose much more water than do maturing 
round seeds, resulting in a partial collapse of the wrinkled 
seed membranes that does not occur in round seeds. See 
Experimental Insight 13.2 for more details. 


Stem Length (Tall and Short, Gene Le) In 1997, two 
research groups, one led by David Martin and the other by 
Diane Lester, determined that a gene called Le produces 
the variation in stem length that Mendel saw as tall and 
short plants by controlling growth of the main stem of the 
plant. This Le gene produces gibberellin 3β-hydroxylase, an
enzyme that catalyzes one step of the multistep biochemical
pathway synthesizing the plant growth hormone gibberellin.
Wild-type plants are able to produce gibberellin and can
grow tall, but a base substitution mutation in the mutant
allele results in a very low level of gibberellin and poor
growth. See Experimental Insight 13.2 for more details. 


Seed Color (Yellow and Green, Gene Sgr) Two studies 
published in 2007, one by Ian Armstead and colleagues 
and the other by Sylvain Aubry and colleagues, identified 
the Sgr gene, known as “stay-green,” that produces mutant 
green seeds rather than wild-type yellow seeds in plants 
that are homozygous for a mutation of the gene. In this 
case, the polypeptide product of Sgr is an enzyme that 
catalyzes a step in the breakdown of chlorophyll, a green- 
colored compound. Chlorophyll breakdown normally 
occurs as seeds mature, and results in the yellow color 
of wild-type seeds. A mutation prevents production of a 
functional enzyme, and the absence of its activity in the 
chlorophyll-breakdown pathway results in the retention 
of green color in mutant seeds. See Experimental Insight 
12.1 for more details. 


Flower Color (Purple and White, Gene bHLH) In 2010, 
the gene responsible for the white-flower mutation in 
Mendel’s pea plants was identified. A research group 
led by Roger Hellens determined that mutation of 
the bHLH gene in pea plants produces mutant white 
flowers rather than wild-type purple flowers. The protein 
product of bHLH is a transcription factor protein that 
interacts with other proteins to activate the transcription 
of certain genes. In this case, the genes targeted for 
transcription activation are active in the pathway that 
normally produces the purple-colored plant pigment 
anthocyanin. Wild-type plants produce enough of the 
gene product (the transcription factor protein) to activate 
transcription of anthocyanin-producing genes. Plants that 
are homozygous for mutations of this gene, however, are 
unable to activate transcription of the pigment-producing 
genes. These plants lack the purple anthocyanin pigment, 
and so their flowers are white. See Experimental Insight 
12.1 for more details. 


Table 2.6 Identification and Molecular Characterization of Four of Mendel’s Traits

Trait: Seed shape (round and wrinkled seeds)
Gene and Gene Product: The gene is Sbe1, producing starch-branching enzyme.
Wild-Type Allele and Function: The dominant wild-type allele (R) produces starch-branching enzyme that converts amylose, a linear starch, into amylopectin, a complex branched starch.
Mutant Allele and Function: The recessive mutant allele (r) contains an inserted segment about 800 base pairs in length. The transcript of the mutant allele does not produce an enzyme product, resulting in a loss of function.

Trait: Stem length (tall and short plants)
Gene and Gene Product: The gene is Le, producing gibberellin 3β-hydroxylase (G3βH).
Wild-Type Allele and Function: G3βH produced by the dominant allele Le converts a precursor in the synthesis of the plant growth hormone gibberellin that causes plants to grow tall.
Mutant Allele and Function: The recessive mutant le allele contains a base substitution that results in an amino acid change. The mutant G3βH has less than 5% of the activity of the wild-type product and produces little gibberellin, leading to short plants.

Trait: Seed color (yellow seed and green seed)
Gene and Gene Product: The gene was originally named the I gene and was later renamed Sgr (called “stay green”). The gene produces an enzyme that helps break down chlorophyll.
Wild-Type Allele and Function: The dominant wild-type allele (I) produces an enzyme that catalyzes one step in the chlorophyll breakdown pathway, which turns wild-type seeds yellow as they mature.
Mutant Allele and Function: The recessive mutant allele (i) contains two base substitutions and a base pair insertion. The resulting mutant polypeptide has no function, leading to a blockage of the chlorophyll breakdown pathway and causing mutant seeds to retain their immature green color.

Trait: Flower color (purple flower and white flower)
Gene and Gene Product: Originally named gene A and renamed bHLH, the gene produces a protein that activates transcription of target genes.
Wild-Type Allele and Function: The dominant wild-type allele (A) produces a protein that activates transcription of genes required to synthesize the purple-colored plant pigment called anthocyanin.
Mutant Allele and Function: The recessive mutant allele (a) contains a base substitution that results in production of abnormal mRNA. The mutant mRNA does not produce the transcription-activating protein, thus blocking anthocyanin production and resulting in the development of white flowers.

A common feature of each of the genes controlling
Mendel’s traits is that the wild-type alleles are dominant
to mutant alleles that are recessive. This is a consequence
of the loss of function on the part of the mutant alleles.
For each gene, one or two copies of the wild-type allele
result in the wild-type phenotype, whereas the mutant
phenotype is produced in plants that are homozygous for 
the mutant allele. We discuss the relationship between al- 
leles and explore other kinds of dominance relationships 
in Section 4.1. 

In broader terms, the conclusions from molecu- 
lar studies identifying genes Mendel examined in his 


crosses are that (1) the inheritance of allelic variants pre- 
cisely parallels the pattern of transmission of morpho- 
logical variation and (2) morphological variation in pea 
plants results from differences in the structure and func- 
tion of the proteins produced by the alleles. Molecular 
genetic analysis has led to (3) identification of the DNA- 
sequence differences between alleles, determination of 
the impact of those differences on mRNA, and descrip- 
tion of the alteration of protein structures resulting from 
each mRNA; and (4) functional analysis of the protein 
product of each allele to describe the role it plays in pro- 
ducing the phenotype. 
CASE STUDY


Inheritance of Sickle Cell Disease in Humans 


Online Mendelian Inheritance in Man (OMIM) is a continuously
updated public catalog providing current
information on more than 18,000 human hereditary traits.
OMIM can be accessed at im. 
Each trait listed in the OMIM catalog has a unique identi- 
fier number. One trait, named sickle cell disease (SCD), 
OMIM number 603903, is the subject of a later discussion 
(see Chapter 10) that introduces several important research 
techniques and uses them to describe the discovery and 
analysis of the molecular basis of SCD and the evolution of 
the mutant allele. Here we examine the hereditary transmission
of SCD, which is caused by a base substitution mutation
in the β-globin gene. The base substitution alters the
β-globin protein and results in the inheritance of SCD as
an autosomal recessive condition. The inheritance of the
β^S variant and SCD can be traced by identifying the phenotypes
of family members and displaying them in a pedigree,
or family tree.


PEDIGREE ANALYSIS The pedigrees shown in Figure 2.21 
identify females with circles and males with squares, and are 
typical of a family in which SCD is inherited. Blue circles and 
squares indicate family members who do not have the trait be- 
ing traced; a pink circle or square indicates a person with the 
trait (in this case, SCD). In Figure 2.21a, the father and mother
are identified as I-1 and I-2. Their daughter II-4 is affected by
SCD, as indicated by a pink circle. Her siblings, individuals II-1,
II-2, and II-3, are healthy.

The pedigree in Figure 2.21a identifies the genotype
for the β-globin gene in each member of a certain family.
Each person carries two copies of the gene. Note that
person II-1 is homozygous for β^A, the wild-type allele.
In contrast, siblings II-2 and II-3 and the parents in the
pedigree, I-1 and I-2, are heterozygous and carry both the
wild-type allele β^A and the mutant allele β^S.

The child II-4 is homozygous for β^S and has SCD. This
disorder is a recessive trait because the phenotype is displayed
only in a person who is homozygous for the allele that
produces it. In contrast, the dominant, wild-type phenotype is
produced by the presence of either one or two copies of β^A.
In this family, each parent has the dominant, wild-type phe- 
notype, but the appearance of a child with the recessive trait 
means that each parent must be a heterozygous carrier of a 
recessive allele. 


PUNNETT SQUARE ANALYSIS Figure 2.21b illustrates the
idealized transmission of alleles from heterozygous parents to
offspring in generation II using a Punnett square. Each of the
two alleles carried by a heterozygote has a 1/2 chance of being
transmitted to an offspring. Chance dictates that four different
combinations of alleles can be transmitted from these parents
to their children. The arrows in the figure indicate the parental
origin of alleles in the homozygous and heterozygous children
of this couple. Notice that three of the four children have the
dominant phenotype, being either homozygous for the dominant
allele (β^Aβ^A) or heterozygous (β^Aβ^S), and that one of the
four children has the homozygous β^Sβ^S genotype and therefore
suffers from SCD.

Figure 2.21 Hereditary transmission of sickle cell disease.
(a) Each parent passes one allele to each child. (b) Three
genotypes are expected to occur among the children in the
proportions shown. Figure annotations: The offspring of two
heterozygous carrier parents are 3/4 dominant and 1/4 recessive.
Each parental allele has a 1/2 chance of being passed to a child.
Each combination of alleles in offspring genotypes has a
probability of (1/2)(1/2) = 1/4. The children are expected to have
three genotypes in proportions 1/4 β^Aβ^A : 1/2 β^Aβ^S : 1/4 β^Sβ^S,
and 1/4 of the progeny are expected to have SCD.

The ratio of 3/4 dominant to 1/4 recessive is the 3:1 ratio of
phenotypes that, as we saw repeatedly in this chapter, is the
expected statistical outcome of crosses between two heterozygous
organisms. Each allele transmitted from a heterozygote
has a 1/2 chance of being passed to a child. Any one of the
four combinations of alleles transmitted to a child is expected
to occur with a frequency of (1/2)(1/2) = 1/4; thus, the frequency
of children with SCD produced by heterozygous carrier parents
is 1/4. The three genotypes in the children are expected to
occur in the ratio 1/4 β^Aβ^A : 1/2 β^Aβ^S : 1/4 β^Sβ^S. These genotypes can
be distinctly identified using DNA- and protein-based analysis.
(We describe these molecular techniques and explore
other details of SCD in Chapter 10.)
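
The Punnett square reasoning in this case study can be sketched in a few lines of Python. This is only an illustration, not part of the original analysis: the function name `punnett` and the single-letter allele labels `A` (standing for the wild-type allele) and `S` (standing for the mutant allele) are assumptions made for the example, and each allele is assumed to have a 1/2 chance of transmission.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def punnett(parent1, parent2):
    """Return expected genotype frequencies for a one-gene cross.

    Each parent is a two-character string of allele labels, e.g. "AS".
    Every allele is assumed to have a 1/2 chance of being transmitted.
    """
    # Pair every allele of parent 1 with every allele of parent 2.
    # Sorting makes "SA" and "AS" count as the same heterozygous genotype.
    combos = ["".join(sorted(pair)) for pair in product(parent1, parent2)]
    counts = Counter(combos)
    return {genotype: Fraction(n, len(combos)) for genotype, n in counts.items()}


# Hypothetical labels: "A" = wild-type allele, "S" = mutant allele.
offspring = punnett("AS", "AS")
p_scd = offspring["SS"]                           # homozygous recessive children
p_unaffected = offspring["AA"] + offspring["AS"]  # dominant phenotype

print(offspring)
print(p_unaffected, p_scd)
```

Under these assumptions the four equally likely allele combinations collapse into three genotypes, and adding the two genotypes with the dominant phenotype gives the 3:1 phenotypic ratio.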


SUMMARY


2.1 Gregor Mendel Discovered the Basic 
Principles of Genetic Transmission 


■ A broad education in science and mathematics prepared
Gregor Mendel to design hybridization experiments that 
could reveal the principles of hereditary transmission. 


2.2 Monohybrid Crosses Reveal the Segregation 
of Alleles 


■ Mendel’s experimental design had five important features:
controlled crosses, use of pure-breeding parental strains,
examination of discrete traits, quantification of results, and the
use of replicate and reciprocal crosses.

■ Crosses between pure-breeding parental plants with different
phenotypes produce monohybrid F₁ progeny with the
dominant phenotype.

■ Monohybrid crosses produce a 3:1 ratio of the dominant to
the recessive phenotype among F₂ progeny and demonstrate
the operation of the law of segregation.

■ The law of segregation states that two alleles at a gene will
separate from one another during gamete formation, each
allele has an equal probability of inclusion in a gamete, and
gametes unite at random during reproduction.

■ Mendel used test-cross analysis to demonstrate that F₁
plants are monohybrid, and he used the self-fertilization of
F₂ plants with the dominant phenotype to demonstrate that
the latter have a 2:1 ratio of heterozygotes to homozygotes.


2.3 Dihybrid and Trihybrid Crosses Reveal the 
Independent Assortment of Alleles 


■ The F₂ progeny of dihybrid F₁ plants display a 9:3:3:1
phenotype ratio that demonstrates the operation of the law of
independent assortment.

■ Mendel used trihybrid-cross analysis to demonstrate that
alleles of multiple genes are transmitted in accordance with
the predictions of the law of independent assortment.
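
The 9:3:3:1 ratio can be recovered by brute-force enumeration of gametes, as in the following sketch. It is an illustration only: the helper names (`gametes`, `phenotype`, `cross`) are hypothetical, uppercase letters are assumed to mark dominant alleles, and the two genes are assumed to assort independently with every gamete equally likely.

```python
from collections import Counter
from itertools import product


def gametes(genotype):
    """List all gametes for a genotype given as allele pairs, one pair per gene."""
    # One allele is drawn from each gene; independent assortment means
    # every combination across genes is equally likely.
    return list(product(*genotype))


def phenotype(offspring):
    """Classify each gene as dominant if at least one uppercase allele is present."""
    return tuple(
        "dominant" if any(allele.isupper() for allele in pair) else "recessive"
        for pair in offspring
    )


def cross(parent1, parent2):
    """Count offspring phenotypes over all equally likely gamete pairings."""
    counts = Counter()
    for g1, g2 in product(gametes(parent1), gametes(parent2)):
        offspring = tuple(zip(g1, g2))  # pair up the two alleles of each gene
        counts[phenotype(offspring)] += 1
    return counts


# Hypothetical dihybrid F1 genotype: heterozygous at both genes.
f1 = [("R", "r"), ("G", "g")]
f2_counts = cross(f1, f1)

for classes, n in sorted(f2_counts.items(), key=lambda item: -item[1]):
    print(classes, n)
```

Adding a third heterozygous gene to `f1` extends the same enumeration to a trihybrid cross without any change to the functions.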


2.4 Probability Theory Predicts Mendelian Ratios 


■ The product rule of probability is used to determine the
likelihood of two or more independent events occurring
simultaneously or consecutively. The joint probability is
determined by multiplying the probabilities of the independent
events.

■ The sum rule of probability is applied when two or more
outcomes are possible. The individual probabilities of
the outcomes are added together to determine the joint
probability.

■ Conditional probability is the probability of outcomes that
are contingent on particular conditions.

■ Binomial probability theory describes the distribution of
outcomes of an experiment in terms of the number of outcome
classes and the frequency of each class. Pascal’s triangle
is a convenient tool for determining the distribution of
binomial outcomes.
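
As a rough illustration of how Pascal’s triangle and binomial probability fit together, the sketch below computes the chance that k of n children show a recessive trait when both parents are heterozygous carriers. The function names and the choice of a four-child family are assumptions made for this example; the 1/4 probability per child follows from the monohybrid 3:1 ratio.

```python
from fractions import Fraction
from math import comb


def pascal_row(n):
    """Return row n of Pascal's triangle (the binomial coefficients for n trials)."""
    return [comb(n, k) for k in range(n + 1)]


def binomial_probability(n, k, p):
    """Probability of exactly k outcomes of one class in n independent trials."""
    # The coefficient counts the birth orders; the product rule supplies p^k (1-p)^(n-k).
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Hypothetical family: two heterozygous carriers with four children.
p_recessive = Fraction(1, 4)
n_children = 4
distribution = [
    binomial_probability(n_children, k, p_recessive) for k in range(n_children + 1)
]

print(pascal_row(n_children))
for k, prob in enumerate(distribution):
    print(k, "affected:", prob)
```

The probabilities are kept as exact fractions so that they can be compared directly with hand calculations; the sum rule guarantees that the five classes add to 1.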


2.5 Chi-Square Analysis Tests the Fit between 
Observed Values and Expected Outcomes 


■ The chi-square test (χ²) compares observed results with the
results predicted by a genetic hypothesis that is based on
chance.

■ The result of the chi-square test determines how closely
predictions match results.

■ The significance of a chi-square value is determined by the P
(probability) value corresponding to the number of degrees
of freedom in the experiment.
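
A minimal sketch of the chi-square calculation follows. The observed counts are hypothetical values invented for illustration, not data from this chapter, and the function name `chi_square` is an assumption. The critical values are the standard ones for P = 0.05 at 1 to 4 degrees of freedom.

```python
def chi_square(observed, expected):
    """Sum of (observed - expected)^2 / expected over all outcome classes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Standard chi-square critical values at P = 0.05, keyed by degrees of freedom.
CRITICAL_0_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}

# Hypothetical monohybrid F2 counts (dominant, recessive), tested against 3:1.
observed = [290, 110]
total = sum(observed)
expected = [total * 3 / 4, total * 1 / 4]

chi2 = chi_square(observed, expected)
df = len(observed) - 1  # number of outcome classes minus one
reject = chi2 > CRITICAL_0_05[df]

print(f"chi-square = {chi2:.3f}, df = {df}, reject chance hypothesis: {reject}")
```

With these made-up counts the statistic falls below the critical value, so the deviation from 3:1 would be attributed to chance; a statistic above the critical value would lead to rejecting the hypothesis.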


2.6 Autosomal Inheritance and Molecular 
Genetics Parallel the Predictions of Mendel’s 
Hereditary Principles 


■ Traits transmitted by autosomal inheritance are equally
likely in males and females.

■ Autosomal dominant inheritance produces a vertical pattern
of transmission in which each organism with the dominant
trait has at least one parent with the trait.

■ Traits transmitted in an autosomal recessive pattern are
usually distributed in a horizontal pattern in which offspring
with the recessive trait frequently descend from
parents that are heterozygous and have the dominant
phenotype.

■ Molecular analysis of four of Mendel’s traits illustrates how
transmission genetic analysis and molecular genetic analysis
characterize the same hereditary processes at different levels.
dominant phenotype (p. 31)
F₁, F₂, F₃ generation (p. 30)
forked-line diagram (p. 38)
gamete (p. 33)
genotypic ratio (p. 33)
heterozygous genotype (heterozygote) (p. 33)
homozygous genotype (homozygote) (p. 33)
law of independent assortment (Mendel’s second law) (p. 38)
law of segregation (Mendel’s first law) (p. 34)
mean (μ) (p. 48)
monohybrid cross (p. 33)
normal (Gaussian) distribution (p. 48)
parental generation (P generation) (p. 30)
particulate inheritance (p. 33)
Pascal’s triangle (p. 47)
pedigree (p. 51)
phenotypic ratio (p. 33)
product rule (multiplication rule) (p. 44)
Punnett square (p. 33)
pure-breeding (true-breeding) (p. 30)
P value (probability value) (p. 49)
recessive phenotype (p. 31)
reciprocal cross (p. 31)
replicate cross (p. 30)
standard deviation (σ) (p. 49)
sum rule (addition rule) (p. 45)
test cross (test-cross analysis) (p. 31)
transmission genetics (p. 26)
trihybrid cross (p. 41)


Chapter Concepts

1. Compare and contrast the following terms:
a. dominant and recessive
b. genotype and phenotype
c. homozygous and heterozygous
d. monohybrid cross and test cross
e. dihybrid cross and trihybrid cross

2. For the cross BB × Bb, what is the expected genotype ratio?
What is the expected phenotype ratio?

3. For the cross Aabb × aaBb, what is the expected genotype
ratio? What is the expected phenotype ratio?

4. In mice, black coat color is dominant to white coat
color. In the pedigree below, mice with a black coat
are represented by darkened symbols, and those with
white coats are shown as open symbols. Using allele
symbols B and b, determine the genotypes for each
mouse.

[Figure: pedigree of coat color in mice]

5. Two parents plan to have three children. What is the
probability that the children will be two girls and one boy?

6. Consider the cross AaBbCC × AABbCc.
a. How many different gamete genotypes can each organism
produce?
b. Use a Punnett square to predict the expected ratio of
offspring phenotypes.
c. Use the forked-line method to predict the expected ratio
of offspring phenotypes.

7. If a chi-square test produces a chi-square value of 7.83 with
4 degrees of freedom,
a. in what interval range does the P value fall?
b. is the result sufficient to reject the chance hypothesis?
c. above what chi-square value would you reject the
chance hypothesis for an experiment with 7 degrees of
freedom?


( MasteringGenetics™ Visit for instructor-assigned tutorials and problems. 


For answers to selected even-numbered problems, see Appendix: Answers. 


8. Determine whether the statements below are true or false.
If a statement is false, provide the correct information or
revise the statement to make it correct.

a. If a dihybrid cross is performed, the expected genotypic
ratio is 9:3:3:1.

b. A student uses the product rule to predict that the 
probability of flipping a coin twice and getting a head 
and then a tail is }. 

c. A test cross between a heterozygous parent and a 
homozygous recessive parent is expected to produce a 
1:1 genotypic and phenotypic ratio. 

d. The outcome of a trihybrid cross is predicted by the law 
of segregation. 

e. Reciprocal crosses that produce identical results 
demonstrate that a strain is pure-breeding. 

f. If a woman is heterozygous for albinism, an autosomal
recessive condition that results in the absence of skin 
pigment, the proportion of her gametes carrying the 
allele that allows pigment expression is expected to 
be 75%. 

g. The progeny of a trihybrid cross are expected to have 

one of 27 different genotypes. 

If a dihybrid F₁ plant is self-fertilized,


7 


(1) 5 of the progeny will have the same phenotype as 
the F₁ parent.


(2) 4 of the progeny will be true-breeding. 


(3) 5 of the progeny will be heterozygous at one or 
both loci. 


9. In the datura plant, purple flower color is controlled
by a dominant allele P. White flowers are found in
plants homozygous for the recessive allele p.
Suppose that a purple-flowered datura plant with an
unknown genotype is self-fertilized and that its progeny
are 28 purple-flowered plants and 10 white-flowered
plants.
a. Use the results of the self-fertilization to determine the
genotype of the original purple-flowered plant.
b. If one of the purple-flowered progeny plants is selected
at random and self-fertilized, what is the probability it
will breed true?

10. The dorsal pigment pattern of frogs can be either “leopard”
(white pigment between dark spots) or “mottled”
(pigment between spots appears mottled). The trait is
controlled by an autosomal gene. Males and females are
selected from pure-breeding populations, and a pair of
reciprocal crosses is performed. The cross results are
shown below.

Cross 1: P: Male leopard × female mottled
F₁: All mottled
F₂: 70 mottled, 22 leopard

Cross 2: P: Male mottled × female leopard
F₁: All mottled
F₂: 50 mottled, 18 leopard

a. Which of the phenotypes is dominant? Explain your
answer.
b. Compare and contrast the results of the reciprocal
crosses in the context of autosomal gene inheritance.
c. In the F₂ progeny from both crosses, what proportion
is expected to be homozygous? What proportion is
expected to be heterozygous?
d. Propose two different genetic crosses that would allow
you to determine the genotype of one mottled frog from
the F₂ generation.

11. Black skin color is dominant to pink skin color in pigs. Two
heterozygous black pigs are crossed.
a. What is the probability that their offspring will have
pink skin?
b. What is the probability that the first and second offspring
will have black skin?
c. If these pigs produce a total of three piglets, what
is the probability that two will be pink and one will be
black?

12. A male mouse with brown fur color is mated to two different
female mice with black fur. Black female 1 produces a
litter of 9 black and 7 brown pups. Black female 2 produces
14 black pups.
a. What is the mode of inheritance of black and brown fur
color in mice?
b. Choose symbols for each allele, and identify the
genotypes of the brown male and the two black
females.

13. Figure 2.13 shows the results of Mendel’s test-cross
analysis of independent assortment. In this experiment,
he first crossed pure-breeding round, yellow plants to
pure-breeding wrinkled, green plants. The round,
yellow F₁ are crossed to pure-breeding wrinkled,
green plants. Use chi-square analysis to show that
Mendel’s results do not differ significantly from
those expected.

14. An experienced goldfish breeder receives two unusual
male goldfish. One is black rather than gold, and the
other has a single tail fin rather than a split tail fin. The
breeder crosses the black male to a female that is gold.
All the F₁ are gold. She also crosses the single-finned
male to a female with a split tail fin. All the F₁ have a
split tail fin. She then crosses the black male to F₁ gold
females and, separately, crosses the single-finned male
to F₁ split-finned females. The results of the crosses are
shown below.

Black male × F₁ gold female:
Gold 32
Black 34

Single-finned male × F₁ split-finned female:
Split fin 41
Single fin 39

a. What do the results of these crosses suggest about the
inheritance of color and tail fin shape in goldfish?
b. Is black color dominant or recessive? Explain. Is single
tail dominant or recessive? Explain.
c. Use chi-square analysis to test your hereditary hypothesis
for each trait.

15. The pedigree below shows the transmission of albinism
(absence of skin pigment) in a human family.

[Figure: pedigree of albinism in a human family]

a. What is the most likely mode of transmission of albinism
in this family?
b. Using allelic symbols of your choice, identify the genotypes
of the male and his two mates in generation I.
c. The female I-1 and her mate, male I-2, had four
children, one of whom has albinism. What is the
probability that they could have had a total of four
children with any other outcome except one child with
albinism and three with normal pigmentation?
d. What is the probability that female I-3 is a heterozygous
carrier of the allele for albinism?
e. One child of female I-3 has albinism. What is the probability
that any of the other four children are carriers of
the allele for albinism?

16. A geneticist crosses a pure-breeding strain of peas producing
yellow, wrinkled seeds with one that is pure-breeding
for green, round seeds.
a. Use a Punnett square to predict the F₂ progeny that
would be expected if the F₁ are allowed to self-fertilize.
b. What proportion of the F₂ progeny are expected to have
yellow seeds? Wrinkled seeds? Green seeds? Round seeds?
c. What is the expected phenotype distribution among the
F₂ progeny?

17. Suppose an F₁ plant from Problem 16 is crossed to the
pure-breeding green, round parental strain. Use a forked-line
diagram to predict the phenotypic distribution of the
resulting progeny.

18. In pea plants, the appearance of flowers along the main
stem is a dominant phenotype called “axial” and is controlled
by an allele T. The recessive phenotype, produced
by an allele t, has flowers only at the end of the stem and
is called “terminal.” Pod form displays a dominant phenotype
“inflated,” controlled by an allele C, and a recessive
“constricted” form, produced by the c allele. A cross
is made between a pure-breeding axial, constricted plant
and a plant that is pure-breeding terminal, inflated.

a. The F₁ progeny of this cross are allowed to self-fertilize.
What is the expected phenotypic distribution among
the F₂ progeny?
b. Suppose that all of the F₂ progeny with terminal flowers,
i.e., plants with terminal flowers and inflated pods
and plants with terminal flowers and constricted pods,
are saved and allowed to self-fertilize to produce a
partial F₃ generation. What is the expected phenotypic
distribution among these F₃ plants?


c. If an F₁ plant from the initial cross described above is
crossed to a plant that is terminal, constricted, what is
the expected distribution among the resulting progeny?
d. If the plants with terminal flowers produced by the cross
in part (c) are saved and allowed to self-fertilize, what is
the expected phenotypic distribution among the progeny?

19. If two six-sided dice are rolled, what is the probability that
the total number of spots showing is
a. 4?
b. 7?
c. greater than 5?
d. an odd number?


Application and Integration

20. Experimental Insight 2.1 describes data on the kernel color
distribution of bicolor corn, collected by a genetics class
like yours. To test the hypothesis that the kernel color of
bicolor corn is the result of the segregation of two alleles at
a single genetic locus, the class counted 12,356 kernels and
found that 9304 were yellow and 3052 were white. Use
chi-square analysis to evaluate the fit between the segregation
hypothesis and the class results.

21. The pedigree below shows the transmission of a phenotypic
character.

[Figure: pedigree of a phenotypic character]

Using B to represent a dominant allele and b to represent a
recessive allele,
a. give the genotype(s) possible for each member of the
family, assuming the trait is autosomal dominant.
b. give the genotype(s) possible for each member of the
family, assuming the trait is autosomal recessive.

22. The seeds in bush bean pods are each the product of an
independent fertilization event. Green seed color is dominant
to white seed color in bush beans. If a heterozygous plant
with green seeds self-fertilizes, what is the probability that 6
seeds in a single pod of the progeny plant will consist of
a. 3 green and 3 white seeds?
b. all green seeds?
c. at least 1 white seed?

23. List all the different gametes that are possible from the
following genotypes.
a. AABbCcDd
b. AabbCcDD
c. AaBbCcDd
d. AabbCCdd

24. Organisms with the genotypes AABbCcDd and AaBbCcDd
are crossed. What are the expected proportions of the
following progeny?
a. A-B-C-D-
b. AabbCcDd
c. a phenotype identical to either parent
d. A-B-ccdd




25. In humans, the ability to bend the thumb back beyond
vertical is called hitchhiker’s thumb and is recessive to the
inability to do so (OMIM 274200). Also, the presence of
attached earlobes is recessive to unattached earlobes (OMIM
128900). In the pedigree shown, the left half of the circle or
square is filled if the person has the dominant non-hitchhiker’s
thumb and empty if hitchhiker’s thumb is present.
The right half of the symbol is filled if the person has
unattached earlobes and is empty if earlobes are attached. Use
allelic symbols H and h for the thumb and E and e for earlobes,
and identify the genotypes for each family member.

[Figure: pedigree of thumb and earlobe phenotypes]

26. In the fruit fly Drosophila, a rudimentary wing called
“vestigial” and dark body color called “ebony” are inherited
at independently assorting genes and are recessive
to their dominant wild-type counterparts, full wing and
gray body color. Dihybrid wild-type males and females
are crossed, and 3200 progeny are produced. How many
progeny flies are expected to be found in each phenotypic
class?

27. In pea plants, plant height, seed shape, and seed color are
governed by three independently assorting genes. The
three genes have dominant and recessive alleles, with tall
(T) dominant to short (t), round (R) dominant to wrinkled
(r), and yellow (G) dominant to green (g).
a. If a true-breeding tall, wrinkled, yellow plant is crossed
to a true-breeding short, round, green plant, what
phenotypic ratios are expected in the F₁ and F₂?
b. What proportion of the F₂ are expected to be tall,
wrinkled, yellow? ttRRGg?
c. What proportion of the F₂ that produce round, green
seeds (regardless of the height of the plant) are expected
to breed true?


28. A variety of pea plant called Blue Persian produces a tall
plant with blue seeds. A second variety of pea plant called
Spanish Dwarf produces a short plant with white seed.
The two varieties are crossed, and the resulting seeds are
collected. All of the seeds are white; and when planted,
they produce all tall plants. These tall F₁ plants are allowed
to self-fertilize. The results for seed color and plant stature
in the F₂ generation are as follows:

F₂ Plant Phenotype          Number
Blue seed, tall plant           97
White seed, tall plant         270
Blue seed, short plant          33
White seed, short plant        100
Total                          500

a. Which phenotypes are dominant, and which are
recessive? Why?
b. What is the expected distribution of phenotypes in the
F₂ generation?
c. State the hypothesis being tested in this experiment.
d. Examine the data in the table by the chi-square test, and
determine whether they conform to expectations of the
hypothesis.

29. In tomato plants, the production of red fruit color is under
the control of an allele R. Yellow tomatoes are rr. The
dominant phenotype for fruit shape is under the control of
an allele T, which produces two lobes. Multilobed fruit, the
recessive phenotype, have the genotype tt. Two different
crosses are made between parental plants of unknown
genotype and phenotype. Use the progeny phenotype ratios to
determine the genotypes and phenotypes of each parent.


Cross 1 progeny: two-lobed, red
                 two-lobed, yellow
                 multilobed, red
                 multilobed, yellow
Cross 2 progeny: two-lobed, red
                 two-lobed, yellow
                 multilobed, red
                 multilobed, yellow
[The progeny ratios for each class are illegible in the source.]


30. A male and a female are each heterozygous for both cystic
fibrosis (CF) and phenylketonuria (PKU). Both conditions
are autosomal recessive, and they assort independently.

a. What proportion of the children of this couple will have 
neither condition? 

b. What proportion of the children will have either PKU 
or CF but not both? 

c. What proportion of the children will be carriers of one 
or both conditions? 


31. In a sample of 640 families with 6 children each, the distribution
of boys and girls is as shown in the following table:


Number of families     9   63  147  204  151   56   10
Number of girls        0    1    2    3    4    5    6
Number of boys         6    5    4    3    2    1    0


a. Are the numbers of boys to girls in these families con- 
sistent with the expected 1:1 ratio? Support your answer 
by chi-square analysis. 

b. Is the distribution of the numbers of boys and girls in 
the families consistent with the expectations of bino- 
mial probability? Support your answer. 


32. A sample of 120 families with 4 children each in which
both parents are carriers of an autosomal recessive 
mutation for cystic fibrosis (CF) produces the following 
distribution of children with and without cystic 
fibrosis: 


Number of families     16   52   32   18    2
Children with CF        0    1    2    3    4
Children free of CF     4    3    2    1    0


a. Is the total number of children with CF in these 
families consistent with the expected ratio? Support 
your answer. 

b. What is the expected distribution of the number of 
families with 0 through 4 children with CF in this 
sample under the assumptions of binomial 
probability? 

c. Is the distribution of families with 0 through 4 children 
with CF consistent with the ratios expected under bino- 
mial probability? Support your answer. 


33. A woman expressing a dominant phenotype is
heterozygous (Dd) at the gene.

a. What is the probability that the dominant allele carried 
by the woman will be inherited by a grandchild? 

b. What is the probability that two grandchildren of the 
woman who are first cousins to one another will each 
inherit the dominant allele? 

c. Draw a pedigree that illustrates the transmission of the
dominant trait from the grandmother to two of her 
grandchildren who are first cousins. 


34. Two parents who are each known to be carriers of an
autosomal recessive allele have four children. None of the 
children has the recessive condition. What is the prob- 
ability that one or more of the children is a carrier of the 
recessive allele? 


35. An organism having the genotype AaBbCcDdEe is self-fertilized.
Assuming the loci assort independently, determine
the following proportions:

a. gametes that are expected to carry only dominant 
alleles 

b. progeny that are expected to have a genotype identical 
to that of the parent 

c. progeny that are expected to have a phenotype identical 
to that of the parent 

d. gametes that are expected to be ABcde 

e. progeny that are expected to have the genotype 
AabbCcDdE- 
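
Each part of this problem applies the product rule: because the loci assort independently, the probability of a multilocus outcome is the product of the single-locus probabilities. The sketch below illustrates that reasoning with exact fractions. The dictionary `PER_LOCUS`, the function `product_rule`, and the variable names are hypothetical helpers introduced for this example, and the code is one possible approach rather than the book's solution.

```python
from fractions import Fraction

# Per-locus probabilities among progeny of a selfed heterozygote (Aa x Aa).
PER_LOCUS = {
    "heterozygous": Fraction(1, 2),          # Aa
    "homozygous_recessive": Fraction(1, 4),  # aa
    "homozygous_dominant": Fraction(1, 4),   # AA
    "dominant_phenotype": Fraction(3, 4),    # A- (AA or Aa)
}


def product_rule(outcomes):
    """Multiply per-locus probabilities for independently assorting loci."""
    probability = Fraction(1)
    for outcome in outcomes:
        probability *= PER_LOCUS[outcome]
    return probability


N_LOCI = 5

# A gamete receives one of two alleles at each locus, so any one fully
# specified gamete (for example ABCDE or ABcde) has probability (1/2)^5.
specified_gamete = Fraction(1, 2) ** N_LOCI

same_genotype = product_rule(["heterozygous"] * N_LOCI)
same_phenotype = product_rule(["dominant_phenotype"] * N_LOCI)

# Aa bb Cc Dd E- : the dash means either allele may occupy that position.
aa_bb_cc_dd_e = product_rule([
    "heterozygous", "homozygous_recessive", "heterozygous",
    "heterozygous", "dominant_phenotype",
])

print("specified gamete:", specified_gamete)
print("same genotype as parent:", same_genotype)
print("same phenotype as parent:", same_phenotype)
print("AabbCcDdE-:", aa_bb_cc_dd_e)
```

Using `Fraction` keeps every result as an exact ratio, which matches the way proportions are usually reported in genetics problems.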


36. A man and a woman are each heterozygous carriers of an
autosomal recessive mutation of a disorder that is fatal in 
infancy. They both want to have multiple children, but they 
are concerned about the risk of the disorder appearing in 
one or more of their children. In separate calculations, de- 
termine the probabilities of the couple having five children 
with 0, 1, 2, 3, 4, and all 5 children being affected by the 
disorder. 
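
The separate calculations requested here follow the binomial probability formula, P(k) = C(n, k) p^k (1 - p)^(n - k). The sketch below assumes that each child of two carriers independently has a 1/4 chance of being affected. The function name `binomial_probability` and the constants are hypothetical names chosen for this illustration.

```python
from fractions import Fraction
from math import comb


def binomial_probability(n, k, p):
    """Probability of exactly k outcomes of one kind in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


P_AFFECTED = Fraction(1, 4)  # assumed risk per child of two carriers
N_CHILDREN = 5

distribution = {
    k: binomial_probability(N_CHILDREN, k, P_AFFECTED)
    for k in range(N_CHILDREN + 1)
}

for k, probability in distribution.items():
    print(f"{k} affected: {probability} = {float(probability):.4f}")
```

The six probabilities cover every possible outcome, so they sum to exactly 1, which is a useful check on hand calculations.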


37. For a single die roll, there is a 1/6 chance that any particular
number will appear. For a pair of dice, each specific
combination of numbers has a probability of 1/36 of occurring.
Most total values of two dice can occur more than
one way. As a test of random probability theory, a student
decides to roll a pair of six-sided dice 300 times and tabulate
the results. She tabulates the number of times each
different total value of the two dice occurs. Her results are
the following:


Total Value of Two Dice Number of Times Rolled 


N 


7 
w 
23 
36 
42 
53 
40 
38 
TOTAL 300

The student tells you that her results fail to prove that 
random chance is the explanation for the outcome of this ex- 
periment. Is she correct or incorrect? Support your answer. 
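
To evaluate a claim like this one, the observed counts must be compared with the counts expected by chance. The sketch below enumerates the 36 equally likely outcomes of two fair six-sided dice and converts the number of ways each total can occur into an expected count for 300 rolls. It is an illustration only, and the names `ways`, `expected`, and `N_ROLLS` are hypothetical. It does not use the student's observed counts.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

N_ROLLS = 300  # number of rolls in the student's experiment

# Count how many of the 36 equally likely (die 1, die 2) outcomes
# produce each total from 2 through 12.
ways = Counter(a + b for a, b in product(range(1, 7), repeat=2))

# Expected count for each total = N_ROLLS x (ways / 36).
expected = {
    total: float(N_ROLLS * Fraction(count, 36))
    for total, count in sorted(ways.items())
}

for total, count in expected.items():
    print(f"total {total:2d}: {ways[total]} way(s), expected {count:.2f}")
```

With 11 possible totals, a chi-square test against these expected counts would have 10 degrees of freedom.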


38. You have four guinea pigs for a genetic study. One male
and one female are from a strain that is pure-breeding
for short brown fur. A second male and female are from
a strain that is pure-breeding for long white fur. You are
asked to perform two different experiments to test the 
proposal that short fur is dominant to long fur and that 
brown is dominant to white. You may use any of the four 
original pure-breeding guinea pigs or any of their offspring 
in experimental matings. Design two different experiments 
(crossing different animals and using different combina- 
tions of phenotypes) to test the dominance relationships 
of alleles for fur length and color, and make predictions for 
each cross based on the proposed relationships. Anticipate 
that the litter size will be 12 for each mating and that fe- 
male guinea pigs can produce three litters in their lifetime. 


39. Galactosemia is an autosomal recessive disorder caused by
the inability to metabolize galactose, a component of the 
lactose found in mammalian milk. Galactosemia can be 
partially managed by eliminating dietary intake of lactose 
and galactose. Amanda is healthy, as are her parents, but 
her brother Alonzo has galactosemia. Brice has a similar 
family history. He and his parents are healthy, but his sister 
Brianna has galactosemia. Amanda and Brice are planning 
a family and seek genetic counseling. Based on the infor- 
mation provided, complete the following activities and 
answer the questions. 

a. Draw a pedigree that includes Amanda, Brice, their sib- 
lings, and parents. Identify the genotype of each person, 
using G and g to represent the dominant and recessive 
alleles, respectively. 

b. What is the probability that Amanda is a carrier of 
the allele for galactosemia? What is the probability 
that Brice is a carrier? Explain your reasoning for each 
answer. 


c. What is the probability that the first child of Amanda 
and Brice will have galactosemia? Show your work. 

d. If the first child has galactosemia, what is the probability
that the second child will have galactosemia? Explain
the reasoning for your answer.


40. Sweet yellow tomatoes with a pear shape bring a high 
price per basket to growers. Pear shape, yellow color, and
terminal flower position are recessive traits produced by 
alleles f, r, and t, respectively. The dominant phenotypes 
for each trait—full shape, red color, and axial flower posi- 
tion—are the product of dominant alleles F, R, and T. A 
farmer has two pure-breeding tomato lines. One is full, 
yellow, terminal and the other is pear, red, axial. Design
a breeding experiment that will produce a line of tomato
that is pure-breeding for pear shape, yellow color, and 
axial flower position. 


41. A cross between a spicy variety of Capsicum annuum pepper
and a sweet (nonspicy) variety produces F1 progeny
plants that all have spicy peppers. The F1 are crossed, and
among the F2 plants are 56 that produce spicy peppers
and 20 that produce sweet peppers. Dr. Ara B. Dopsis, an
expert on pepper plants, discovers a gene designated Pun1
that he believes is responsible for spicy versus sweet flavor
of peppers. Dr. Dopsis proposes that a dominant allele P
produces spicy peppers and that a recessive mutant allele p
results in sweet peppers.

a. Are the data on the parental cross and the F1 and F2
consistent with the proposal made by Dr. Dopsis?
Explain why or why not, using P and p to indicate probable
genotypes of pepper plants.

b. Assuming the proposal is correct, what proportion of
the spicy F2 pepper plants do you expect will be pure-breeding?
Explain your answer.


42. Alkaptonuria is an infrequent autosomal recessive condition.
It is first noticed in newborns when the urine
in their diapers turns black upon exposure to air. The
condition is caused by the defective transport of the 
amino acid phenylalanine through the intestinal walls 
during digestion. About 4 people per 1000 are carriers of 
alkaptonuria. 

Sara and James had never heard of alkaptonuria and 
were shocked to discover that their first child had the 
condition. Sara’s sister Mary and her husband Frank are 
planning to have a family and are concerned about the pos- 
sibility of alkaptonuria in one of their children. 

The four adults (Sara, James, Mary, and Frank) seek
information from a neighbor who is a retired physician.
After discussing their family histories, the neighbor says,
“I never took genetics, but I know from my many years in
practice that Sara and James are both carriers of this recessive
condition. Since their first child had the condition,
there is a very low chance that the next child will also have
it, because the odds of having two children with a recessive
condition are very low. Mary and Frank have no chance
of having a child with alkaptonuria because Frank has no
family history of the condition.” The two couples each have
babies and both babies have alkaptonuria.

a. What are the genotypes of the four adults? 

b. What was incorrect about the information given to Sara 
and James? What is incorrect about the information 
given to Mary and Frank? 
c. What is the probability that the second child of Mary
and Frank will have alkaptonuria? 

d. What is the chance that the third child of Sara and 
James will be free of the condition? 

e. The couples are worried that one of their grandchildren 
will inherit alkaptonuria. How would you assess the risk 
that one of the offspring of a child with alkaptonuria 
will inherit the condition? 


43. Humans vary in many ways from one another. Among many
minor phenotypic differences are the following five indepen- 
dently assorting traits that have a dominant and a recessive 
phenotype: (1) forearm hair (alleles F and f )—the presence 
of hair on the forearm is dominant to the absence of hair on 
the forearm; (2) earlobe form (alleles E and e)—unattached 
earlobes are dominant to attached earlobes; (3) widow’s peak 
(alleles W and w)—a distinct “V” shape to the hairline at the 
top of the forehead is dominant to a straight hairline; (4) 
hitchhiker’s thumb (alleles H and h)—the ability to bend the
thumb back beyond vertical is dominant and the inability to 
do so is recessive; and (5) freckling (alleles D and d)—the ap- 
pearance of freckles is dominant to the absence of freckles. 
If a couple with the genotypes Ff Ee Ww Hh Dd and Ff
Ee Ww Hh Dd have children, what is the chance the children
will inherit the following characteristics?
a. the same phenotype as the parents 
b. four dominant traits and one recessive trait 
c. all recessive traits 
d. the genotype Ff EE Ww hh dd


44. In chickens, the presence of feathers on the legs is due to a
dominant allele (F), and the absence of leg feathers is due 
to a recessive allele (f). The comb on the top of the head 
can be either pea-shaped, a phenotype that is controlled 
by a dominant allele (P), or a single comb controlled by a 
recessive allele (p). The two genes assort independently. 
Assume that a pure-breeding rooster that has feathered 
legs and a single comb is crossed with a pure-breeding 
hen that has no leg feathers and a pea-shaped comb. The 
F1 are crossed to produce the F2. Among the resulting F2,
however, only birds with a single comb and feathered legs
are allowed to mate. These chickens mate at random to
produce F3 progeny. What are the expected genotypic and
phenotypic ratios among the resulting F3 progeny?


45. A pure-breeding fruit fly with the recessive mutation cut
wing, caused by the homozygous cc genotype, is crossed to
a pure-breeding fly with normal wings, genotype CC. Their
F1 progeny all have normal wings. F1 flies are crossed, and
the F2 progeny have a 3:1 ratio of normal wing to cut wing.
One male F2 fly with normal wings is selected at random
and mated to an F2 female with normal wings. Using all
possible genotypes of the F2 flies selected for this cross, list
all possible crosses between the two flies involved in this
mating, and determine the probability of each cross.


46. Situs inversus is a congenital condition in which the major
visceral organs are reversed from their normal positions. 
Investigations into the genetics of this abnormality re- 
vealed that individuals with at least one dominant allele 
(SI) of an autosomal gene are normal but, surprisingly, of 
individuals that are homozygous for a recessive allele (si), 4 
are situs inversus and å are normal. 
a. What genotypes and phenotypes are expected in progeny
from a cross of two si si individuals?

b. What genotypes and phenotypes are expected in prog- 
eny from a cross of two SI si individuals? 


47. Domestic dogs evolved from ancestral grey wolves. Wolves
have coats of short, straight hair and lack “furnishings,”
a growth pattern marked by eyebrows and a mustache
found in domestic dogs. In domestic dogs, coat variation 
is controlled by allelic variation in three genes. Recessive 
mutant alleles in the FGF5 gene result in long hair, while
dogs carrying the dominant ancestral allele have short 
hair. Likewise, recessive mutant alleles in the KRT71 gene 
result in curly hair, whereas dogs with an ancestral domi- 
nant allele have straight hair. Dominant mutant alleles in 
the RSPO2 gene cause the presence of furnishings, while 
dogs homozygous for the ancestral recessive allele have 
no furnishings. 

A pure-breeding curly- and long-haired poodle with
furnishings was crossed to a pure-breeding short- and 
straight-haired border collie lacking furnishings. 

a. What are the genotypes and phenotypes of the puppies? 

b. If dogs of the F1 generation are interbred, what
proportions of genotypes and phenotypes are expected
in the F2?


48. Alleles at the IGF-1 locus in dogs, encoding insulin-like
growth factor, largely determine whether a domestic dog 
will be large or small. Dogs with an ancestral dominant 
allele are large, whereas dogs homozygous for the mutant 
recessive allele are small. Chondrodysplasia, a short-legged 
phenotype (as in dachshunds and basset hounds), is caused 
by a dominant gain-of-function allele of the FGF4 gene. 
The MSTN gene encodes myostatin, a negative regulator of 
muscle development. Dogs with a dominant ancestral allele 
of the MSTN gene have normal muscle development, while
dogs homozygous for recessive mutants in the MSTN gene
are “double muscled” and have trouble running quickly. 
However, dogs heterozygous for the mutant allele run 
faster than either of the homozygotes. 

You breed a pure-breeding small basset hound of 
normal musculature with a pure-breeding “bully” whippet, 
a double-muscled large dog with normal legs. 

a. What are the genotypes and phenotypes of the F1
puppies?

b. If the F1 of this cross are interbred, what proportion
of the F2 are expected to be fast runners and what
proportion normal-speed runners?


49. The Basalt Seed Lending Library run by the Central
Rocky Mountain Permaculture Institute and the Basalt
(Colorado) Regional Library (see Experimental Insight 2.3)
loans heirloom vegetable seeds to patrons.

a. The many different types of seed produce plants and 
vegetables that consistently have specific traits. Give a 
genetic explanation for the consistent production of the 
same traits from plants grown from heirloom seeds. 

b. A goal of the seed-lending program over time is to 
generate seeds and plants that thrive and yield better 
harvests. From an evolutionary perspective, explain 
how saving and replanting seeds from the most 
productive plants each year contributes to this goal. 
Cell division is a complex but carefully controlled process. Chromosomes,
stained in blue, are ready to separate in anaphase. Different kinds of micro- 
tubules, shown in green, help drive the chromosome segregation process. 


A couple of decades or so ago, at the moment of
conception that culminated in your birth, two gametes
united to form the single fertilized cell, the zygote, from
which you developed. Your sex was determined in that
instant by the sex chromosome carried by the fertilizing
sperm: an X chromosome if you are female or a Y chromosome
if you are male. Shortly after fertilization, cell division
began that over the next few hours increased the tiny
zygote to two cells, then four cells, then eight cells, and so
on, as it moved down the fallopian tube toward the uterus.
Over several days, these cell divisions produced hundreds
of exact genetic replicas of the original fertilized egg. About
1 week after fertilization, these cells, now called a blastocyst,
were implanted into the uterine wall, and within 2
weeks of conception, genetically controlled processes
of cell differentiation and cell specialization
began to form the first embryonic organs and structures.
These processes eventually determined the
structure and function of each cell in your body (see
Chapter 20).

Since then, your body has produced thousands 
of generations of cells. The mechanism of cell divi- 
sion that produced most of them, mitosis, is an 
ongoing process that with each division creates 
two identical daughter cells that are exact genetic 
replicas of the parental cell they are derived from. 
Mitosis produces somatic cells, the structural cells 
of the body. Therefore, mitosis is responsible for 
the growth and maintenance of your body, its 
organs, and its various structures; it repairs the 
damage and injury your body sustains, and it 
produces new cells to replace those that undergo 
programmed cell death (apoptosis). While you have 
been reading this passage, approximately 200 cells 
in your body have undergone mitotic division. 

There are trillions of somatic cells in your body, and 
nearly all of them contain a nucleus that encloses two 
sets of chromosomes. The somatic cells of most other 
eukaryotes also contain multiple sets of chromosomes. 
The most common multiple of chromosome sets in 
animal nuclei is two, and the number of chromosomes 
present as homologous pairs in such nuclei is called 
the diploid number. Your somatic cell nuclei contain 
46 chromosomes each, in 23 homologous pairs, so 
your diploid number is 46. The diploid number varies 
among species (each species having its characteristic 
number of pairs) and so is identified nonspecifically 
as 2n. The value n represents the haploid number of 
chromosomes, a value that is one-half the diploid num- 
ber and is the number of chromosomes contained in 
the nuclei of gametes, the nonsomatic cells. 

Gametes, produced from germ-line cells, are the 
germinal, or reproductive, cells: sperm and egg in 
animals or pollen and egg in plants. Germ-line cells 
divide by meiosis, which is different in several ways 
from mitosis. 

In this chapter, we examine both mitosis and mei- 
osis, and we look closely at the connection between 
meiotic cell division and Mendel’s laws of heredity.
We also explore patterns of sex determination in 
eukaryotes and look at processes that equalize the 
expression of genes carried on sex chromosomes, 
the chromosomes that determine sex. In addition, 
we study the special patterns of inheritance of genes 
on the X chromosome, and we describe how the 
discovery of genes on the X chromosome supported 
the chromosome theory of heredity, the theory that 
chromosomes are the cell structures that carry genes. 


3.1 Mitosis Divides Somatic Cells 


Mitosis, the cell-division process that produces two geneti- 
cally identical daughter cells from a single original parental 
cell, is among the most fundamental and important pro- 
cesses occurring in eukaryotes. It is a genetically controlled 
process that follows a precise script to enable organisms to 
grow and develop normally and to maintain the structures 
and functions of their organs, tissues, and other bodily com- 
ponents. Life depends on the orderly progression and proper 
regulation of mitosis. If too little cell division takes place or 
cell division occurs too slowly, an organism may fail to de- 
velop at all, or it may have morphologic abnormalities. On 
the other hand, too much cell division can lead to growth 
of structures beyond their normal boundaries, likewise pro- 
ducing morphologic abnormality and possible death. 


Stages of the Cell Cycle 


Cell division is regulated by genetic control of the cell 
cycle, the life cycle cells must pass through in order to 
replicate their DNA and divide. Since well-regulated cell 
division is such an integral part of life, it will not surprise 
you to learn that the cell cycles of all eukaryotes are similar 
and that much of the molecular machinery that controls 
the cell cycle is evolutionarily conserved in plants and ani- 
mals. The striking similarity of cell cycle control genes and 
processes in plants and animals, and the sharing of many 
of these genes with Bacteria and Archaea, is powerful evi- 
dence that all life evolved from a single common ancestor. 
The eukaryotic cell cycle is divided into two principal 
phases—M phase, a short segment of the cell cycle dur- 
ing which cells divide, and interphase, the longer period 
between one M phase and the next (Figure 3.1a). Interphase 
consists of three successive stages, G1, S, and G2. During
these stages, respectively, the cell expresses its genetic in- 
formation, replicates its chromosomes, and prepares for 
entry into M phase. M phase is divided into substages that 
correspond to the progress of the cell during its division. 
When viewed under a light microscope, somatic cells
in interphase may appear rather placid, but their outward
appearance gives little indication of the complex activity
taking place inside. Gene transcription occurs continuously
throughout the cell cycle, but during the G1 (or Gap 1)
phase of interphase, cells' rates of transcription and translation
are particularly high (Figure 3.1b). Cells of different
types vary in how many genes they express, in how they
function in the body, and in how they interact with other
cells. Consequently, the duration of G1 varies. Some types
of cells are rapidly dividing and spend only a short time,
perhaps as little as a few hours, in G1. Other cells linger in
G1 for periods of days, weeks, or more.

Figure 3.1 The cell cycle. (a) The cell cycle is divided into interphase and M phase, which are each
further subdivided. The cycles are not drawn to scale. (b) An overview of cell cycle activities.

As they approach the end of G1, cells follow one of two
alternative paths. Most cells enter the S phase, or synthesis
phase, during which DNA replication (DNA synthesis)
takes place. On the other hand, a small subset of specialized
cells transition from G1 into a nondividing state called G0
(“G zero”), a kind of semiperpetual G1-like state in which
cells express their genetic information and carry out normal
functions but do not progress through the cell cycle
(see Figure 3.1b). Several kinds of cells in your body, including
certain cells in your eyes and bones, reach a mature
state of differentiation, enter G0, and rarely if ever divide
again. Most G0 cells maintain their specialized functions
until they enter programmed cell death (apoptosis) and die.
Cells only rarely leave G0 and resume the cell cycle.

DNA replication takes place during S phase and results 
in a doubling of the amount of DNA in each nucleus and 
the creation of two sister chromatids for each chromosome. 
Entry into the S phase almost always commits the cell to 
proceeding through the remainder of the cycle and then 
dividing. The completion of S phase brings about the transition
to the G2, or Gap 2, phase of the cell cycle, during which
cells prepare for division. Interphase ends when cells enter M 
phase, from which two identical daughter cells emerge. 


The successive generations of cells produced through 
mitosis as one cell cycle follows the next are known as cell 
lines or cell lineages. Each cell line or cell lineage contains 
identical cells (i.e., clones) that are all descended from 
a single founder cell. Mitosis ensures that the genetic 
information in cells is faithfully passed to successive gen- 
erations of cell lineages. Occasional mutations occur in 
individual cells, however, and these are also perpetuated 
during the proliferation of the cell line. 


Substages of M Phase 


M phase follows interphase and is divided into five 
substages—prophase, prometaphase, metaphase, ana- 
phase, and telophase—whose principal features are 
described in Figure 3.2. These five substages accomplish 
two important functions of cell division—karyokinesis 
and cytokinesis. Karyokinesis is the equal partitioning of 
the chromosomal material in the nucleus of the parental 
cell between the nuclei of the two daughter cells. This 
process requires first that each of the chromosomes in the 
nucleus be fully and accurately duplicated and then that 
the duplicate copies of each chromosome be separated 
so that one copy goes to the nucleus of one daughter cell 
and the second copy goes to the other daughter nucleus. 
Karyokinesis is followed by cytokinesis, the partitioning 
of the cytoplasmic contents of the parental cell into the 
daughter cells. Cytokinesis does not demand the same 
degree of equivalency required in karyokinesis. The cyto- 
plasm of the parental cells contains an abundance of the 
proteins and organelles that the daughter cells require in 
order to function, so the division of this material need not 
be equal. Cells entering mitosis are diploid (2n), and they
are diploid at the end of mitosis as well. 


The chromosomes are so diffuse during interphase 
that they cannot be clearly seen by light microscopy. 
Chromosome condensation begins in early prophase and 
progressively condenses chromosomes, which are visible 
by mid-prophase. Chromosome condensation continues 
until chromosomes reach their maximum level of conden- 
sation in metaphase. Nuclear envelope breakdown also 
occurs in prophase, and chromosome centromeres be- 
come visible as do the sister chromatids of each chromo- 
some. The centromere is a specialized DNA sequence 
on each chromosome, and its location is identified as a 
constriction where the sister chromatids—the two cop- 
ies that were duplicated in S phase—are joined together. 
Centromeric DNA sequence binds a specialized protein 
complex called the kinetochore that facilitates chromo- 
some division later in M phase. 

The definition and usage of the terms chromosome, 
chromatid, and sister chromatid sometimes cause confu- 
sion, and this is a good time to present the definitions we 
will use in the remaining discussion of cell division. The 
term chromosome is used throughout the cell cycle to iden- 
tify each DNA-containing structure that has a centromere. 
At the end of G1, a chromosome consists of a single DNA
duplex with associated proteins. After the completion of S 
phase, a chromosome consists of two replicated DNA du- 
plexes with associated proteins. The two DNA molecules 
making up this chromosome are identical. Individually, 
these DNA molecules are identified as chromatids, and 
together they are identified as the sister chromatids. 


Chromosome Distribution 


In addition to visible changes to chromosomes, cellular 
changes are also apparent in prophase. In animal cells, 
although not in most plants, fungi, or algae, two organelles 
called centrosomes appear that migrate during M phase 
to form the two opposite poles of the dividing cell. Each 
centrosome contains a pair of subunits called centrioles 
(Figure 3.3). Centrosomes are the source of spindle fiber 
microtubules that emanate from each centrosome. Spindle 
fiber microtubules are polymers of tubulin protein sub- 
units that elongate by the addition of tubulin subunits and 
shorten by the removal of tubulin subunits. Microtubules 
are polar; they have a “minus” (—) end anchored at the 
centrosome and a “plus” (+) end that grows away from the 
centrosome. Specialized proteins called motor proteins are 
associated with microtubules. Motor proteins move chro- 
mosomes and other cell structures along microtubules. 

Three kinds of spindle fibers emanate from centro- 
somes in a 360° pattern identified as the aster: 


1. Kinetochore microtubules embed in the protein 
complex called the kinetochore (described shortly) 
that assembles at the centromere of each chromatid. 
Kinetochore microtubules are responsible for chro- 
mosome movement during cell division. 
2. Polar microtubules, also called nonkinetochore
microtubules, extend toward the opposite pole of their 
centrosome and overlap with polar microtubules from 
that pole. These microtubules contribute to the elonga- 
tion of the cell and to cell stability during division. 


3. Astral microtubules grow toward the membrane of the 
cell, where they attach and contribute to cell stability. 


The kinetochore, a protein complex with an outer 
plate and an inner plate, assembles on the centromere and 
is bound by the plus ends of kinetochore microtubules. By 
the end of prometaphase, kinetochore microtubules from 
each centrosome are attached to the kinetochore of each 
chromatid of the sister chromatid pair (see Figure 3.3). 

Metaphase chromosomes condense more than 10,000- 
fold in comparison to the beginning of prophase. This 
makes them easily visible under the microscope and allows 
them to be easily moved within the cell. Because they are 
tethered to kinetochore microtubules from opposite cen- 
trosomes, the sister chromatids experience opposing forces 
that are critical to the positioning of chromosomes along an 
imaginary midline at the equator of the cell. This imaginary 
line is called the metaphase plate. 

The tension created by the pull of kinetochore mi- 
crotubules is balanced by a companion process known as 
sister chromatid cohesion. Sister chromatid cohesion is 
produced by the protein cohesin that localizes between 
the sister chromatids and holds them together to resist the 
pull of kinetochore microtubules (Figure 3.4). Cohesin is a 
4-subunit protein; its central component is a polypeptide 
produced by the gene Scc1, for “sister chromatid cohesion.”
Cohesin coats sister chromatids along their entire length 
but is most concentrated near centromeres, where the pull 
of microtubules is greatest. As microtubules move chromo- 
somes toward the midline of the cell, cohesin helps keep the 
sister chromatids together, to ensure proper chromosome 
positioning and to prevent their premature separation. 

Anaphase is the part of M phase during which sister 
chromatids separate and begin moving to opposite poles 
in the cell. Anaphase includes two distinct events tied 
to microtubule action: anaphase A, characterized by the 
separation of sister chromatids, and anaphase B, charac- 
terized by the elongation of the cell into an oblong shape. 

Anaphase A begins abruptly with two simultaneous 
events. First, the enzyme separase initiates cleavage of 
polypeptides in cohesin, thus breaking down the con- 
nection between sister chromatids. Second, kinetochore 
microtubules begin to depolymerize at their (+) ends to 
initiate chromosome movement toward the centrioles. 
The separation of sister chromatids in anaphase A is 
called chromosome disjunction. As anaphase progresses, 
sister chromatids complete their disjunction and eventu- 
ally congregate around the centrosomes at the cell poles. 

The next part of anaphase, anaphase B, is characterized
by the polymerization of polar microtubules that
extends their length and causes the cell to take on an
oblong shape. The oblong shape facilitates cytokinesis at
the end of telophase, which leads to the formation of two
daughter cells.


Completion of Cell Division


In telophase, nuclear membranes begin to reassemble
around the chromosomes gathered at each pole, eventually
enclosing the chromosomes in nuclear envelopes.
Chromosome decondensation begins and ultimately
returns chromosomes to their diffuse interphase state. At
the same time, microtubules disassemble. As telophase
comes to an end, two identical nuclei are observed within
a single elongated cell that is about to be divided into two
daughter cells by the process of cytokinesis.

In animal cells, a contractile ring composed of actin
microfilaments creates a cleavage furrow around the
circumference of the cell; the contractile ring pinches the cell
in two (Figure 3.5). In plant cells, cytokinesis entails the
construction of new cell walls near the cellular midline. In
both plant and animal cells, cytokinesis divides the
cytoplasmic fluid and organelles.


Figure 3.2 Interphase and the five stages of mitosis. The
chromosomes are shown in blue, and the centrosomes, asters,
and spindle fibers are shown in green.

Interphase: The G2 interphase cell pictured here has
passed through G1 and S phases, during which the
chromosomes duplicate. Although duplicated, the chromosomes
are diffuse and not visible within the nucleus. An intact
nuclear envelope encloses the chromosomes and one or
more nucleoli. Two centrosomes, each containing a centriole
pair, are located in the cytoplasm. Microtubules begin to
extend from the centrosomes in radial patterns that form
asters.

Prophase: Chromosome condensation begins in and
progresses throughout prophase, making the coalescing
chromosomes increasingly visible under the light microscope.
In the cytoplasm, the paired centrosomes begin to migrate
toward opposite poles of the cell, extending their
microtubules to form the early mitotic spindle. By the end of
prophase, the two sister chromatids that make up each
chromosome can be seen. Centromeres can also be seen on
late-prophase chromosomes. The nucleolus disappears.

Prometaphase: Nuclear envelope breakdown occurs during
prometaphase. Having reached opposite poles of the cell,
the centrosomes extend microtubules that attach to
kinetochores of chromosome centromeres. Microtubules
extending from opposite poles exert pulling forces in both
directions. Chromosomes move toward the middle of the
cell. Cohesin binds sister chromatids to resist premature
separation due to pulling forces. Nonkinetochore and astral
microtubules stabilize the cell.

Metaphase: Complete chromosome condensation is reached
in metaphase, and the fully condensed chromosomes align
so that the sister chromatids of each chromosome lie on
either side of the metaphase plate. The sister chromatids of
each chromosome are attached to kinetochore microtubules
emanating from centrosomes at opposite poles of the cell.
Kinetochore, nonkinetochore, and astral microtubules are
fully extended from the centrosomes, and a complete mitotic
spindle is in place.

Anaphase: Sister chromatid separation (disjunction) occurs
through the breakdown of sister chromatid cohesion and the
depolymerization of kinetochore microtubules. The daughter
chromosomes, tethered to depolymerizing kinetochore
microtubules, move toward opposite poles and congregate
near centrosomes. Polymerization of nonkinetochore
microtubules accompanies the movement of daughter
chromosomes, giving the cell an oblong shape at the end of
anaphase.

Telophase: Nonkinetochore microtubule polymerization
continues to elongate the cell in telophase, pushing the
poles apart. The nuclear envelope begins to reassemble and
will shortly surround the chromosomes. Chromosome
decondensation accompanies nuclear envelope reassembly.
Cytokinesis divides the cytoplasm to create two new cells by
formation of new cell walls, in plant cells, or a contractile
ring and cleavage furrow, in animal cells. The nucleolus
re-forms.


Mitosis separates the members of each pair of sister 
chromatids into identical nuclei, thus forming two ge- 
netically identical daughter cells. Figure 3.6 shows four 
chromosomes in a cell of an organism that is dihybrid 
(AaBb) for genes on the chromosomes shown. The fig- 
ure follows major events of the cell cycle, showing the 
generation of sister chromatids in S phase, chromosome 
alignment on the metaphase plate in metaphase, and 
the production of two identical (AaBb) daughter cells 
at the end of telophase. Notice that the diploid (2n)
number of chromosomes is maintained throughout the 
cell cycle. 
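
The bookkeeping behind this outcome can be sketched in a few lines of Python. This is a toy model rather than a description of any real software: each chromosome is reduced to the single allele it carries, and the function names `s_phase` and `mitosis` are hypothetical labels chosen for illustration.

```python
# Toy model (hypothetical names): a chromosome is represented only by its allele.

def s_phase(chromosomes):
    """Duplicate each chromosome into a pair of identical sister chromatids."""
    return [(allele, allele) for allele in chromosomes]


def mitosis(duplicated):
    """Send one sister chromatid of every pair to each daughter cell."""
    daughter_1 = [pair[0] for pair in duplicated]
    daughter_2 = [pair[1] for pair in duplicated]
    return daughter_1, daughter_2


# Dihybrid AaBb cell with four chromosomes (2n = 4), as in Figure 3.6.
parent = ["A", "a", "B", "b"]
daughter_1, daughter_2 = mitosis(s_phase(parent))

print("parent:    ", parent)
print("daughter 1:", daughter_1)
print("daughter 2:", daughter_2)
```

Because every sister chromatid pair is split between the two daughters, each daughter list matches the parent list in both genotype and chromosome number.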


Cell Cycle Checkpoints 


Cell biologists find that no matter what the duration of 
the cell cycle, most cells follow the same basic program; 
this suggests that common, genetically controlled signals 
drive the cell cycle. Knowledge of the genes and proteins 
controlling the cell cycle comes not from normal cells but 
from the study of cell lineages possessing mutations that 
affect their progression through the cell cycle. These stud- 
ies have produced important insights into genetic control 
of the cell cycle, and in recent decades, biologists have 
discovered the identities and functions of many genes 
responsible for cell cycle control. What has been learned
about genetic control of the cell cycle can be applied to 
the study of normal cell division as well as to the study 
of cell division abnormalities such as those displayed 
in cancer. 

As cells move through the cell cycle, their readiness to 
progress from one stage to the next is regularly assessed. 
The numerous cell cycle checkpoints, four of which are il- 
lustrated in Figure 3.7a, are times during the cell cycle when 
cells are monitored by protein interactions that assess the 
status of the cell and its readiness to progress to the next 
stage. One mechanism for this monitoring takes place by 
means of protein complexes that join a protein kinase with 
a second protein known as a cyclin protein. Protein kinases 
catalyze protein phosphorylation—the addition of a phos- 
phate group transferred from a nucleotide triphosphate 
such as ATP or GTP to a target protein. Phosphorylation 
changes the conformation of target proteins and can either 
activate or inactivate the target protein. Protein kinases are 
usually present continuously in cells at relatively steady 
concentrations. Cyclin proteins, however, are so named 
because their concentrations are cyclic and linked to cell cy- 
cle stage. Cyclin protein production is stimulated by growth 
factor proteins that are produced by other cells. The pro- 
tein kinase components of these complexes are activated 
only when they associate with a cyclin; thus, the protein 
kinases are called cyclin-dependent kinases, abbreviated
Cdk. In their activated state, cyclin-Cdk complexes
phosphorylate numerous target proteins and regulate cell cycle
progression at various checkpoints.


Figure 3.4 Sister chromatid cohesion during mitosis.
Cohesin protein generates cohesion between sister chromatids
in prophase (a) and metaphase (b). At anaphase (c), separase
protein digests cohesin and allows sister chromatids to separate.
Figure labels: kinetochore (one on each chromatid), kinetochore
microtubule, motor proteins, sister chromatids, cohesin protein,
separase, depolymerization, kinetochore movement.


Figure 3.5 Cytokinesis in animal cells (a) and plant cells (b).
Figure labels: (a) contractile ring and furrow; (b) cell plate.


The production of cyclin proteins changes
through the cell cycle (Figure 3.7b). For example, Cdk4 joins
with cyclin D2, forming cyclin D2-Cdk4 that is active at the
G1-S checkpoint. Separately, Cdk4 pairs with cyclin D1 to
form cyclin D1-Cdk4 that is active later in the cell cycle.

One prominent target of cyclin D1-Cdk4 is the
retinoblastoma protein (pRB) that is produced by the
retinoblastoma 1 (RB1) gene. In normal cells, pRB binds
a transcription activator protein known as E2F, and together
the pRB-E2F complex blocks cell cycle progression
from G1 to S phase (Figure 3.8). The cyclin D1-Cdk4
complex phosphorylates pRB, causing it to release E2F.
Free E2F binds to DNA and activates the transcription of
several genes that produce proteins essential in S phase. In
other words, active cyclin D1-Cdk4 allows the cell to pass
through the G1 checkpoint and enter S phase by releasing
E2F that is otherwise bound to unphosphorylated pRB.

The presence of unphosphorylated pRB in a cell acts as
a brake on the cell cycle, halting it at the G1 checkpoint and
preventing progression to S phase. The RB1 gene that
produces pRB is known as a tumor suppressor gene because
the protein products of this and other genes of the same type
block progression of the cell cycle. In contrast, the
production of cyclin D1, from expression of the cyclin D1 gene, leads
to formation of the cyclin D1-Cdk4 complex that stimulates
cell cycle progression from G1 to S phase. Cyclin D1 is one of
many examples of proteins produced by genes known as
proto-oncogenes. When expressed, proto-oncogenes
stimulate cell cycle progression. Mutated proto-oncogenes,
designated oncogenes, are associated with cancer development.
Figure 3.6 An overview of mitosis.
Panel notes: Chromosomes align randomly along the metaphase
plate with the aid of the mitotic spindle. Telophase: Two
daughter cells are produced by mitosis. Each is AaBb following
sister chromatid separation to form daughter chromosomes.
Figure 3.7 Cell cycle checkpoints and cyclin proteins.
(a) Genetic mechanisms monitor four major cell cycle
checkpoints. (b) The production of cyclin proteins varies
coincident with stages of the cell cycle.

(a) Checkpoints shown:
G1 checkpoint: Pass if cell size is adequate, nutrient
availability is sufficient, and growth factors (signals from
other cells) are present.
S-phase checkpoint: Pass if DNA replication is complete and
has been screened to remove base-pair mismatch or error.
G2 checkpoint: Pass if cell size is adequate and chromosome
replication is successfully completed.
Metaphase checkpoint: Pass if all chromosomes are attached
to mitotic spindle.

(b) Axes: relative amounts of cyclins plotted against phases
of the cell cycle.
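
The four checkpoints of Figure 3.7a can be read as pass/fail tests on the state of the cell. The sketch below is only an illustration of that reading: the dictionary `CHECKPOINTS`, the flag names, and the function `passes_checkpoint` are hypothetical, and real checkpoints depend on many interacting proteins rather than a few true/false flags.

```python
# Conditions paraphrased from Figure 3.7a; all names here are hypothetical.
CHECKPOINTS = {
    "G1": ["cell_size_adequate", "nutrients_sufficient", "growth_factors_present"],
    "S": ["replication_complete", "mismatches_removed"],
    "G2": ["cell_size_adequate", "replication_complete"],
    "metaphase": ["all_chromosomes_attached_to_spindle"],
}


def passes_checkpoint(checkpoint, cell_state):
    """Return True only if every condition for the checkpoint is met."""
    # A flag missing from cell_state is treated as an unmet condition.
    return all(cell_state.get(flag, False) for flag in CHECKPOINTS[checkpoint])


cell_state = {
    "cell_size_adequate": True,
    "nutrients_sufficient": True,
    "growth_factors_present": False,  # no growth-factor signal from other cells
}

print("passes G1 checkpoint:", passes_checkpoint("G1", cell_state))
```

In this example the cell is held at the G1 checkpoint because one of the three listed conditions is not satisfied.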


Cell Cycle Mutations and Cancer 


Controlling cell division frequency is an essential activity 
of normal growth and development. In contrast, mutations 
altering the control or progression of cells through the cell 
cycle are commonly found in cancer. Cancer is often charac- 
terized by out-of-control cell proliferation that leads to tumor 
formation and the overgrowth of cancerous cells that invade 
and displace normal cells. Loss of cell cycle control is a funda- 
mental mechanism leading to cancer development. 

As examples of the loss of cell cycle control in cancer,
let’s consider two kinds of mutations that alter the normal
interaction of cyclin D1-Cdk4 and pRB. The first category
consists of mutations that either increase the number of
copies of cyclin D1 by duplicating the cyclin D1 gene, or
significantly increase the level of transcription of cyclin D1.
These mutations lead to higher-than-normal levels of cy- 
clin D1. Since Cdk4 is continuously available in cells, over- 
production of cyclin D1 causes uncontrolled entry into S 
phase by continuous phosphorylation of pRB and release 
of E2F to stimulate S-phase-related gene transcription. 
Mutations of this kind occur in parathyroid tumors, B-cell 
lymphomas, and certain other cancers in humans. 

The second category, mutation of the RB1 gene and the resulting
production of abnormal pRB, drives a different kind of abnormal growth.
Mutation of RB1 resulting in pRB protein that binds weakly 
or not at all to E2F contributes to the development of sev- 
eral cancers, including those of the lung, bladder, breast, 
and bone, by allowing uncontrolled entry into S phase. 
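
The logic of the G1 checkpoint and of both kinds of mutation can be condensed into a small truth-table sketch. It is a deliberate simplification that assumes, as the text states, that Cdk4 is always available; the function name `enters_s_phase` and its parameters are hypothetical, and protein levels are reduced to true/false values.

```python
def enters_s_phase(cyclin_d1_present: bool, prb_binds_e2f: bool = True) -> bool:
    """Toy model of the cyclin D1-Cdk4 / pRB / E2F switch at the G1 checkpoint."""
    # Cdk4 is assumed to be always available, so the complex is active
    # whenever cyclin D1 is present.
    cdk4_complex_active = cyclin_d1_present
    # The active complex phosphorylates pRB.
    prb_phosphorylated = cdk4_complex_active
    # E2F is free if pRB is phosphorylated, or if mutant pRB cannot hold E2F.
    e2f_free = prb_phosphorylated or not prb_binds_e2f
    # Free E2F activates S-phase genes.
    return e2f_free


scenarios = {
    "normal cell, no cyclin D1": dict(cyclin_d1_present=False),
    "normal cell, cyclin D1 produced": dict(cyclin_d1_present=True),
    "cyclin D1 overproduced (always present)": dict(cyclin_d1_present=True),
    "RB1 mutant, pRB cannot bind E2F": dict(cyclin_d1_present=False, prb_binds_e2f=False),
}

for name, kwargs in scenarios.items():
    print(f"{name}: enters S phase = {enters_s_phase(**kwargs)}")
```

Only the first scenario keeps the brake engaged. Constant cyclin D1 production and loss of pRB binding both leave E2F free, which is why either kind of mutation permits uncontrolled entry into S phase.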

Mutation of RB1 is also the cause of a cancer of 
light-sensitive cells of the retina in the eye. The cancer, 
called retinoblastoma, occurs in early childhood and 
forms tumors of rapidly proliferating cells in the retina. 
Retinoblastoma is rare, occurring in 1 in 15,000 children. 
It occurs in two forms: a hereditary type, meaning that 
a child inherits a mutation of RB1 from a parent, and a
sporadic type in which RB1 mutations are not inherited. 
Retinoblastoma occurs only when both copies of RB1 are 
mutated; thus, the development of retinoblastoma is an 
example of a recessive cancer phenotype. 

In hereditary retinoblastoma, one RB1 mutation is 
inherited; this means that all cells of the body, including 
retinal cells, carry one mutant gene. The acquisition of 
the second mutation of the wild-type copy of RB1 occurs 
at a somatic level: The wild-type RB1 gene could undergo 
mutation in any of the millions of cells in either retina. 
This second mutation produces the recessive genotype 
that leads to retinoblastoma development. 

Sporadic retinoblastoma also requires that both RB1
genes undergo mutation; however, both copies of the gene 
are wild type at fertilization, meaning that mutation must 
alter the two copies of the gene in the same retinal cell. 
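
The difference between the hereditary and sporadic forms is largely a matter of probability, which a rough calculation can illustrate. Every number below is a hypothetical placeholder chosen only to show the shape of the argument, not a measured value: `MUTATION_PROBABILITY` stands for the chance that one copy of RB1 mutates in a given retinal cell, and `RETINAL_CELLS` for the number of susceptible cells. The sketch also assumes that mutations arise independently in each cell and in each gene copy.

```python
import math


def prob_at_least_one(per_cell_probability: float, cell_count: int) -> float:
    """Probability that at least one of cell_count independent cells is affected."""
    # Equivalent to 1 - (1 - p)**N; log1p/expm1 keep precision when p is tiny.
    return -math.expm1(cell_count * math.log1p(-per_cell_probability))


MUTATION_PROBABILITY = 1e-6  # hypothetical: one RB1 copy mutates in a given cell
RETINAL_CELLS = 2_000_000    # hypothetical: susceptible cells in the retinas

# Hereditary form: every cell already carries one mutant copy,
# so a cell needs only one further mutation.
hereditary = prob_at_least_one(MUTATION_PROBABILITY, RETINAL_CELLS)

# Sporadic form: both copies must mutate in the same cell.
sporadic = prob_at_least_one(MUTATION_PROBABILITY ** 2, RETINAL_CELLS)

print(f"hereditary: {hereditary:.3g}")
print(f"sporadic:   {sporadic:.3g}")
```

Whatever placeholder values are used, requiring two independent mutations in the same cell squares an already small per-cell probability, so the sporadic route is far less likely than the single additional mutation needed in the hereditary form.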


3.2 Meiosis Produces Gametes for 
Sexual Reproduction 


Reproduction is a basic requirement of living organisms. 
In more than three centuries of observation, biologists 
have identified a dizzying array of reproductive meth- 
ods, mechanisms, and behaviors in animals, plants, and 
microbes. Even so, reproduction can be divided into two 
broad categories: (1) asexual reproduction, in which or- 
ganisms reproduce without mating, giving rise to progeny 
that are genetically identical to their parent; and (2) sexual 
reproduction, in which cells called reproductive cells or 
gametes are produced by cell division and unite during 
fertilization. 
Figure 3.8 Cyclin-Cdk complexes regulate the cell cycle. Cyclin D1-Cdk4 specifically interacts
with pRB-E2F to regulate entry into S phase.


Bacteria and archaea reproduce exclusively by asex- 
ual reproduction. These organisms are haploid; they usu- 
ally have just a single chromosome. Cell division follows 
shortly after the completion of chromosome replication; 
each cell produces two genetically identical daughter cells. 

Single-celled eukaryotes, such as yeast, can repro- 
duce either sexually or asexually. Asexual reproduction 
in yeast is similar to cell division in bacteria. A haploid 
yeast cell undergoes DNA replication and distributes a 
copy of each chromosome to identical daughter cells. 
While yeast spend most of their life cycle in a hap- 
loid state and actively reproduce as haploids, it is also 
common for two haploid yeast cells to fuse and form 
a diploid cell that produces gametes (called spores) by 
meiosis. The spores produced by each completed mei- 
otic division are usually contained in a structure called 
an ascus. The individual haploid spores of an ascus can 
be removed and grown on plates, as we will see illus- 
trated later in the chapter. 

In contrast to single-celled eukaryotes, multicellular 
eukaryotes reproduce predominantly by sexual means. 
In most animal species and dioecious plants, males and 
females carry distinct reproductive tissues and structures. 
Mating requires the production of haploid gametes from 
both male structures and female structures. The union of 
haploid gametes produces diploid progeny. In monoecious 
plant species, including the Pisum sativum that Mendel 


worked with, male and female reproductive tissues are 
present in each plant, and self-fertilization is the com- 
mon mode of reproduction, although fertilization involv- 
ing pollen from one plant fertilizing the flower of another 
also occurs. 

In sexually reproducing animals, specialized germ- 
line cells undertake meiosis to produce haploid gametes, 
or reproductive cells. Female gametes are produced by 
the ovary in female animals or by the ovule in plants. 
Male germ-line cells are located in testes in animals, 
where they produce sperm. In the anthers of flowering 
plants, pollen containing two sperm cells is produced. 
These descriptions are broadly true for most plants and 
animals, but there are many exceptions, including the 
observation of asexual reproduction in several species of 
fish, rotifers (small aquatic organisms), and salamanders. 
In addition, male ants, bees, and wasps have haploid 
somatic cells, and their processes of gamete production 
are distinctive. 


Meiosis versus Mitosis 


Meiosis shares numerous features that are similar or
identical to events in mitosis. For example, interphase of all cells
is the same. Interphase of the germ-line cell cycle contains
stages G1, S, and G2 that are indistinguishable from those
in somatic cells. Similarly, the actions and functions of
subcellular structures such as centrosomes and the
microtubules they produce are the same in all cells. Nor is mitosis
exclusive to somatic cells. Germ-line cells of plants and
animals are created and maintained by mitotic division.
These cells undertake meiosis solely for the purpose of
producing gametes. Meiosis is distinguished from mitosis
by the activities taking place during meiotic M phase and by
the production of four haploid gametes. Table 3.1 compares
and contrasts numerous differences in the processes and
outcomes of mitosis and meiosis that are described in the
following sections.


Table 3.1 Comparison of Mitosis and Meiosis

Purpose
  Mitosis: Produce genetically identical cells for growth and maintenance
  Meiosis: Produce gametes for sexual reproduction that are genetically different

Location
  Mitosis: Somatic cells
  Meiosis: Germ-line cells

Mechanics
  Mitosis: One round of division following one round of DNA replication
  Meiosis: Two rounds of division (meiosis I and meiosis II) following a
           single round of DNA replication
           The mechanical basis of Mendel’s laws of heredity

Homologous chromosomes
  Mitosis: Do not pair
           Rarely undergo recombination
  Meiosis: Synapsis during prophase I
           Crossing over during prophase I
           Separate at anaphase I

Sister chromatids
  Mitosis: Attach to spindle fibers from opposite poles in metaphase
           Separate and migrate to opposite poles at anaphase
  Meiosis: Attach to spindle fibers from the same pole in metaphase I
           Migrate to the same pole in anaphase I
           Attach to spindle fibers from opposite poles in metaphase II
           Separate and migrate to opposite poles in anaphase II

Product
  Mitosis: Two genetically identical diploid daughter cells that continue
           to divide by mitosis
  Meiosis: Four genetically different haploid cells that mature to form
           gametes and unite to form diploid zygotes


Figure 3.9 An overview of meiosis.

Meiotic interphase is followed by two successive 
cell-division stages known as meiosis I and meiosis II. 
There is no DNA replication between these meiotic cell 
divisions, so the result of meiosis is the production of 
four haploid daughter cells (Figure 3.9). In meiosis I,
MEIOSIS I: Separates homologous chromosomes

Figure labels: centrosomes, centromere, chromosomes, nuclear
envelope, microtubules, bivalent. Panel titles shown: Leptotene,
Zygotene, Pachytene.

Prophase I: Leptotene
Cells entering the first substage of meiotic prophase I have
passed through interphase and have had chromosomes
duplicated. Progressive chromosome condensation begins in
leptotene, but the chromosomes remain too diffuse to be seen
at this stage. Centrosomes begin to migrate toward opposite
poles of the cell, and asters of microtubule spindle fibers are
produced from each centrosome.

Prophase I: Pachytene
Chromosome condensation is partially complete, and synapsed
homologous chromosomes are seen as bivalent structures.
Crossing over occurs between nonsister chromatids of
homologous chromosomes. Kinetochore microtubules attach to
kinetochores, and nonkinetochore and astral microtubules
emanate from centrosomes that are nearly at opposite poles in
the cell. Nuclear envelope breakdown continues.


Figure 3.10 The stages of meiosis (continued on p. 76). 


homologous chromosomes separate from one another,
reducing the diploid number of chromosomes (2n) to
the haploid number (n). In meiosis II, sister chromatids
separate to produce four haploid gametes, each with one
chromosome of every diploid pair.

Following the completion of meiosis, each gamete
contains a single nucleus holding a haploid chromosome
set. The gametes of the two sexes are often dramatically
different in size and morphology, however. Female
gametes are generally much larger than male gametes and
have a haploid nucleus, a large amount of cytoplasm, and a
full array of organelles. In contrast, male gametes contain
a haploid nucleus but very little cytoplasm and virtually no
organelles. As the fertilized ovum begins mitotic division,
the organelles and cytoplasmic structures provided by the
maternal gamete support its early zygotic growth.


Meiosis I

Three hallmark events take place during meiosis I:

1. Homologous chromosome pairing
2. Crossing over between homologous chromosomes
3. Segregation (separation) of the homologous chromosomes
   that reduces chromosomes to the haploid number

Meiosis I is divided into four stages: prophase I,
metaphase I, anaphase I, and telophase I. Homologous
chromosome pairing, called chromosome synapsis,
and recombination take place in prophase I; thus, this
stage is subdivided into five substages (leptotene stage,
zygotene stage, pachytene stage, diplotene stage, and
diakinesis stage) to more accurately trace the interactions
and recombination of homologous chromosomes.
Figure 3.10 describes these stages and prophase I
substages in detail.

Chromosome condensation begins during leptotene,
when the meiotic spindle is formed by microtubules
emanating from the centrosomes, which are moving
to positions at opposite ends of the cell. The nuclear
membrane begins to break down in zygotene, and the
first hallmark feature of meiosis occurs: homologous
chromosome synapsis, the alignment of homologous
chromosome pairs. Synapsis initiates formation of a
protein bridge called the synaptonemal complex, a tri-layer




MEIOSIS I: Separates homologous chromosomes

Figure labels: centromere with kinetochore, metaphase plate,
polar microtubule, sister chromatids remain attached,
homologous chromosomes separate, furrow, nuclear envelope
re-forms.

Prophase I: Diakinesis
The meiotic spindle is well established, with bundles of
kinetochore microtubules tethering homologous chromosomes
of tetrads to opposite poles. The nuclear envelope is fully
degraded. Tetrads are moved toward the middle of the cell.

Metaphase I
Tetrads are aligned along the metaphase plate, with each
chromosome of a homologous pair tethered to kinetochore
microtubules emanating from centrosomes at opposite poles
of the cell. The kinetochores of sister chromatids are attached
to the same centrosome, and sister chromatids are joined by
cohesin to prevent their premature separation. Chiasmata
linking nonsister chromatids are broken.

Anaphase I
Depolymerization of kinetochore microtubules begins the
disjunction of homologous chromosomes, which start moving
toward opposite poles. Sister chromatids remain joined by
cohesin.

Telophase I and Cytokinesis
Nuclear membranes re-form around the chromosomes
clustered at each pole. Each newly formed nucleus contains
a haploid set of chromosomes. Chromosomes may partially
decondense. Cytokinesis divides the cytoplasmic material of
the cell by separating the nuclei. The cytoplasmic division
may be unequal.


Figure 3.10 The stages of meiosis (continued). 


protein structure that maintains synapsis by tightly bind- 
ing nonsister chromatids of homologous chromosomes to 
one another (Figure 3.11). 

Nonsister chromatids are chromatids belonging to 
different members of a homologous pair of chromo- 
somes. The binding of nonsister chromatids by a synap- 
tonemal complex draws the homologs into close contact 
(synapsis). The synaptonemal complex contains two lat- 
eral elements, each consisting of proteins adhered to a 
chromatid from a different member of a pair of homolo- 
gous chromosomes as well as a central element that joins 
the lateral elements. The function of the synaptonemal 
complex is to properly align homologous chromosomes 
before their separation and then to facilitate recombina- 
tion between homologous chromosomes. 

Chromosome condensation continues in pachytene, 
and sister chromatids of each chromosome can be visu- 
ally distinguished by light microscopy. At this stage, the 
paired homologs are called a tetrad in recognition of 
the four chromatids that are microscopically visible in 
each homologous pair. Within the central element of the
synaptonemal complex, new structures called
recombination nodules appear at intervals.

Recombination nodules play a pivotal role in cross- 
ing over of genetic material between nonsister chro- 
matids of homologous chromosomes. The number of 
recombination nodules correlates closely with the aver- 
age number of crossover events along each homologous 
chromosome arm. Two important observations have 
been made about recombination nodules. First, their
appearance and location within the synaptonemal complex
are coincident with the timing and location of crossing
over; and second, recombination nodules seem to be
present in organisms that undergo crossing over and 
absent in those that do not. Cell biologists have con- 
cluded that recombination nodules are aggregations 
of enzymes and proteins that are required to carry 
out genetic exchange between the nonsister chromatids 
of homologous chromosomes during pachytene. Later 
chapters discuss the genetic consequences of crossing 
over (Chapter 5) and the molecular processes of cross- 
ing over (Chapter 12). 
MEIOSIS II: Separates sister chromatids

Prophase II
The nuclear envelope breaks down, and centrosomes
duplicate and begin migrating to opposite poles of the cell.
Microtubules emanate from the centrosomes, producing
kinetochore, polar, and astral microtubules. Chromosome
recondensation takes place.

Metaphase II
Sister chromatids are attached to kinetochore microtubules
from opposite poles of the cell. The force of microtubule pull
and the resistance created by cohesin lead to chromosome
alignment along the metaphase plate.

Anaphase II
Sister chromatid separation begins with the breakdown of
cohesin by separase and the depolymerization of kinetochore
microtubules. As the sister chromatids move toward opposite
poles, polymerization of polar microtubules elongates the cell.

Telophase II and Cytokinesis
Chromosome migration is completed, and the chromosomes
begin to decondense. The nuclear envelope re-forms around
chromosomes. Cytokinesis separates the newly formed nuclei
and divides the cytoplasmic material, perhaps unevenly.


Figure 3.10 The stages of meiosis. 


The chromosomes continue to condense in diplotene 
as the synaptonemal complex begins to dissolve. The 
dissolution allows homologs to pull apart slightly, reveal- 
ing contact points between nonsister chromatids. These 
contact points are called chiasmata (singular: chiasma), 
and they are located along chromosomes where cross- 
ing over has occurred. Chiasmata mark the locations of 
DNA-strand exchange between nonsister chromatids of 
homologous chromosomes. 

Cohesin protein is present between sister chromatids 
to resist the pulling forces of kinetochore microtubules 
(Figure 3.12). In diakinesis, kinetochore microtubules ac- 
tively move synapsed chromosome pairs toward the meta- 
phase plate, where the homologs will align side by side. 

The chiasmata between homologous chromosomes 
are resolved in late prophase I so that the homologs can be 
aligned in metaphase I. This process of resolving the
contacts between homologs is critical to the completion of
recombination between homologous chromosomes.

Homologous chromosomes align on opposite sides
of the metaphase plate in metaphase I. Kinetochore
microtubules from one centrosome attach to the
kinetochores of both sister chromatids of one chromosome.
Meanwhile, kinetochore microtubules from the
other centrosome attach to the kinetochores of the sister
chromatids of the homolog. Karyokinesis takes place in
anaphase I as homologous chromosomes separate from
one another and are dragged to opposite poles of the cell
(see Figure 3.10). The sister chromatids of each chromosome
remain firmly joined by cohesin. Nuclear membrane
re-formation takes place in telophase I, when a haploid
set of chromosomes is enclosed at each pole of the cell.
Cytokinesis follows the completion of telophase I.

Homologous chromosome disjunction (separation) 
in meiosis I reduces the number of chromosomes at each 
pole to the haploid number, so that one representative of 
each homologous pair of chromosomes is present. The 
first meiotic division is known as the reduction division, 
to signify the reduction of chromosome number from 
diploid to haploid. 
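
The reduction in chromosome number can be traced with a small sketch that follows a cell with two homologous pairs (2n = 4) through both divisions. It is a simplified illustration: crossing over and the random orientation of homologous pairs are ignored, each chromatid is represented only by the allele it carries, and the function names are hypothetical.

```python
# Toy model (hypothetical names). After S phase, each homologous pair is written as
# (homolog_1, homolog_2), and each homolog is a tuple of two sister chromatids.

def meiosis_one(cell):
    """Separate homologs: each pole receives one duplicated chromosome per pair."""
    pole_1 = [pair[0] for pair in cell]
    pole_2 = [pair[1] for pair in cell]
    return pole_1, pole_2


def meiosis_two(cell):
    """Separate the sister chromatids of each chromosome."""
    gamete_1 = [chromosome[0] for chromosome in cell]
    gamete_2 = [chromosome[1] for chromosome in cell]
    return gamete_1, gamete_2


# AaBb germ-line cell after S phase: two homologous pairs, 2n = 4.
germ_cell = [(("A", "A"), ("a", "a")), (("B", "B"), ("b", "b"))]

products_of_meiosis_one = meiosis_one(germ_cell)
gametes = [g for cell in products_of_meiosis_one for g in meiosis_two(cell)]

for cell in products_of_meiosis_one:
    print("after meiosis I: ", len(cell), "chromosomes, each with 2 chromatids")
for gamete in gametes:
    print("after meiosis II:", len(gamete), "chromosomes,", gamete)
```

The chromosome count per cell falls from four to two in meiosis I, which is the reduction division, and stays at two through meiosis II, where only the sister chromatids separate.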

Figure 3.11 The synaptonemal complex. A detailed line drawing of the synaptonemal complex
and associated recombination nodules based on electron micrographs.

Figure 3.12 Homolog separation in meiosis I. (a) In diplotene and diakinesis of prophase I,
crossing over between homologs is complete and contacts between homologs (chiasmata) are resolved.
(b) Spindle fibers pull chromosomes to align them on the metaphase plate. Cohesin protein adheres
sister chromatids against the pull of spindle fibers. (c) Homologous chromosomes separate at anaphase I.


Recall that sex chromosomes differ from their
autosomal counterparts in that the X chromosome and Y
chromosome have very few genes in common. Even so,
the X and Y chromosomes of males align as homologs 
in prophase I. This synapsis is accomplished with the aid 
of pseudoautosomal regions (PARs) on the two types 
of sex chromosomes. The term pseudoautosomal means 
“false autosomal”; a PAR is a segment of homology be- 
tween otherwise different chromosomes. PARs are like 
homologous sequences carried on authentic autosomes. 
The pattern of inheritance of a pseudoautosomal region 
would be indistinguishable from the pattern of autosomal 
inheritance, as a consequence of the homology. 

Human X and Y chromosomes each contain two pseudoautosomal 
regions, PAR1 and PAR2, that are located at 
opposite ends of the chromosomes (Figure 3.13). PAR1 is 
located on the short arms of the X and Y chromosomes 
and contains about 2.7 Mb (millions of base pairs) of DNA. 
PAR2 is located on the long arms of the chromosomes and 
is shorter than PAR1, at about 300,000 base pairs. Crossing 
over during chromosome synapsis occurs regularly between 
PAR1 regions. Studies estimate the rate of recombination 
there to be as much as twentyfold higher than for an 
equivalently sized region in autosomes. 


Meiosis II 


The second meiotic division divides each haploid prod- 
uct of meiosis I by separating sister chromatids from one 
another in a process that is reminiscent of mitosis, except 
that the number of chromosomes in each cell is one-half 
the number observed in mitosis. The products of meiosis 
II mature to form the gametes that contain a haploid set 
of chromosomes. The four stages of meiosis II—prophase 
II, metaphase II, anaphase II, and telophase II—are shown 
and described in Figure 3.10. 

Figure 3.13 The pseudoautosomal regions of the X and Y 
chromosomes. 

Meiosis II bears a general resemblance to mitosis 
in that kinetochore microtubules from opposite centrosomes 
attach to the kinetochores of sister chromatids. 
Also, as in mitosis, in meiosis II the chromosomes align 
randomly along the metaphase plate. Furthermore, sister 
chromatid separation is accompanied by cohesin breakdown, 
the action of motor proteins, and depolymerization 
of microtubules. Cytokinesis takes place at the end of 
telophase II. There are, however, only a haploid number 
of chromosomes present in each cell during meiosis II. 
Four genetically distinct haploid cells, each carrying one 
chromosome that represents each homologous pair, are 
the products of meiosis II. 


The Mechanistic Basis of Mendelian Ratios 


The separation of homologous chromosomes and sister 
chromatids in meiosis constitutes the mechanical basis 
of Mendel’s laws of segregation and independent assort- 
ment. The connection between meiosis and Mendelian 
hereditary principles was first suggested, independently, 
by Walter Sutton and Theodor Boveri in 1903. Based on 
microscopic observations of chromosomes during meio- 
sis, Sutton and Boveri proposed two important ideas. First, 
meiosis was the process generating Mendel’s rules of he- 
redity; and second, genes were located on chromosomes. 
Over the next 2 decades, work on numerous species 
proved these hypotheses to be correct. 

Figure 3.14 Meiosis and the law of segregation. 

We can understand segregation by following a 
pair of homologous chromosomes through meiosis in 
a heterozygous organism. The organism in Figure 3.14, 
for example, has the Aa heterozygous genotype. DNA 
replication in S phase creates identical sister chromatids 
for each chromosome. At metaphase I, the homologs 
align on opposite sides of the metaphase plate; and at 
anaphase I, the homologs separate from one another. 
This movement segregates the chromosome composed 
of two A-bearing chromatids from the chromosome 
bearing the two a-containing chromatids. Following 
these cells through to the separation of sister chroma- 
tids in meiosis II, we find that among the four gametes 
are two containing the A allele and two containing a. 
This outcome explains the 1:1 ratio of alleles that the 
law of segregation predicts for gametes of a heterozy- 
gous organism. 

The independent assortment of alleles is illus- 
trated by the behavior of two pairs of homologs during 
meiotic division in an organism, as demonstrated in 
Figure 3.15 using the AaBb dihybrid genotype. Once 
again, S phase creates two identical sister chromatids 
for each chromosome. In metaphase I, however, two 
equally likely arrangements of the two homologous 
pairs can occur. In each arrangement, the homologous 
chromosomes are on opposite sides of the metaphase 
plate. Obviously, when a cell undergoes meiosis, only 
one or the other of these alternative arrangements will 
occur; thus, each cell undergoing metaphase I of meiosis 
will have either “arrangement I” or “arrangement II.” 
Over a large number of meiotic divisions, arrangement I 
and arrangement II are equally frequent. Arrangement I 
has chromosomes carrying dominant alleles on one 
side of the metaphase plate, and chromosomes carrying 
recessives on the opposite side. Arrangement II has a 
dominant-bearing and a recessive-bearing chromosome 
on each side of the metaphase plate. The first meiotic 
division segregates A from a and B from b to create the 
haploid products of meiosis I division. 

Figure 3.15 Meiosis and the law of independent assortment. 

If we now follow each haploid product of meiosis I 
through the meiosis II division, we see that the four 
gametes produced by arrangement I have the genotypes 
AB and ab in equal frequency. In contrast, the four gam- 
etes produced by arrangement II have the genotypes Ab 
and aB in equal frequency. Taking both possible arrange- 
ments of homologous chromosomes at metaphase I into 
account, eight gametes are generated with four equally 
frequent genotypes. Each of the gamete genotypes—AB, 
Ab, aB, and ab—is produced in a frequency of 25%. The 
result of a large number of meiotic divisions in an AaBb 
dihybrid is a 1:1:1:1 ratio among gametes, as expected by 
Mendel’s law of independent assortment. 
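The gamete bookkeeping described above can be checked with a short Python sketch. It is only an illustration: the function name `gametes_from_arrangement` and the string encoding of genotypes are assumptions made for this example, and the model ignores crossing over. Each metaphase I arrangement sends one allele combination to each pole, and meiosis II then yields two identical gametes per pole.

```python
from collections import Counter
from fractions import Fraction


def gametes_from_arrangement(pole_one, pole_two):
    """Return the four gametes of one meiosis (hypothetical helper).

    Each argument is the allele combination drawn to one pole in
    anaphase I. Meiosis II separates identical sister chromatids,
    so each pole contributes two identical gametes.
    """
    return [pole_one, pole_one, pole_two, pole_two]


# Arrangement I: dominant alleles on one side of the plate, recessives on the other.
arrangement_1 = gametes_from_arrangement("AB", "ab")
# Arrangement II: one dominant-bearing and one recessive-bearing chromosome per side.
arrangement_2 = gametes_from_arrangement("Ab", "aB")

# The two arrangements are equally frequent, so pool one meiosis of each.
pooled = Counter(arrangement_1 + arrangement_2)
total = sum(pooled.values())
frequencies = {genotype: Fraction(n, total) for genotype, n in pooled.items()}

for genotype in ("AB", "Ab", "aB", "ab"):
    print(genotype, frequencies[genotype])
```

Pooling one meiosis of each arrangement gives eight gametes, with each of the four genotypes at a frequency of 1/4, which is the 1:1:1:1 ratio stated in the text.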


Segregation in Single-Celled Diploids 


We have seen that in sexually reproducing plants and 
animals, (1) the segregation of alleles can be explained by 
the disjunction of homologous chromosomes in meiosis I, 
and (2) independent assortment results from the different 
combinations of alleles to be found among the many gam- 
etes produced by an organism. Direct support of these 
conclusions is observed in the sexual reproduction of 
single-celled organisms such as yeast, which form diploid 
genomes for the purpose of sexual reproduction. 

The yeast species Saccharomyces cerevisiae (also 
known as baker’s yeast) can live and reproduce as a haploid 
but can also form a diploid genome and produce 
gametes. Meiosis in S. cerevisiae produces four haploid 
gametes, called spores, that are contained in a sac-like 
structure called an ascus. The spores can be removed 
from the ascus and grown individually to reveal the alleles 
they contain. 

S. cerevisiae, like all yeast, can reproduce by either 
sexual or asexual means. Asexual reproduction takes 
place in haploid cells by a process called budding, in 
which a haploid daughter cell grows out of the progenitor 
(parental) cell. Following DNA replication, sister chro- 
matids separate and move into separate nuclei. One nu- 
cleus moves into a small bud that forms the daughter cell 
and is pinched off from the progenitor cell by cytokinesis. 
The newly formed bud has the same haploid genotype as 
its progenitor cell. 

Sexual reproduction in S. cerevisiae is induced by star- 
vation conditions and involves the union of two haploid 
yeast cells that are of different mating types. The mating 
types, called MATa and MATα, result from a difference in 
gene expression. Only the cross MATa × MATα produces 
a diploid strain, and meiosis in diploids produces the ascus 
containing four gametes. 

To demonstrate these events, let us look at a vis- 
ible marker of allelic variation in yeast (Figure 3.16). The 
wild-type allele (ADE⁺) for synthesis of the nucleotide 
base adenine leads to the growth of a white yeast colony. 
In contrast, mutant alleles (ade⁻) that partially block 
adenine synthesis produce the growth of red-colored 
colonies. The red color appears in ade⁻ mutants due to 
the buildup of an intermediate product in the adenine 
synthesis pathway. 


When the haploid cross MATa ADE⁺ × MATα ade⁻ 
is made, the resulting diploid has the heterozygous genotype 
ADE⁺/ade⁻. Meiosis in this heterozygous strain produces 
an ascus containing four haploid spores that can be 
separated and grown independently to form colonies. The 
plate illustrated in Figure 3.16 shows two red yeast colo- 
nies and two white colonies, directly illustrating the 1:1 
ratio expected for allelic segregation during meiosis in the 
heterozygous organism. 

Genetic Analysis 3.1 gives you practice identifying 
the principles of Mendelian transmission in meiotic cell 
division. 


3.3 The Chromosome Theory of 
Heredity Proposes That Genes Are 
Carried on Chromosomes 


The early 20th century was a time of rapid expansion of 
genetic knowledge, fueled in large part by the rediscov- 
ery of Mendel’s hereditary principles in 1900 and, to a 
somewhat lesser extent, by Sutton and Boveri’s proposal 
that chromosome behavior in meiosis mirrors hereditary 
transmission of genes. Biologists were hard at work testing 
the new “gene hypotheses” of segregation and indepen- 
dent assortment in an array of organisms. 

Thomas Hunt Morgan, initially skeptical of the gene 
hypothesis, began working on the tiny fruit fly Drosophila 
melanogaster. Morgan intended to rigorously test 
Mendel’s rules in a natural species, not a domesticated 
one like Pisum sativum. Unlike Mendel, however, Morgan 
had no readily available phenotypic variants to examine. 
So, he and his students set out from their laboratory at 
Columbia University in New York City to the then-rural 
landscape of Long Island to attract fruit flies by hanging 
buckets of rotting fruit on trees. Once captured and trans- 
ported back to the laboratory, the flies were examined un- 
der the microscope to identify phenotypic variants. Flies 
captured from the wild were almost invariably of the same 
phenotype for each trait examined, and Morgan’s group 
referred to these phenotypes as the “wild type.” We use 
the term wild-type today to signify the phenotype that is 
the most common in a population. 

Morgan found Drosophila an easy organism to main- 
tain and reproduce in small glass bottles filled with a 
semisolid mixture of cornmeal, sugar, and water. The life 
cycle of Drosophila is between 12 and 14 days depend- 
ing on growth conditions, so 25 to 30 generations could 
be raised in a year. Morgan took advantage of this rapid 
reproduction to raise large numbers of flies over many 
generations, searching for occasional de novo (i.e., newly 
occurring) mutant phenotypes in his laboratory-reared 
populations and also in flies captured in the wild. Over 
several years, he found many phenotypic variants that 
he used for performing and analyzing controlled genetic 
crosses between selected male and female fruit flies. 

Figure 3.16 Direct observation of the chromosomal basis of allelic segregation in the haploid-diploid life 
cycle of yeast. 

GENETIC ANALYSIS 3.1 


PROBLEM A diploid organism has the genotype D1D2E1E2. Gene D and gene E are on different 
chromosomes. In the diagrams requested, illustrate only these two pairs of chromosomes and 
label each allele on each sister chromatid. 

a. Diagram any correct mitotic metaphase, illustrate these two pairs of chromosomes, and label the 
   alleles. 
b. Diagram any correct meiotic metaphase I, illustrate these two pairs of chromosomes, and label the 
   alleles. 
c. Describe the differences between the diagrams with respect to homolog and chromosome 
   alignment. 
d. Compare the outcome of mitosis with the outcome of meiosis in terms of the number of 
   chromosomes and the genotype of the cells produced. 

BREAK IT DOWN: This organism is a dihybrid (heterozygous for two genes). A total of four 
chromosomes (two homologous pairs) must be illustrated (p. 81). 

BREAK IT DOWN: There is more than one correct way to answer this and other questions posed 
in this problem. Follow the rules of segregation and independent assortment (p. 81). 

BREAK IT DOWN: Figures 3.6 and 3.9 provide overviews of mitosis and meiosis in terms 
of chromosome division (pp. 71 and 74). 


Solution Strategies and Solution Steps 

Evaluate 

1. Strategy: Identify the topic of this problem and the kind of information the answer 
   should contain. 
   Step: This problem concerns comparisons of mitosis and meiosis. Parts (a) and (b) 
   require illustration of chromosome alignments at metaphase in mitosis and in 
   meiosis I. Part (c) requires an explanation of the differences in those alignments, 
   and part (d) requires comparison of the outcomes of mitosis and meiosis. 

2. Strategy: Identify the critical information given in the problem. 
   Step: The organism is identified as a dihybrid for a pair of autosomal genes on 
   different chromosomes. 

Deduce 

3. Strategy: DNA duplicates in S phase. Identify the distribution of the different alleles 
   on homologous chromosomes following completion of S phase. 
   TIP: Heterozygous organisms carry different alleles on homologous chromosomes, 
   but the alleles on sister chromatids are identical. 
   Step: Sister chromatids carry identical alleles as a result of DNA replication in S phase. 
   Thus, for example, sister chromatids of a single chromosome each carry a copy 
   of D1. Likewise, identical alleles are carried on each set of sister chromatids. 

4. Strategy: Review the overall patterns of chromosome alignment along the 
   metaphase plate during mitotic and meiotic divisions. 
   Step: During mitotic metaphase, chromosomes align in single file and in an arbitrary 
   order along the metaphase plate. In meiotic metaphase I, homologs align opposite 
   one another along the metaphase plate. 

Solve 

Answer a 

5. Strategy: Diagram chromosome alignment during mitotic metaphase. 
   Step: Any order of the four chromosomes in single file along the metaphase plate is 
   a correct order. One example is shown. 

Answer b 

6. Strategy: Diagram any correct chromosome alignment during meiotic metaphase I. 
   Step: Homologous chromosomes align opposite one another along the metaphase 
   plate in meiotic metaphase I. The two correct arrangements of order of 
   homologous chromosomes are shown. 

Answer c 

7. Strategy: Describe the diagram differences with respect to homologs. 
   Step: Homologous chromosomes synapse in meiosis, but not in mitosis. The 
   consequence of synapsis is that homologs align next to one another and on opposite 
   sides of the metaphase plate in metaphase I. The absence of synapsis in mitosis 
   leads chromosomes to align in any order along the metaphase plate in mitotic 
   metaphase. 

Answer d 

8. Strategy: Describe the different outcomes of mitosis and meiosis. 
   Step: Mitosis produces two diploid daughter cells that are genetically identical to one 
   another and to the parental cell they are derived from. Meiosis produces four 
   haploid daughter cells that are genetically different. 

For more practice, see Problems 1, 5, and 32. Visit the Study Area to access study tools. MasteringGenetics™ 


X-Linked Inheritance 


While Sutton and Boveri were observing chromosome 
movements during meiosis, a researcher named Nettie 
Stevens was beginning a microscopic study to determine 
whether differences in chromosomes were evident be- 
tween males and females of a species of beetles, Tenebrio 
molitor. In T. molitor, Stevens found that diploid cells of fe- 
male beetles contained 20 large chromosomes, but diploid 
cells of males contained only 19 large chromosomes and 1 
small chromosome. When examining the chromosomes in 
T. molitor eggs and sperm, Stevens observed that all eggs 
contain 10 large chromosomes. Her examination of sperm, 
however, showed that about half the sperm she examined 
contained 10 large chromosomes while the other half con- 
tained 9 large chromosomes and 1 small chromosome. 

Stevens went on to study the chromosomes in somatic 
cells and gametes of other insects, and she concluded 
that sex-dependent hereditary differences are due to 
the presence of two large X chromosomes in females 
and one X chromosome and a much smaller Y chro- 
mosome in males. Sex-linked inheritance refers to the 
hereditary transmission of genes on the sex chromo- 
somes. Stevens proposed that sex chromosomes in ova of 
T. molitor are always of the same type—each ovum con- 
tains a copy of every autosomal chromosome and one X 
chromosome. On the other hand, sperm can carry one 
copy of every autosome and either an X chromosome or 
a Y chromosome. Stevens suggested that the presence of 
either an X or a Y chromosome in sperm determines the 
sex of offspring and that the equal frequency of X- and 
Y-bearing sperm accounts for the equal proportions of 
male and female offspring seen in crosses. Stevens was one 
of the first biologists to examine the transmission of sex- 
linked traits, and her studies of T. molitor were the first to 
propose a chromosomal basis for sex determination. 

In 1910, Thomas Hunt Morgan began a series of ex- 
periments in Drosophila that would validate Stevens’s pro- 
posal that X and Y chromosomes help determine sex and 
would also provide evidence suggesting that genes are car- 
ried on chromosomes. The experiments began when Lilian 
Morgan, Thomas Hunt Morgan’s wife and an important 
contributor to the laboratory group, found a mutant male 
Drosophila with white eyes in a bottle of wild-type flies that 
had been maintained in the lab for about a year. This white- 
eyed male stood out as a mutant because in Drosophila, 
wild-type flies have eyes the color of red bricks (Figure 3.17). 

The mutant white-eyed male was crossed to a wild-type, 
red-eyed female. The cross produced 1237 F1 flies, all with 
red eyes, a result indicating dominance of the wild type over 
the mutant. Subsequently, the F1 were crossed to one another 
to produce an F2 that was expected to have a 3:1 ratio 
of red eyes to white eyes. Among the F2 were 2459 red-eyed 
females, 1011 red-eyed males, and 782 white-eyed males 
(Cross A in Figure 3.18). No white-eyed females appeared 
in the F2. Clearly, the F2 result differed significantly from 
expectation, and white eyes seemed to be linked to male sex. 
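One way to quantify how far these F2 counts depart from a 3:1 expectation is a chi-square goodness-of-fit test. The sketch below is an illustration rather than Morgan's own analysis: the choice of test and the variable names are assumptions made here. The counts come from the paragraph above, and 3.841 is the standard critical value for one degree of freedom at the 0.05 level.

```python
# F2 counts from Cross A, pooled by eye color.
observed = {"red": 2459 + 1011, "white": 782}
total = sum(observed.values())

# Expected counts under a 3:1 red-to-white ratio.
expected = {"red": 0.75 * total, "white": 0.25 * total}

# Chi-square statistic: sum of (observed - expected)^2 / expected.
chi_square = sum(
    (observed[k] - expected[k]) ** 2 / expected[k] for k in observed
)

# Critical value for 1 degree of freedom at the 0.05 significance level.
CRITICAL_VALUE = 3.841

print(f"total F2 flies: {total}")
print(f"chi-square: {chi_square:.2f}")
print("reject 3:1 ratio" if chi_square > CRITICAL_VALUE else "consistent with 3:1")
```

The statistic comes out near 99, far above the critical value, so the shortfall of white-eyed flies is too large to attribute to chance. The test does not capture the second anomaly, which is that every white-eyed fly was male.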


Figure 3.17 X-linked eye-color phenotypes in Drosophila 
melanogaster. Red eyes (left) are produced by a dominant 
wild-type allele. White eyes (right) are produced by a recessive 
mutant allele. 

Figure 3.18 Two reciprocal Drosophila crosses performed 
by Morgan to determine X-linkage of the gene for eye 
color. (a) Cross A determines that all F1 flies and all female 
F2 flies have red (wild-type) eye color. One-half of F2 males 
have red eyes and one-half have white eyes. (b) Cross B is the 
reciprocal of Cross A, producing a different result in the F1 
and F2 generations. 




The unexpected result from this cross prompted a 
closer look at the transmission of white eyes through the 
reciprocal cross of a white-eyed female with a wild-type, 
red-eyed male. The F1 of the reciprocal cross were 
red-eyed females and white-eyed males 
(Cross B in Figure 3.18). The F2 contained equal proportions 
of red-eyed and white-eyed males and females. 

Figure 3.19 The X-linked genetic model of Morgan’s eye-color 
inheritance experiments in Drosophila. X and Y chromosome 
segregation in (a) Cross A and (b) Cross B from Figure 3.18. 

Diagrams of the crosses in Figure 3.18 are illustrated 
in Figure 3.19, where w represents the recessive 
allele for white eye and w⁺ the dominant allele for red 
eye. The differences between reciprocal crosses observed 
by Morgan are not anticipated by Mendel’s laws 
of heredity. In fact, recall that Mendel performed many 
reciprocal crosses and found no differences in the phenotype 
proportions. Morgan realized that transmission 
of X chromosomes in Drosophila could account for the 
appearance of white and red eyes in his crosses if the X 
chromosome carried a gene for eye color. In Cross A, 
the single X chromosome of a white-eyed male carries 
a recessive allele designated w. The X chromosome is 
present along with a Y chromosome in the genome of 
the male fruit fly. X chromosomes of females each carry 
a dominant allele w⁺ that produces red eye color. The F1 
of this cross are red-eyed males that are w⁺Y and red-eyed 
females that are w⁺w. The F2 of this cross contain 
equal proportions of white-eyed (wY) and red-eyed 
(w⁺Y) males and red-eyed females that are, in equal 
proportions, w⁺w⁺ and w⁺w. Cross B between a white-eyed 
female and a red-eyed male produces red-eyed 
female and white-eyed male F1 progeny as well as equal 
proportions of red- and white-eyed males and females 
in the F2. 
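The predictions for both reciprocal crosses can be reproduced with a small Python sketch of X-linked transmission. This is a simplified model under stated assumptions: the helper names (`phenotype`, `cross`, `pick`, `summarize`) and the tuple encoding of sex chromosomes are invented for this illustration, every gamete is assumed equally likely, and `"w+"` stands for the dominant red-eye allele.

```python
from collections import Counter
from itertools import product


def phenotype(genotype):
    """Describe a fly given its two sex chromosomes, e.g. ('w+', 'w') or ('w', 'Y')."""
    x_alleles = [allele for allele in genotype if allele != "Y"]
    sex = "male" if "Y" in genotype else "female"
    eye = "red" if "w+" in x_alleles else "white"
    return f"{eye}-eyed {sex}"


def cross(mother, father):
    """Return every equally likely offspring as an (egg, sperm) pair."""
    return list(product(mother, father))


def pick(offspring, sex):
    """Return the first offspring genotype of the requested sex."""
    for genotype in offspring:
        if ("Y" in genotype) == (sex == "male"):
            return genotype
    raise ValueError(f"no {sex} offspring in this cross")


def summarize(offspring):
    """Count offspring by phenotype."""
    return Counter(phenotype(g) for g in offspring)


# Cross A: red-eyed female (w+ w+) x white-eyed male (w Y).
f1_a = cross(("w+", "w+"), ("w", "Y"))
f2_a = cross(pick(f1_a, "female"), pick(f1_a, "male"))

# Cross B (reciprocal): white-eyed female (w w) x red-eyed male (w+ Y).
f1_b = cross(("w", "w"), ("w+", "Y"))
f2_b = cross(pick(f1_b, "female"), pick(f1_b, "male"))

for label, generation in [("A F1", f1_a), ("A F2", f2_a), ("B F1", f1_b), ("B F2", f2_b)]:
    print(label, dict(summarize(generation)))
```

Under these assumptions the Cross A F2 contains red-eyed females, red-eyed males, and white-eyed males in a 2:1:1 ratio with no white-eyed females, while the Cross B F2 contains all four sex and eye-color classes in equal proportions, matching the pattern described in the text.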

Morgan’s analysis of these experiments describes 
X-linked inheritance, a term identifying the transmis- 
sion of genes carried on the X chromosome. Morgan 
proposed X-linked inheritance as the mode of transmis- 
sion of eye color in Drosophila. Morgan’s X-linked in- 
heritance hypothesis requires some new terminology in 
reference to male genotypes for X-linked genes. We use 
the term hemizygous, a word meaning “half zygous,” to 
refer to male genotypes for X-linked genes. This term 
is used because males have a single X chromosome; 
therefore, unlike females, males cannot be homozygous 
or heterozygous for X-linked genes. Hemizygous males 
inherit their X chromosome from their mother; more- 
over, they express any allele on their X chromosome, 
since the Y chromosome does not carry genes that are 
homologous to those on the X chromosome. In contrast 
to males, females have two X chromosomes and can 
display heterozygous and homozygous genotypes for 
X-linked genes, just as they can for autosomal genes. 
Note also that males can transmit either the X chromo- 
some or the Y chromosome, but that the X chromo- 
some is passed exclusively to female progeny and the Y 
chromosome exclusively to male progeny. In contrast, 
females can transmit either X chromosome to any of 
their offspring. 


Testing the Chromosome Theory of Heredity 


Morgan’s observations on the inheritance pattern of 
Drosophila eye color led him to propose the chromosome 
theory of heredity, hypothesizing that genes are carried 
on chromosomes. Calvin Bridges, a student of Morgan, 
studied fruit flies with unexpected eye-color phenotypes 
and abnormal chromosome numbers and provided proof 
of the chromosome theory of heredity. 

Bridges focused his study on Cross B (see Figures 
3.18 and 3.19), between a white-eyed female (ww) and a 
red-eyed male (w⁺Y). Nearly all the progeny from this 
cross had the expected phenotype and were either red-eyed 
females (w⁺w) or white-eyed males (wY), but about 1 
in every 2000 F1 flies had an “exceptional phenotype,” a 
term used to identify progeny with unexpected charac- 
teristics. Specifically, the exceptional flies were either 
white-eyed females or red-eyed males. Bridges’s detec- 
tion of exceptional progeny left him with two questions 
to answer: (1) how could the exceptional progeny be 
explained, and (2) did the appearance of exceptional 
progeny provide the information necessary to test the 
hypothesis that genes are on chromosomes? 

The answer to the first question came when Bridges 
looked at chromosomes of the exceptional progeny un- 
der the microscope. He saw the exceptional females had 
three sex chromosomes—two X chromosomes and one 
Y chromosome (XXY) (Figure 3.20). As we discuss in 
the next section, fruit flies with two X chromosomes are 
females, even if there happens to be a Y chromosome as 
well, as there is in this case. Bridges also observed an ab- 
normal number of chromosomes in exceptional males. 
They carried a single X chromosome but no Y chro- 
mosome (XO). Fruit flies with one X chromosome are 
male, regardless of whether they carry a Y chromosome. 

Figure 3.20 Exceptional progeny observed by Calvin 
Bridges result from X-chromosome nondisjunction during 
female meiosis. 


Based on his observations, Bridges proposed that the 
Y chromosome carried by exceptional females came 
from the male parent, the only source of a Y chromo- 
some in the cross, and that both X chromosomes in 
these exceptional females came from the mother, giving 
the exceptional females two copies of the w allele and 
white eye color. Bridges used similar logic to suggest 
that the single X chromosome in exceptional males 
came from the male parent that passed the w⁺ allele. 
The exceptional males with a single X chromosome expressed 
the w⁺ allele as red eyes. 

According to Bridges’s proposal, the exceptional phe- 
notypes and abnormal numbers of chromosomes were the 
result of rare mistakes in meiosis caused by the failure of 
X chromosomes to separate properly in either the first or 
second meiotic division in females. Failed chromosome 
separation is called nondisjunction. Notice in Figure 3.20 
that nondisjunction also produces XXX or YO progeny. 
Bridges never saw these progeny, however, because YO 
progeny fail to develop, and XXX is usually lethal. Bridges’s 
observations provide conclusive proof of the chromosome 
theory of heredity by showing that the white (w) allele 
segregates with the X chromosome during normal meiosis 
and during nondisjunction. Genetic Analysis 3.2 gives you 
some practice spotting X-linked inheritance. 
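Bridges’s reasoning can be traced by enumerating the zygotes formed when eggs from a nondisjunction event in a white-eyed (ww) female meet normal sperm from a red-eyed (w⁺Y) male. The Python sketch below is illustrative only: the function name `classify` and the tuple encoding are assumptions for this example, and the viability outcomes are simply those stated in the text.

```python
from itertools import product

# Eggs produced by X-chromosome nondisjunction in a ww female:
# one egg type carries both X chromosomes, the other carries none.
nondisjunction_eggs = [("w", "w"), ()]

# Normal sperm from a w+ Y male carry either the X (w+) or the Y.
sperm = [("w+",), ("Y",)]


def classify(egg, sperm_cell):
    """Return (karyotype, outcome) for one zygote, following the text."""
    chromosomes = egg + sperm_cell
    x_alleles = [c for c in chromosomes if c != "Y"]
    n_x = len(x_alleles)
    n_y = chromosomes.count("Y")
    # "O" marks a missing sex chromosome when only one is present.
    karyotype = "X" * n_x + "Y" * n_y + ("O" if n_x + n_y == 1 else "")
    if n_x == 0:
        return karyotype, "fails to develop"
    if n_x == 3:
        return karyotype, "usually lethal"
    sex = "female" if n_x == 2 else "male"  # Drosophila: sex follows X count
    eye = "red" if "w+" in x_alleles else "white"
    return karyotype, f"{eye}-eyed {sex}"


outcomes = dict(classify(egg, s) for egg, s in product(nondisjunction_eggs, sperm))

for karyotype, outcome in outcomes.items():
    print(karyotype, "->", outcome)
```

The only survivors in this enumeration are XXY white-eyed females and XO red-eyed males, the two exceptional classes Bridges observed, and in both the eye-color allele travels with the X chromosome that carries it.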


3.4 Sex Determination Is 
Chromosomal and Genetic 


The term sex determination encompasses the genetic 
and biological processes that produce the male and female 
characteristics of a species. The sex of most organisms is 
identified on two levels: chromosomal sex, the presence 
of sex chromosomes associated with male and female sex 
in a species; and phenotypic sex, the internal and external 
morphology found in each sex. Chromosomal sex is de- 
termined at the moment of fertilization and is controlled 
by the sex chromosome contributed by the heteroga- 
metic parent. In contrast, phenotypic sex is a matter of 
appropriate gene expression and the development of sex 
characteristics during gestation or growth. In this section, 
we examine the patterns and processes of chromosomal 
and phenotypic sex determination in several organisms. 


Sex Determination in Drosophila 


Bridges’s study of X-chromosome nondisjunction and his 
proof of the chromosome theory of heredity also provided 
information about sex determination in Drosophila. In 
Drosophila, the number of X chromosomes and their rela- 
tion to the number of haploid sets of autosomal chromo- 
somes are a critical component in determining sex, and 
the number of Y chromosomes, or even the absence of a 
second sex chromosome, seems not to disrupt the pattern 
of sex determination. Thus, in Drosophila, flies with the 
sex-chromosome constitutions XY, XYY, and XO are all 
male, whereas flies that are XX or XXY are female. 


GENETIC ANALYSIS 3.2 


PROBLEM A female fruit fly from a pure-breeding stock with yellow body color and full wing 
size is crossed to a male from a pure-breeding stock with gray body and vestigial wings. The 
cross progeny consists of males with yellow body color and full-sized wings and females with 
gray body color and full-sized wings. 

a. Determine the mode of inheritance of each trait. 
b. Give genotypes for parental flies and the male and female progeny using clearly defined 
   allele designations of your choice. 

BREAK IT DOWN: Pure-breeding females and males are homozygous for autosomal alleles. 
Pure-breeding females are homozygous for X-linked alleles, but males are hemizygous 
(pp. 89 and 89). 

BREAK IT DOWN: All male and female progeny have full-sized wings, but they differ in body 
color, suggesting possible sex-linkage for that trait (p. 87). 


Solution Strategies and Solution Steps 

Evaluate 

1. Strategy: Identify the topic of this problem and the kind of information the answer 
   should contain. 
   Step: The patterns of transmission of two Drosophila traits and the genotypes of 
   organisms are to be determined based on the number and proportions of 
   male and female F1 progeny with the traits. 

2. Strategy: Identify the critical information given in the problem. 
   Step: Pure-breeding parental phenotypes are given along with the phenotypes of 
   male and female progeny in the F1. 

Deduce 

3. Strategy: Consider the F1 phenotype results in light of the parental phenotypes. 
   TIP: Cross results that appear equally in both sexes are consistent with autosomal 
   inheritance. Sex-dependent differences in a cross suggest sex-linked inheritance. 
   Step: All F1 progeny have full-sized wings and none have vestigial wings, suggesting 
   that full-sized wing is dominant. The F1 males are exclusively yellow-bodied, 
   whereas F1 females are exclusively gray-bodied. The F1 male body color is 
   identical to that of the parental female, whereas the F1 females’ body color is 
   identical to that in the male parent. 

4. Strategy: Hypothesize the modes of inheritance of body color and wing form from the 
   F1 data. 
   TIP: Test the hypothesized mode of inheritance by comparing the predicted 
   and observed F1 progeny ratios. 
   Step: The observation of one body color in F1 males and another in females suggests 
   this is an X-linked trait. Since hemizygous males have yellow body and females 
   have gray body, it is likely that gray body is dominant and yellow body is recessive. 
   The F1 results for wing form are the same for both sexes, suggesting that 
   this trait is autosomal. 

Solve 

Answer a 

5. Strategy: Test the proposed mode of transmission of wing form. 
   Step: The F1 of both sexes have full-sized wings, consistent with an autosomal trait. 
   The pure-breeding full-winged parent transmits the dominant allele to all 
   progeny, and the pure-breeding vestigial parent transmits the recessive allele. 
   The F1 are predicted to be heterozygous and display the dominant trait. 

6. Strategy: Test the mode of transmission of body color. 
   TIP: Compare observed and expected F1 progeny to test the hypothesized 
   mode of inheritance. 
   Step: The sex-dependent difference in body color among F1 males and females 
   strongly suggests this trait is X-linked. The F1 males inherit the maternal recessive 
   allele for yellow body color and express the trait because they are hemizygous. 
   F1 females inherit a recessive allele on the maternal X chromosome and 
   a dominant allele on the paternal X and are heterozygous, thus displaying the 
   dominant phenotype. 

Answer b 

7. Strategy: Determine genotypes for parental and F1 flies. Use Xʸ for yellow body, 
   X⁺ for gray body, v⁺ for full wing, and v for vestigial wing. 
   PITFALL: Remember that males are hemizygous for X-linked traits. Giving their 
   genotype as homozygous or heterozygous is incorrect. 
   Step: The genotypes of pure-breeding parents are Xʸ/Xʸ; v⁺/v⁺ for yellow-bodied, 
   full-winged females and X⁺/Y; v/v for gray-bodied, vestigial-winged males. 
   The F1 females are Xʸ/X⁺; v⁺/v and F1 males are Xʸ/Y; v⁺/v. 

For more practice, see Problems 12, 15, and 25. Visit the Study Area to access study tools. MasteringGenetics™ 

Bridges’s Drosophila data identified the ratio of X 
chromosomes to the number of haploid sets of auto- 
somes as 1X:2A in males and as 2X:2A in females. Bridges 
called this the X/A ratio, or the X/autosome ratio. In re- 
ality, the X/A ratio is too simplistic to explain Drosophila 
sex determination. Drosophila sex is determined by regu- 
latory proteins that relay the number of X chromosomes 
present in nuclei of cells in Drosophila embryos. These 
proteins control expression of the sex-lethal (Sxl) gene 
in XX flies. As we discuss in the Case Study at the end 
of Chapter 8, Sxl protein controls the expression of ad- 
ditional genes that drive sex development. 


Mammalian Sex Determination 


Like Drosophila, placental mammals have two kinds of sex 
chromosomes, identified as X and Y. Unlike Drosophila, 
however, sex determination in placental mammals 
depends on the presence or absence of the Y chromosome. 
A single gene on the Y chromosome, abbreviated SRY (sex- 
determining region of Y, and also known as the testis deter- 
mining factor), initiates a series of events that lead to male 
sex-phenotype development in the embryo. Consequently, 
mammalian embryos that have one or more Y chromo- 
somes (XY, XXY, and XYY, for example) and therefore 
express SRY will develop as males. Conversely, embryos 
carrying only X chromosomes (XX, XO, and XXX, for ex- 
ample) and lacking SRY expression will develop as females. 
SRY expression produces the transcription factor pro- 
tein testis-determining factor (TDF) that elicits a cascade 
of gene transcription and developmental events that ul- 
timately produce male internal and external structures. 
Early mammalian embryos contain twin clusters of tissue 
identified as undifferentiated gonads that can develop into 
either ovaries or testes. Connected to the undifferentiated 
gonads are two sets of tissues called the Wolffian ducts 
and the Müllerian ducts. The undifferentiated gonads de- 
velop, but just one of the ductal tissues develops. Wolffian 
ducts can develop to form male sexual and reproductive 
structures. Alternatively, Müllerian ducts can develop to
form female sexual and reproductive structures. In male 
embryos, TDF initiates testicular development by stimu- 
lating interstitial cells in the gonadal tissue to synthe- 
size two male androgenic hormones, testosterone and 
dihydrotestosterone (DHT). These hormones help drive 
Wolffian duct development that leads to formation of 
internal and external male sexual and reproductive struc- 
tures. Separately, in specialized cells called sustentacular 
cells, TDF stimulates production of Müllerian-inhibitory
factor (MIF) that degrades Müllerian ducts to prevent
development of female sexual structures (Figure 3.21).
Female embryos do not carry a Y chromosome and
therefore lack production of TDF. The current model
suggests that the absence of TDF suppresses the expression of
genes that lead to male development and, instead, leads to
expression of genes that stimulate the undifferentiated
gonad tissue to develop into ovaries and cause Müllerian ducts
to develop into female sexual and reproductive structures.

Figure 3.21 Mammalian sex determination is initiated by
the Y-linked SRY gene.

While SRY is a necessary gene in mammalian sex 
development, it is not sufficient by itself to direct sexual 
development. For example, mutations of X-linked and 
autosomal genes mentioned in Experimental Insight 3.1 on 
page 89 have been identified as causes of abnormalities of 
human sexual development. 
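
The contrast between the two systems can be made concrete with a short sketch. It is a deliberate simplification, not a model of the underlying molecular pathways: the fly rule assumes two haploid autosome sets and simply counts X chromosomes, while the mammalian rule checks only for the presence of a Y chromosome (and therefore SRY). The function names are hypothetical, chosen for illustration.

```python
def drosophila_sex(karyotype: str) -> str:
    """Sex of a diploid fly from its sex-chromosome constitution.

    Simplification: with two haploid autosome sets, one X chromosome gives
    a male and two X chromosomes give a female; the Y chromosome is ignored.
    """
    x_count = karyotype.upper().count("X")
    if x_count == 1:
        return "male"
    if x_count == 2:
        return "female"
    raise ValueError(f"karyotype not covered by this sketch: {karyotype}")


def mammal_sex(karyotype: str) -> str:
    """Sex of a placental mammal: male if at least one Y (and thus SRY) is present."""
    return "male" if "Y" in karyotype.upper() else "female"


for k in ["XX", "XY", "XO", "XXY", "XYY"]:
    print(f"{k:>3}  fly: {drosophila_sex(k):6}  mammal: {mammal_sex(k)}")
```

The two rules agree for XX, XY, and XYY but disagree for XO (male fly, female mammal) and XXY (female fly, male mammal), which is the distinction the text draws between counting X chromosomes and detecting a Y chromosome.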


Diversity of Sex Determination 


You are now familiar with the XX and XY chromosome 
designation signifying that females carry two X chromo- 
somes (XX) and males carry an X chromosome and a Y 
chromosome (XY). In many bird species, some reptiles, 
certain fish, and moths and butterflies, however, females 
carry two different sex chromosomes, and males carry two 
sex chromosomes that are the same. To avoid confusion 
with the XX/XY system, a different lettering system called 


Experimental Insight 3.1 


Mutations Altering Human Sex Development 


Many genes in addition to SRY direct human sexual develop- 
ment. Here we identify three other genes whose mutation 
affects the production or cell-signaling capacity of the male 
androgenic hormones testosterone and DHT (dihydrotes- 
tosterone) and results in abnormal sexual development. 
These conditions have different causes and distinctive con- 
sequences. From a medical perspective, ambiguous gender 
identification is a consequence of the conditions. In personal 
terms, significant psychosocial issues of self and of gender 
identity confront individuals with each of these conditions. 


ANDROGEN INSENSITIVITY SYNDROME (AIS) 


AIS (OMIM 300068) (see the Case Study in Chapter 2, p. 56, for 
a discussion of OMIM) is caused by mutations of the X-linked 
AR (androgen receptor) gene. AR is pivotal in producing an- 
drogen receptors on androgen-sensitive cells. AIS individuals 
are XY, have a fully functional SRY gene, and produce normal 
amounts of testosterone and DHT. In the absence of andro- 
gen receptors, however, testosterone and DHT cannot bind to 
cells, which therefore do not initiate the gene expression that 
accompanies male sexual development. Due to this deficit, 
individuals with AIS have an external phenotype that appears 
to be female (i.e., sex reversal), but internal reproductive
structures do not develop as either male or female, thus ren- 
dering AIS individuals sterile. Androgen insensitivity prevents 
development of male sexual structures, whereas SRY-initiated 
MIF production degrades the Müllerian ducts and blocks the
development of female sexual structures. 


PSEUDOHERMAPHRODITISM 

When genes operating in the biochemical pathway control- 
ling testosterone and DHT are mutated, improper androgen 
levels occur, and individuals can exhibit pseudohermaphro- 
ditism—a term referring to the appearance of nonfunctional 
forms of both male and female structures in a single person.
Pseudohermaphrodites are sterile. The autosomal recessive 
disorder 5-alpha-reductase deficiency (OMIM 607306) pro- 
duces a form of pseudohermaphroditism due to mutation of 
the steroid 5-alpha-reductase-2 (SRD5A2) gene. SRD5A2 pro- 
duces 5-alpha-reductase enzyme that helps convert testos- 
terone to DHT. Individuals with 5-alpha-reductase deficiency 
are XY, have a wild-type SRY gene, undergo Wolffian duct 
development, and express MIF. Wolffian duct development 
produces male internal structures, but the inability to con- 
vert testosterone to DHT results in the absence of external 
male structures. At birth, individuals with 5-alpha-reductase 
deficiency appear to be female. At puberty, however, the 
adrenal glands begin testosterone production that leads to 
secondary male sexual characteristics such as deepening of 
the voice, facial hair growth, and development of a mascu- 
line physique. 


CONGENITAL ADRENAL HYPERPLASIA (CAH) 

Mutation of CYP21, a gene producing the enzyme 
21-hydroxylase, causes the most common form of autoso- 
mal recessive congenital adrenal hyperplasia (CAH) (OMIM 
201910). Functional 21-hydroxylase participates in depletion 
of testosterone and DHT; thus, its mutation leads to accumulation
of testosterone and DHT. CYP21 mutation produces
pseudohermaphroditism in males and females due to high 
androgen levels. Boys with CAH enter puberty as early as 3 
years of age and display male musculature, enlarged penis, 
and testes growth. Girls with CAH are born with an enlarged 
clitoris that can be mistaken for a small penis. While normal 
internal female reproductive anatomy is present, CAH females 
experience male-like facial hair growth and deepening voice 
at puberty. Menstruation does not occur, due to excessive 
androgen levels. 
the Z/W system is used in these cases. In the Z/W system,
males have two Z sex chromosomes, giving a sex-chromosome
composition of ZZ. In contrast, females have two different
sex chromosomes and are identified as ZW. The letters Z and
W are used to highlight the different sex-chromosome
compositions associated with each sex.
The sex-chromosome differences in the Z/W system
produce different results from reciprocal crosses involving
Z-linked genes, just as there are reciprocal cross differences
for X-linked genes. Figure 3.22 shows reciprocal crosses
between pure-breeding hens (female) and roosters (male)
involving a Z-linked dominant allele for barred feathers
(Z^B) and its recessive counterpart, nonbarred feathers
(Z^b). The F1 results of the reciprocal crosses reveal
differences consistent with sex-linked inheritance.
Cross A produces barred hens (Z^B W) and barred roosters
(Z^B Z^b) in the F1, whereas Cross B produces nonbarred
hens (Z^b W) and barred roosters (Z^B Z^b). The F2 results of
these crosses also yield differences consistent with
sex-linked inheritance. We can conclude that the mechanism
of transmission of Z-linked genes in the Z/W system is
analogous to that in the XX/XY system except that the
patterns are the reverse of those in placental mammals.

Figure 3.22 ZW inheritance of feather form in poultry is
revealed by analysis of reciprocal crosses. (a) Cross A: A
hemizygous female (hen) with recessive nonbarred (white)
feathers crossed to a pure-breeding male (rooster) with
dominant barred feathers produces F1 progeny that are all
barred. (b) Cross B: The reciprocal cross produces barred
roosters and nonbarred (white) hens.
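
The reciprocal crosses in Figure 3.22 can be enumerated with a few lines of code. The sketch below assumes a single Z-linked gene with complete dominance and uses "B" and "b" as shorthand for the barred and nonbarred alleles. The function name zw_cross is hypothetical.

```python
from collections import Counter
from itertools import product


def zw_cross(hen: tuple[str, str], rooster: tuple[str, str]) -> Counter:
    """Enumerate F1 classes for a Z-linked gene.

    hen is (Z allele, "W"); rooster is (Z allele, Z allele).
    "B" = dominant barred allele, "b" = recessive nonbarred allele.
    """
    offspring = Counter()
    for from_hen, from_rooster in product(hen, rooster):
        if from_hen == "W":
            # A W from the hen makes a daughter, hemizygous for the rooster's allele.
            sex, alleles = "hen", [from_rooster]
        else:
            sex, alleles = "rooster", [from_hen, from_rooster]
        phenotype = "barred" if "B" in alleles else "nonbarred"
        offspring[(sex, phenotype)] += 1
    return offspring


cross_a = zw_cross(hen=("b", "W"), rooster=("B", "B"))
cross_b = zw_cross(hen=("B", "W"), rooster=("b", "b"))
print("Cross A:", dict(cross_a))
print("Cross B:", dict(cross_b))
```

Cross A yields only barred progeny of both sexes, whereas Cross B yields barred roosters and nonbarred hens, matching the figure. The hen, as the heterogametic sex, plays the role that the male plays in an X-linked cross.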

Sex chromosome content is even more unusual in 
monotremes like the platypus, an egg-laying mammal that is 
native to Australia. Male platypus sex chromosomes are represented
as X1Y1X2Y2X3Y3X4Y4X5Y5 and female platypus
sex chromosomes as X1X1X2X2X3X3X4X4X5X5. Multiple
sets of sex chromosomes have also been documented in 
some plant species, termites, and spiders. In dioecious plants 
(those with male plants and female plants), sex chromo- 
somes are often not obvious at all, and they are therefore 
difficult to study. And, in certain reptiles and fishes, sex is
dependent on environmental variables such as temperature.
In some of these species, the sex of an individual can even
change during its lifetime, even though its chromosomes do not.


3.5 Human Sex-Linked Transmission 
Follows Distinct Patterns 


Sex chromosomes typically differ between males and
females of a species. In most animal species, for example,
females have two copies of the X chromosome and,
therefore, two copies of each gene on the chromosome. In
contrast, males typically have one X chromosome and one 
Y chromosome and, thus, just one copy of each X chromo- 
some gene and one copy of each Y chromosome gene. The 
inheritance of sex-linked mutant alleles on the X chromo- 
some produces mutant phenotypes in distinctive patterns. 
Two inheritance patterns of sex-linked genes are common. 
X-linked recessive inheritance is the hereditary pattern 
that determines white eye color in Drosophila. With this 
mode of inheritance, females homozygous for the reces- 
sive allele and hemizygous males whose X chromosome 
carries the recessive allele display the recessive phenotype. 
The alternative mode of X-linked transmission is X-linked 
dominant inheritance, in which heterozygous females and 
males hemizygous for the dominant allele express the 
dominant phenotype. 

Three features of X-linked dominant and X-linked re- 
cessive inheritance present a contrast to our description of 
inheritance of autosomal traits. First, autosomal dominant 
and recessive alleles generally have the same patterns in 
males and females, but when the traits are X-linked, the terms 
recessive and dominant refer specifically to their expression 
in females. For X-linked alleles, females can be homozygous 
or heterozygous, but males are hemizygous and express the 


Table 3.2 A Short List of Human X-Linked Dominant and X-Linked Recessive Traits

Disease                                                  Symptom

X-Linked Dominant Disorders
Amelogenesis imperfecta (OMIM(a) 301200)                 Abnormal tooth-enamel development and distribution
Congenital generalized hypertrichosis (OMIM 307150)      Extensive hair distribution on the face and body
Hypophosphatemia (OMIM 307800)                           Phosphate deficiency causing rickets (bowleggedness)
Rett syndrome (OMIM 312750)                              Mental retardation and neurodevelopmental defects

X-Linked Recessive Disorders
Anhidrotic ectodermal dysplasia (OMIM 305100)            Absence of teeth, hair, and sweat glands
Color blindness (red-green) (OMIM 303800)                Color-perception deficiency
Fragile X syndrome (OMIM 300624)                         Mental retardation and neurodevelopmental defects
Hemophilia A (OMIM 306700)                               Blood-clotting abnormality
Lesch-Nyhan syndrome (OMIM 300322)                       Mental retardation with self-mutilation and spastic cerebral palsy
Muscular dystrophy (Becker type, OMIM 300376, and        Progressive muscle weakness
  Duchenne type, OMIM 310200)
Ornithine transcarbamylase deficiency (OMIM 311250)      Mental deterioration due to ammonia accumulation with protein ingestion
Retinitis pigmentosa (OMIM 300029)                       Night blindness, constricted visual field

(a) OMIM = Online Mendelian Inheritance in Man (see Chapter 2 Case Study for discussion).


allele on their X chromosome, regardless of the hereditary 
pattern in females. Second, the probability of transmission of 
X-linked alleles to offspring is not the same for the two sexes 
as it is for autosomal alleles. Female X-linked transmission is 
identical to autosomal transmission, but hemizygous males 
always transmit their X chromosome to female offspring and 
their Y chromosome to male offspring. Lastly, whereas fe- 
males receive one copy of X-linked alleles from each parent, 
males receive their X-linked alleles from their mother and 
their Y-linked alleles from their father. 


Expression of X-Linked Recessive Traits 


X-linked recessive traits are expressed in hemizygous 
males who carry the recessive allele and in females who are 
homozygous for the recessive allele. Because hemizygous 


males express the single copy of a recessive X-linked al- 
lele in their phenotype, one of the hallmarks of X-linked 
recessive inheritance is the observation that many more 
males than females express the traits. Table 3.2 lists several 
X-linked disorders, including color blindness that affects 
perception of red and green color and hemophilia A, a 
blood-clotting disorder that we discuss in more detail just 
ahead. Four features characterizing X-linked recessive in- 
heritance are illustrated in Figure 3.23. 


1. As a result of male hemizygosity, more males than
females have the recessive phenotype. There are 10
recessive males and 2 recessive females.


2. If a recessive male mates with a homozygous dominant
female, all progeny have the dominant phenotype.
All female offspring are heterozygous carriers,
and all male offspring are hemizygous for the dominant
allele. See the progeny resulting from the cross
I-1 × I-2.

3. Matings of recessive males and carrier females
produce the recessive phenotype in half the offspring
and the dominant phenotype in the other half. See the
results of the crosses III-13 × III-4 and II-1 × III-2.


4. Mating of a homozygous recessive female and a
hemizygous dominant male produces male progeny
with the recessive phenotype, and female offspring
who have the dominant phenotype and are carriers
of the recessive allele. See the results of the cross
IV-5 × IV-6.


Figure 3.23 An idealized example of X-linked recessive inheritance.
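
The expected offspring classes for the matings listed above can be checked by enumerating gametes. The following sketch assumes a single X-linked gene with complete dominance. The allele symbols "A" (dominant) and "a" (recessive) and the function name x_linked_cross are hypothetical, chosen only for illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def x_linked_cross(mother: str, father: str) -> dict:
    """Offspring classes for an X-linked recessive trait.

    mother: her two X-linked alleles, e.g. "Aa"; father: his one allele, e.g. "A".
    Returns {(sex, phenotype): probability}.
    """
    counts = Counter()
    # The father contributes either his X (daughter) or his Y (son).
    for maternal, paternal in product(mother, [father, "Y"]):
        if paternal == "Y":
            sex, recessive = "son", maternal == "a"
        else:
            sex, recessive = "daughter", maternal == "a" and paternal == "a"
        counts[(sex, "recessive" if recessive else "dominant")] += 1
    total = sum(counts.values())
    return {k: Fraction(v, total) for k, v in counts.items()}


print(x_linked_cross("AA", "a"))  # recessive male x homozygous dominant female
print(x_linked_cross("Aa", "a"))  # recessive male x carrier female
print(x_linked_cross("aa", "A"))  # homozygous recessive female x dominant male
```

The three calls correspond to features 2, 3, and 4: all progeny dominant, half the offspring of each sex recessive, and recessive sons with dominant (carrier) daughters, respectively.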


Hemophilia A, a serious blood-clotting disorder, is 
caused by mutation of an X-linked gene called factor VIII 
(F8) that produces a blood-clotting protein called factor 
VIII protein. Hemophilia A is transmitted in an X-linked 
recessive manner, most often by a carrier mother who 
passes the mutant allele to an affected son. In typical 


X-linked recessive fashion, approximately half the sons 
of carrier mothers have the disease. In these families, 
the disease often appears to “skip” a generation because 
the mutant allele is passed from affected father to carrier 
daughter and on to an affected grandson. 

In some families, a de novo (newly occurring) mu- 
tation of the F8 gene is responsible for the appearance 
of hemophilia. An example occurred in the royal fami- 
lies of England and Europe: An apparent de novo muta- 
tion of the F8 gene affected Queen Victoria of England 
(Figure 3.24). Victoria had four sons, one of whom had 
hemophilia, along with five daughters, two of whom 
were known carriers. Victoria’s carrier daughters had 
normal blood clotting but introduced the mutation 
to the royal families of Russia, Germany, and Spain 
through intermarriage. These daughters passed the 
mutation to their sons who had hemophilia and to their 
daughters who were carriers like their mothers. Genetic 
Analysis 3.3 analyzes the hereditary transmission of 
hemophilia A. 
Figure 3.24 Hemophilia in the royal families of Europe. Note that some parents are omitted from
the pedigree for clarity. In all cases, these individuals carry and contribute wild-type alleles. 


GENETIC ANALYSIS 3.3


PROBLEM Hemophilia A is an X-linked recessive blood-clotting disorder caused by mutation of the factor VIII gene.
Suppose a heterozygous woman with normal blood clotting has children with a man who also has normal blood clotting.
Determine the probability of each of the following outcomes.

a. The probability of a son having hemophilia A.
b. The probability of a child of either sex having normal blood clotting.
c. The probability of having three children, each of whom has hemophilia A.
d. The probability of having four children, two of whom have hemophilia A and two of whom have normal
   blood clotting.

BREAK IT DOWN: The information given about the pattern of inheritance of hemophilia A and the status of the
woman and the man allows identification of their genotypes (p. 92).

BREAK IT DOWN: The woman can transmit the recessive allele to a child of either sex, but the man transmits his
X-linked allele to daughters and his Y chromosome to sons (p. 90).

BREAK IT DOWN: Parts (a) and (b) can be predicted using a Punnett square (p. 33); parts (c) and (d) are
applications of binomial probability (p. 91).


Solution Strategies and Solution Steps


Evaluate

1. Strategy: Identify the topic this problem addresses and describe the nature of the required answers.
   Step: This problem addresses inheritance probabilities of an X-linked recessive trait for the parental
   genotypes given. The answers should be stated as fraction, decimal, or percentage probabilities.

2. Strategy: Identify the critical information given in the problem.
   Step: The inheritance pattern of the trait in question is identified as X-linked recessive, the phenotype
   of each parent is given, and the woman is identified as a heterozygote.

Deduce

3. Strategy: Deduce the genotypes of the woman and the man.
   TIP: Remember that males are hemizygous for X-linked traits.
   Step: The woman is identified as being heterozygous and so her genotype is X^H X^h, where the uppercase
   and lowercase superscripts represent the dominant and recessive alleles, respectively. The man has normal
   blood clotting and is hemizygous for the wild-type allele. His genotype is X^H Y.

4. Strategy: Determine the possible phenotypes and phenotype probabilities for children of this couple.
   TIP: Use a Punnett square to assist you in accurately predicting the possible outcomes of mating.
   Step: The Punnett square predicts four different genotypes among the possible children of this couple.

                        Father's gametes
                        X^H                   Y
   Mother's     X^H     X^H X^H  Healthy      X^H Y  Healthy
   gametes      X^h     X^H X^h  Healthy      X^h Y  Hemophilia A

Solve

Answer a

5. Strategy: Determine the probability of a child of this couple having hemophilia A.
   Step: From the Punnett square, we see that one of the four possible offspring genotypes is a male with
   hemophilia A. The probability of having a child with hemophilia A is 0.25, or 25%.

Answer b

6. Strategy: Determine the probability of a child with normal blood clotting being produced by this couple.
   Step: The Punnett square also shows that the remaining 3 in 4 possible offspring genotypes would produce
   normal blood clotting. The probability that a child of this couple has normal blood clotting is 0.75, or 75%.

Answer c

7. Strategy: Calculate the probability that if the couple has three children, each of them will have
   hemophilia A.
   TIP: Use binomial probability to calculate the likelihood of consecutive outcomes.
   Step: The risk that each child will have hemophilia A is 25%. For three children with hemophilia A, the
   probability is (0.25)(0.25)(0.25) = 0.0156, or (1/4)(1/4)(1/4) = 1/64.

Answer d

8. Strategy: Calculate the probability that if the couple has four children, two will have hemophilia A and
   two will have normal blood clotting.
   Step: The chance the couple has four children, two of whom have hemophilia A and two of whom are healthy,
   is predicted by the binomial expansion. There are six different ways (birth orders) to produce two healthy
   and two affected children. The probabilities are 3/4 for a healthy child and 1/4 for a child with
   hemophilia A, so the requested probability is 6[(3/4)(3/4)(1/4)(1/4)] = 54/256, or 0.2109.


For more practice, see Problems 12, 13, and 25.
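
The binomial calculations in parts (c) and (d) can be verified with exact fractions. This sketch assumes only that each child independently has a 1/4 risk of hemophilia A, as the Punnett square for this couple predicts. The function name binomial_probability is hypothetical.

```python
from fractions import Fraction
from math import comb


def binomial_probability(n: int, k: int, p: Fraction) -> Fraction:
    """Probability that exactly k of n children are affected, each with risk p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


p_affected = Fraction(1, 4)  # risk per child from the Punnett square

three_of_three = binomial_probability(3, 3, p_affected)
two_of_four = binomial_probability(4, 2, p_affected)

print(three_of_three, float(three_of_three))  # 1/64 0.015625
print(two_of_four, float(two_of_four))        # 27/128 0.2109375
```

The coefficient comb(4, 2) = 6 is the number of birth orders that give two affected and two healthy children, and 27/128 is the reduced form of 54/256.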
X-Linked Dominant Trait Transmission


Transmission of traits controlled by X-linked dominant 
alleles has three distinctive characteristics: 


1. Heterozygous females mated to wild-type males 
transmit the dominant allele to half their progeny of 
each sex. 


2. Because daughters receive one of their X chromosomes
from their father, dominant hemizygous males mated to
homozygous recessive females transmit the dominant
trait to all their daughters, but to none of their
sons.


3. Since just a single copy of the allele is necessary to 
produce the dominant phenotype, the dominant phe- 
notype is about equally frequent in males and females. 


Congenital generalized hypertrichosis (CGH) is a rare 
and dramatic X-linked dominant disorder in humans 
that displays each of these characteristics. The condition 
substantially increases the number of hair follicles on the 
body and produces much more body hair than normal, 
both in males and females (Figure 3.25a). Females with 
CGH have a recognizable phenotype, but face and body 
hair is less extensive and tends to be present in patches, 
for reasons we discuss later in the chapter. A partial 


pedigree of a family with CGH illustrates the transmission
of the dominant allele by the woman III-1 to about half
her children and transmission of the allele by the man II-2
to all his daughters but none of his sons (Figure 3.25b).
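
The father-to-daughter hallmark of X-linked dominant inheritance lends itself to a simple consistency check on a sibship. The sketch below is illustrative only: the Child class and the function name are hypothetical, the mother is assumed to be unaffected, and the number of sons in the example sibship is an assumption rather than a value taken from the pedigree.

```python
from dataclasses import dataclass


@dataclass
class Child:
    sex: str        # "M" or "F"
    affected: bool


def father_transmission_consistent(children: list[Child]) -> bool:
    """Check the hallmark of X-linked dominant transmission from an affected
    father and an unaffected mother: every daughter affected, no son affected."""
    daughters_ok = all(c.affected for c in children if c.sex == "F")
    sons_ok = not any(c.affected for c in children if c.sex == "M")
    return daughters_ok and sons_ok


# Hypothetical sibship modeled on the description of man II-2:
# four affected daughters and unaffected sons (the number of sons is assumed).
sibship = [Child("F", True)] * 4 + [Child("M", False)] * 2
print(father_transmission_consistent(sibship))
```

A single affected son of an affected father and unaffected mother would make the check fail, which is why male-to-male transmission argues against X-linked inheritance.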


Y-Linked Inheritance 


The Y chromosome is found only in males, and Y-linked 
genes are transmitted in a male-to-male pattern. In mam- 
mals, fewer than 50 genes are found on the Y chromo- 
some; and like SRY, those genes are likely to play a role 
in male sex determination or development. Many of the 
genes on the human Y chromosome have counterparts on 
the X chromosome, but they are located in regions that 
do not recombine with the X chromosome. Overall, only 
about 5% of the length of the Y chromosome is composed 
of pseudoautosomal regions, and recombination between 
X and Y is limited to these regions. 

Females never carry a Y chromosome; so from an
evolutionary perspective, it makes sense that the genes
carried on a Y chromosome should be male-specific, having
to do with either male sex determination or reproduction.
Indeed, the most recent genomic evidence suggests
that the mammalian Y chromosome has rapidly evolved
over the past 300 million to 350 million years, undergoing
multiple changes in structure but preserving a handful of
genes that are essential to male fertility and survival. The
fascinating evolution of the mammalian Y chromosome
is the subject of the Case Study at the end of this chapter.

Figure 3.25 Congenital generalized hypertrichosis (CGH), an X-linked dominant trait in humans.
(a) A boy with CGH. (b) A large family with CGH. In the single instance of transmission from an affected
male (II-2), notice that all daughters (III-5 to III-8) have CGH. The 6-year-old boy in panel (a) is IV-5.
Some individuals have been omitted from the pedigree for clarity.


3.6 Dosage Compensation Equalizes 
the Expression of Sex-Linked Genes 


In organisms with sex chromosomes, there is an imbal- 
ance between the sexes in the copy number of genes on 
the sex chromosomes. In Drosophila and placental mam- 
mals, females have two copies of each X-linked gene, one 
on each X chromosome, whereas males have just a single 
copy of each X-linked gene. In animals, gene dosage bal- 
ance is essential for normal embryonic development and 
normal biological processes. Any mechanism that com- 
pensates for differences in the number of copies of genes 
due to the different chromosome constitutions of males 
and females is called dosage compensation. There are at 
least three dosage compensation mechanisms that equal- 
ize X-linked gene expression between male and female 
animals. Table 3.3 shows dosage compensation mecha- 
nisms in animals. In this section, we focus attention on 
dosage compensation in placental mammals. 
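
The arithmetic behind the mechanisms summarized in Table 3.3 can be sketched as follows. Expression levels are in arbitrary units, with 1.0 standing for the uncompensated output of one X-linked gene copy. The dictionary layout and the function name are hypothetical, and the numbers are idealized rather than measured values.

```python
# Per-copy expression is in arbitrary units; 1.0 is the uncompensated level.
MECHANISMS = {
    # animal: (male X copies, male per-copy level,
    #          female active X copies, female per-copy level)
    "fruit fly": (1, 2.0, 2, 1.0),          # male X expression doubled
    "roundworm": (1, 1.0, 2, 0.5),          # each hermaphrodite X halved
    "marsupial mammal": (1, 1.0, 1, 1.0),   # paternal X inactivated
    "placental mammal": (1, 1.0, 1, 1.0),   # one X randomly inactivated
}


def x_linked_output(copies: int, per_copy: float) -> float:
    """Total expression of an X-linked gene from all active copies."""
    return copies * per_copy


for animal, (m_n, m_lvl, f_n, f_lvl) in MECHANISMS.items():
    male = x_linked_output(m_n, m_lvl)
    female = x_linked_output(f_n, f_lvl)
    print(f"{animal:17} male={male:.1f} female={female:.1f} balanced={male == female}")
```

Each mechanism reaches the same end by a different route: the total X-linked output of the two sexes is equal even though the number of X chromosomes is not.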

Placental mammals, including humans, use random 
X inactivation as their dosage compensation mechanism. 
Early in mammalian gestational development, about 2 
weeks after fertilization in humans, when the female early 
embryo consists of a few hundred cells, one of the two X 
chromosomes in each somatic cell of a female is randomly 
inactivated. This idea was first proposed in 1961 by Mary 
Lyon in her random X inactivation hypothesis, also 
known as the Lyon hypothesis. In approximately half the 
somatic cells in a female embryo, the maternally derived 
X chromosome is inactivated; and in the other half of so- 
matic cells, inactivation silences the paternally derived X 


chromosome. At the end of this process, each somatic cell 
of a female has one active X chromosome that is equally 
likely to be the maternal X or the paternal X. 

Random X inactivation takes place in every cell with 
two or more X chromosomes. Following inactivation, the 
inactive chromosome can be seen as a tightly condensed 
mass adhering to the nuclear wall. The inactive X chro- 
mosome is known as a Barr body, having first been visu- 
alized by Murray Barr in 1949. 

X inactivation is a permanent feature of somatic 
cells of placental mammalian females. Since some cells 
have an active maternal X chromosome and an inactive 
paternal X chromosome and other cells have the oppo- 
site pattern, normal placental mammalian females are, 
in terms of X chromosomes, a mosaic of two kinds of 
cells. One cell type (pink) expresses the maternally de- 
rived X chromosome, and the other (blue) expresses the 
paternally derived X chromosome (Figure 3.26). Each 
individual cell expresses the allelic information of only 
one of those chromosomes, with all descendant cells 
maintaining the same inactivation pattern as the original
ancestral cell.

In most cases, the silencing of one X chromosome in 
each cell of a female has no detectable effect on the func- 
tion of a tissue or on the phenotype. Occasionally, how- 
ever, female carriers of X-linked recessive traits display 
a phenotypic manifestation of the recessive allele. Calico 
and tortoiseshell coat-color patterning in female cats is a 
product of mosaicism created by random X inactivation 
(Figure 3.27). Females with an allele for black coat color on 
one X chromosome and orange coat color on the homologous
X chromosome have black and orange patches of fur
corresponding to portions of skin where each X chromosome
is active. The sizes and the distribution of the orange
and black sectors of these cats reflect the locations of the 
clonal descendants of the cells in which each X chromosome 
was originally inactivated. The specific pattern of X inactiva- 
tion is unique to each female cat embryo, and the patterns 


Table 3.3 Mechanisms of Dosage Compensation in Animals

                     Sex Chromosomes
Animal               Males    Females    Dosage Compensation Mechanism
Fruit fly            XY       XX         Expression of X-linked genes in males is doubled relative to female X-linked gene expression.
Roundworm            XO       XX(a)      Gene expression of each X chromosome in the hermaphrodite ("female") is decreased to one-half that of the X chromosome in the male.
Marsupial mammals    XY       XX         The paternally derived X chromosome is inactivated in all female somatic cells.
Placental mammals    XY       XX         One X chromosome is randomly inactivated in each female somatic cell.

(a) XX worms are hermaphrodites.
Figure 3.26 Random X inactivation in female placental
mammals. M represents the maternally derived X chromosome 
and P the paternally derived X chromosome. 


of cellular migration are variable as well. As a result, each 
adult female calico or tortoiseshell cat has a unique pattern 
of black and orange sectors marking its coat. 
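
The clonal nature of X-inactivation mosaicism can be illustrated with a small simulation. This is a toy model under stated assumptions: 200 founder cells stand in for the "few hundred cells" of the early embryo, every cell divides synchronously, and no cell migration or cell death is modeled. The function name and its parameters are hypothetical.

```python
import random


def simulate_x_inactivation(n_founders: int, divisions: int, seed: int = 0) -> list[str]:
    """Return the active X ("M" or "P") in every descendant cell.

    Each founder cell randomly keeps either its maternal or paternal X
    active; all of its descendants inherit that choice unchanged.
    """
    rng = random.Random(seed)
    founders = [rng.choice("MP") for _ in range(n_founders)]
    cells = founders
    for _ in range(divisions):
        # Each cell divides into two daughters with the same active X.
        cells = [state for state in cells for _ in range(2)]
    return cells


cells = simulate_x_inactivation(n_founders=200, divisions=3)
print(len(cells), "cells;", cells.count("M") / len(cells), "express the maternal X")
```

Because the random choice is made once per founder and then copied, descendants form contiguous clones with the same active X. Changing the seed changes which clones express which chromosome, a loose analogy for why each calico cat has its own pattern.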

Not all genes on the “inactivated” X chromosome 
are transcriptionally silent. A 2005 study of 624 X-linked 
genes showed that about 15% of the genes escape com- 
plete silencing. On average, transcription of the X-linked 
genes that remain active is reduced by about 50-85% in 
comparison to transcription on the active X chromosome. 
The genes that escape inactivation are largely clustered on 
the short arm of the chromosome near PAR1 (pseudoautosomal region 1).


Figure 3.27 Calico coat, produced by X inactivation in 
female cats. 


Random X inactivation requires a gene on the X chromosome
called the X-inactivation-specific transcript
(XIST) that encodes a large RNA molecule. XIST RNA
spreads out from the gene, "painting" the X chromosome
as it accumulates. X chromosomes that are painted with
XIST RNA have all, or nearly all, of their genes silenced.
The XIST RNA accumulates only on the one chromosome
transcribing the gene and does not spread to the
homologous X chromosome. In other words, XIST acts only
in cis (on the same chromosome), not in trans (on the
homologous chromosome). Examination of inactivated
chromosomes detects XIST RNA coating the Barr body in
the nucleus.


CASE STUDY


The (Degenerative) Evolution of the Mammalian Y Chromosome 


Mammalian X and Y chromosomes are the “odd couple” of 
homologous chromosomes for several reasons. First, they 
are very different from each other in size. The human Y chro- 
mosome is less than one-third the size of the X chromosome. 
Second, they aren't really all that homologous. The human 
X chromosome contains roughly 2000-3000 genes, but the
Y chromosome contains just a few dozen genes. Third, the 
small pseudoautosomal regions they share at their ends 
make up just a few percent of the total sequence of either 
chromosome. The pseudoautosomal regions are sufficient 
for synapsis in prophase I, and recombination between X
and Y is frequent in these regions, but only about 5% of the 


Y chromosome participates in recombination. The other 95% 
of the chromosome experiences no crossing over. Finally, 
and perhaps most significantly, the mammalian Y chromo- 
some has evolved very rapidly over the past 300 million 
years or so, shrinking in size and genetic content as essential 
genes have been shifted to other chromosomes, leaving just 
a handful of genes behind. 


A STORY OF DEGENERATION Beginning with the work of 
Bruce Lahn and David Page in 1999, the composition and
evolution of the mammalian Y chromosome have been subjects of
active investigation. The view of Y chromosome evolution first
proposed by Lahn and Page has been supported and verified 
by additional studies and by genome sequencing, and it tells 
the story of an evolutionary pathway that features progressive 
degeneration. 

In 1999, Lahn and Page studied the human X and Y 
chromosomes and identified 19 genes that are present on 
both chromosomes, called X-Y shared genes. These genes 
are left over from a time when the chromosomes were 
much more similar and regularly recombined. Lahn and 
Page reasoned that they could trace the evolution of the 
genes by studying differences between the DNA sequences 
of the X-Y shared genes—more differences accrue the lon- 
ger genes have been separated. What they found was quite 
surprising: The differences between the X-Y shared genes 
followed a distinct and suggestive pattern. X-Y shared 
genes nearest each other on the X chromosome short arm 
were most similar to their Y-chromosome counterparts, 
but X-Y shared genes on the long arm of the X chromo- 
some were the most different from their Y-chromosome 
counterparts. In all, Lahn and Page identified four well- 
defined “strata” among the X-Y shared genes, each stratum 
having its own distinct level of sequence similarity. Within 
each of the strata, the level of X-Y shared-gene similar- 
ity was remarkably consistent, but there were substantial 
differences in gene similarity between strata. This sug- 
gested four major evolutionary events that reshaped the Y 
chromosome, resulting in structural changes that progres- 
sively restricted recombination between the X and the Y 
chromosomes. 
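The stratum idea can be sketched in a few lines of code. The example below is a minimal illustration, not Lahn and Page's actual method: it orders X-Y shared genes by their position on the X chromosome and starts a new stratum whenever sequence divergence jumps by more than a chosen threshold. The gene names, positions, divergence values, and the `jump` threshold are all hypothetical and were chosen only so that four groups emerge.

```python
def group_into_strata(genes, jump=0.05):
    """Group X-Y shared genes into strata.

    genes: list of (name, x_position, divergence) tuples.
    A new stratum starts whenever divergence changes by more than `jump`
    between neighboring genes along the X chromosome.
    """
    ordered = sorted(genes, key=lambda g: g[1])  # order along the X chromosome
    strata = []
    previous = None
    for name, _position, divergence in ordered:
        if previous is None or abs(divergence - previous) > jump:
            strata.append([])  # open a new stratum at each large jump
        strata[-1].append(name)
        previous = divergence
    return strata


# Hypothetical data: (gene, position on X, X-Y sequence divergence)
example = [
    ("geneA", 1, 0.05), ("geneB", 2, 0.06),
    ("geneC", 3, 0.12), ("geneD", 4, 0.13),
    ("geneE", 5, 0.30), ("geneF", 6, 0.32),
    ("geneG", 7, 0.95), ("geneH", 8, 0.97),
]
strata = group_into_strata(example)
for number, members in enumerate(strata, start=1):
    print(f"Stratum {number}: {', '.join(members)}")
```

Within each group the divergence values are close to one another, while the gaps between groups are large, which is the pattern the text describes as consistent within strata and substantially different between them.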


MAJOR RESTRUCTURING EVENTS By comparing DNA 
sequences across species, Lahn and Page determined that 
the autosomal precursors of X and Y were very similar at the 
time reptiles diverged from mammals, about 350 million
years ago (mya). The monotremes (such as the platypus and 
echidna) separated from the placental mammals 240-320 
mya, but not before the SRY gene evolved in their common 
ancestor. Both monotremes and placental mammals have SRY, but
reptiles do not. This implies that SRY developed about 350 mya
(Figure 3.28). The SRY gene produces TDF, the protein that 
initiates a cascade of events that produces males. With the 
acquisition of SRY, the Y chromosome became different from 
the X chromosome, and the region surrounding SRY—the 
first of Lahn and Page's four strata—became the first region 
of the Y chromosome to be unable to recombine with the X 
chromosome. This event also contributed to the shrinkage of 
the Y chromosome. 

About 130-170 mya, a structural change altered the 
Y chromosome and produced a second stratum that was 
unable to recombine with the X chromosome. Marsupials 
(such as kangaroos) retain the old Y-chromosome structure, 
so the generation of the second stratum demarcates the 
separation of marsupial and placental mammals. Another 
structural change to the Y chromosome, between 80 and 
130 mya, created a third stratum of divergence, further re- 
stricting recombination with the X chromosome and shrink- 
ing the Y chromosome. This change marks the separation 
of the monkeys from nonsimian placental mammals. Most 
recently, about 30-50 mya, the fourth stratum was created 
by another structural change to the Y chromosome. This 
change—present in the human lineage that includes our 
great ape relatives but not present in monkeys—limited 
recombination to the end of the Y chromosome and re- 
duced its size. In humans, recombination between X and Y 
chromosomes is limited to PAR1, the largest of the remain- 
ing regions of X-Y homology. Little if any recombination 
occurs in PAR2. 
Figure 3.28 The proposed evolutionary development of the mammalian Y chromosome through four major structural rearrangements.




The functioning of genes remaining on the Y chromo- 
some was directly affected by the events that prevented 
X-Y recombination. Without recombination, Y-linked 
genes were subject to mutational degradation that 
would eventually render them nonfunctional. To prevent 
this, strong natural selection operated to move essential 
genes off the Y chromosome to other chromosomes. The 
genes that remain on the human Y chromosome are al- 
most exclusively important in male development or sperm 


SUMMARY 


3.1 Mitosis Divides Somatic Cells 


The cell cycle has two principal phases: interphase, whose 
stages are G1, S, and G2; and M phase, during which cell
division occurs. 

Mitosis is the process of division for somatic cells. Mitosis 
contains five substages: prophase, prometaphase, metaphase, 
anaphase, and telophase. 

Mitosis contains a single cell division and separates sister 
chromatids into diploid daughter cells that are genetically 
identical to one another and to the parental cell they are 
derived from. 

The cell cycle is under tight genetic control. Regulatory 
molecules control the transition from one stage of the cycle 
to the next by acting at genetically controlled checkpoints to 
monitor cell cycle transitions. 

Mutation of cell cycle control genes is associated with cancer 
development. 


3.2 Meiosis Produces Gametes for Sexual 
Reproduction 


Meiosis contains two cell divisions, designated meiosis I and 
meiosis II. 

During meiosis I (the “reduction division”), homologous 
chromosomes are separated to produce haploid daughter 
cells that carry one chromosome from each homologous pair 
of chromosomes. 

The meiosis II division separates sister chromatids and pro- 
duces four genetically different haploid daughter cells that 
form gametes. 

During prophase I, homologous chromosomes synapse with 
the aid of the synaptonemal complex. Homologous chromo- 
somes can cross over to exchange genetic material during 
this substage. 

Mendel’s laws of segregation and independent as- 
sortment find their mechanical basis in the patterns 

of separation of chromosomes and sister chromatids 
during meiosis. 
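As a small illustration of how independent assortment follows from chromosome behavior, the sketch below enumerates the gametes of an individual carrying two allele pairs on different chromosome pairs. It assumes no crossover and equally likely metaphase I alignments; the allele names R1, R2, T1, and T2 and the function name are illustrative choices, not part of the chapter.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def gamete_frequencies(genotype):
    """Return gamete genotype -> frequency for unlinked genes.

    genotype: list of allele pairs, one pair per gene, each gene on a
    different pair of homologous chromosomes.
    """
    # Each gamete receives one allele from every pair (segregation),
    # and the pairs are combined freely (independent assortment).
    combos = list(product(*genotype))
    counts = Counter(combos)
    total = len(combos)
    return {gamete: Fraction(count, total) for gamete, count in counts.items()}


freqs = gamete_frequencies([("R1", "R2"), ("T1", "T2")])
for gamete, freq in sorted(freqs.items()):
    print(" ".join(gamete), freq)
```

Each of the four gamete types appears with frequency 1/4, the Mendelian expectation for two independently assorting genes.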
production, but even these remain subject to mutational
degradation. 

What will be the ultimate fate of the human Y chromo- 
some? Is it destined to be lost? Scientists don’t know what will 
happen, but recent genomic data may provide a clue. The Y 
chromosome, it seems, has backup copies of its genes. These 
duplicated copies are also on the Y chromosome, and they 
may serve to protect the Y chromosome from the loss of critical
information.


3.3 The Chromosome Theory of Heredity Proposes 
That Genes Are Carried on Chromosomes 


The chromosome theory of heredity proposes that genes
are carried on chromosomes and are faithfully transmitted
through gametes to successive generations.

Thomas Hunt Morgan’s identification of X-linked transmis- 
sion of white eye color in Drosophila and Calvin Bridges’s anal- 
ysis of exceptional phenotypes produced by X-chromosome 
nondisjunction demonstrated the validity of the chromosome 
theory of heredity. 


3.4 Sex Determination Is Chromosomal and Genetic 


Mechanisms of sex determination take many forms in ani- 
mals. Drosophila sex is determined by the ratio of expression 
of X-linked and autosomal genes, whereas human sex is de- 
termined by the presence of SRY on the Y chromosome. 
Sex-chromosome patterns are diverse among organisms. Birds, 
fishes, and some insects have Z and W sex chromosomes, and 
monotremes have multiple sets of sex chromosomes. 


3.5 Human Sex-Linked Transmission Follows 
Distinct Patterns 


Human X-linked dominant inheritance and X-linked recessive 
inheritance are identifiable, respectively, by the pattern of male 
transmission and the pattern of male expression of traits. 
Genes on the Y chromosome are transmitted exclusively 
from male to male. 
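The transmission patterns summarized above can be checked with a short enumeration. This is a minimal sketch that assumes a single X-linked gene with alleles written D (dominant) and d (recessive); the function name and the example cross are illustrative. Sons receive their X chromosome from the mother and their Y chromosome from the father, while daughters receive one X chromosome from each parent.

```python
from collections import Counter
from fractions import Fraction


def x_linked_cross(mother, father_x):
    """Enumerate offspring of an X-linked cross.

    mother: tuple of the alleles on her two X chromosomes.
    father_x: the allele on the father's single X chromosome.
    Returns (daughters, sons), each mapping genotype -> probability.
    """
    daughters = Counter()
    sons = Counter()
    for maternal in mother:
        # Daughters: maternal X plus the father's X.
        daughters[tuple(sorted((maternal, father_x)))] += 1
        # Sons: maternal X plus the father's Y.
        sons[(maternal, "Y")] += 1
    n = len(mother)
    to_prob = lambda counts: {g: Fraction(c, n) for g, c in counts.items()}
    return to_prob(daughters), to_prob(sons)


# Carrier mother (D/d) and unaffected father (D/Y)
daughters, sons = x_linked_cross(("D", "d"), "D")
print("daughters:", daughters)
print("sons:", sons)
```

In this cross half of the sons are expected to be d/Y and express the recessive trait, while no daughter is homozygous recessive, which is the skew toward male expression that characterizes X-linked recessive inheritance.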


3.6 Dosage Compensation Equalizes the 
Expression of Sex-Linked Genes 


Dosage compensation balances the level of expression of 
sex-linked genes and is critical for normal animal develop- 
ment. Mechanisms for achieving dosage compensation vary 
among species. 

Random inactivation of one X chromosome in each
cell of placental mammalian females is controlled by
an X-inactivation center on the X chromosome.
For answers to selected even-numbered problems, see Appendix: Answers.


1. Examine the following diagrams of cells from an organism
with diploid number 2n = 6, and identify what stage of M
phase is represented.

[Figure: cell diagrams (a) and (b)]


2. Our closest primate relative, the chimpanzee, has a diploid
number of 2n = 48. For each of the following stages of M
phase, identify the number of chromosomes present in
each cell.

a. end of mitotic telophase
b. meiotic metaphase I
c. end of meiotic anaphase II
d. early mitotic prophase
e. mitotic metaphase
f. early prophase I


3. In a test of his chromosome theory of heredity, Morgan
crossed an F1 female Drosophila with red eyes to a male
with white eyes. The F1 females were produced from Cross
A shown in Figure 3.19. Predict the offspring Morgan
would have expected under his hypothesis that the gene
for eye color is on the X chromosome in fruit flies.


4. Tension between sister chromatids is essential to ensure
their efficient separation at mitotic anaphase or in meiotic
anaphase II. Explain why sister chromatid cohesion is
important, and discuss the role of the proteins cohesin and
separase in sister chromatid separation.


5. The diploid number of the hypothetical animal Geneticus
introductus is 2n = 36. Each diploid nucleus contains 3 ng
of DNA in G1.

a. What amount of DNA is contained in each nucleus at
the end of S phase?

b. Explain why a somatic cell of Geneticus introductus has
the same number of chromosomes and the same amount
of DNA at the beginning of mitotic prophase as one of
these cells does at the beginning of prophase I of meiosis.

c. Complete the following table by entering the number of
chromosomes and amount of DNA present per cell at
the end of each stage listed.


End of Cell Cycle Stage    Number of Chromosomes    Amount of DNA
Telophase I
Mitotic anaphase
Telophase II
6. An organism has alleles R1 and R2 on one pair of homologous
chromosomes, and it has alleles T1 and T2 on another
pair. Diagram these pairs of homologs at the end of meta- 
phase I, the end of telophase I, and the end of telophase II, 
and show how meiosis in this organism produces gametes 
in expected Mendelian proportions. Assume no crossover 
between homologous chromosomes. 


7. Explain how the behavior of homologous chromosomes in 
meiosis parallels Mendel’s law of segregation for autosomal 
alleles D and d. During which stage of M phase do these 
two alleles segregate from one another? 


8. Suppose crossover occurs between the homologous chro- 
mosomes in the previous problem. At what stage of M 
phase do alleles D and d segregate? 


9. Alleles A and a are on one pair of autosomes, and
alleles B and b are on a separate pair of autosomes.
Does crossover between one pair of homologs affect
the expected proportions of gamete genotypes? Why
or why not? Does crossover between both pairs of
chromosomes affect the expected gamete proportions?
Why or why not?


10. How many Barr bodies are found in a normal human
female nucleus? In a normal male nucleus?


11. Describe the role of the following structures or proteins in
cell division:
a. microtubules
b. cyclin-dependent kinases
c. kinetochores
d. synaptonemal complex


Application and Integration


12. A woman’s father has ornithine transcarbamylase
deficiency (OTD), an X-linked recessive disorder
producing mental deterioration if not properly treated. The
woman’s mother is homozygous for the wild-type allele.


a. What is the woman’s genotype? (Use D to represent the
dominant allele and d to represent the recessive allele.)

b. If the woman has a son with a normal man, what is the
chance the son will have OTD?

c. If the woman has a daughter with a man who does not
have OTD, what is the chance the daughter will be a
heterozygous carrier of OTD? What is the chance the
daughter will have OTD?

d. Identify a male with whom the woman could produce a
daughter with OTD.

e. For the instance you identified in part (d), what proportion
of daughters produced by the woman and the man
are expected to have OTD? What proportion of sons of
the woman and the man are expected to have OTD?


13. In humans, hemophilia (OMIM 306700) is an X-linked
recessive disorder that affects the gene for factor VIII
protein, which is essential for blood clotting. The dominant
and recessive alleles for the factor VIII gene are represented
by H and h. Albinism is an autosomal recessive condition
that results from mutation of the gene producing tyrosinase,
an enzyme in the melanin synthesis pathway. A and
a represent the tyrosinase alleles. A healthy woman named
Clara (II-2), whose father (I-1) has hemophilia and whose
brother (II-1) has albinism, is married to a healthy man
named Charles (II-3), whose parents are healthy. Charles’s
brother (II-5) has hemophilia, and his sister (II-4) has
albinism. The pedigree is shown below.


[Figure: pedigree for Problem 13, with legend symbols for hemophilia and albinism; Clara and Charles are labeled in generation II]



a. What are the genotypes of the four parents (I-1 to I-4) 
in this pedigree? 

b. Determine the probability that the first child of Clara 
and Charles will be a 
i. boy with hemophilia 
ii. girl with albinism 
iii. healthy girl 
iv. boy with both albinism and hemophilia 
v. boy with albinism 
vi. girl with hemophilia 

c. If Clara and Charles’s first child has albinism, what is 
the chance the second child has albinism? Explain why 
this probability is higher than the probability you calcu- 
lated in part (b). 


14. A wild-type male and a wild-type female Drosophila with 
red eyes and full wings are crossed. Their progeny are 


shown below.

Males                          Females
full wing, red eye             full wing, red eye
miniature wing, red eye        purple eye, full wing
purple eye, full wing
miniature wing, purple eye

[The progeny proportions for each class are illegible in the source.]


a. Using clearly defined allele symbols of your choice, give 
the genotype of each parent. 

b. What is/are the genotype(s) of females with purple eye? 
Of males with purple eye and miniature wing? 


15. A woman with severe discoloration of her tooth enamel has 
four children with a man who has normal tooth enamel. 
Two of the children, a boy (B) and a girl (G), have discolored 
enamel. Each has a mate with normal tooth enamel and pro- 
duces several children. G has six children, four boys and two 
girls. Two of her boys and one of her girls have discolored 
enamel. B has seven children, four girls and three boys. All 
four of his daughters have discolored enamel, but all his boys 
have normal enamel. Explain the inheritance of this condition. 


16. In a large metropolitan hospital, cells from newborn babies
are collected and examined microscopically over a 5-year
period. Among approximately 7500 newborn males, six
have one Barr body in the nuclei of their somatic cells. All
other newborn males have no Barr bodies. Among 7500
female infants, four have two Barr bodies in each nucleus,
two have no Barr bodies, and the rest have one. What is the
cause of the unusual number of Barr bodies in a small
number of male and female infants?


17. In cats, tortoiseshell coat color appears in females. A
tortoiseshell coat has patches of dark brown fur and patches
of orange fur that each in total cover about half the body
but have a unique pattern in each female. Male cats can be
either dark brown or orange, but a male cat with tortoiseshell
coat is rarely produced. Two sample crosses between
males and females from pure-breeding lines produced the
tortoiseshell females shown.


Cross I   P: dark brown male × orange female
          F1: orange males and tortoiseshell females
Cross II  P: orange male × dark brown female
          F1: dark brown males and tortoiseshell females


a. Explain the inheritance of dark brown, orange, and
tortoiseshell coat colors in cats.

b. Why are tortoiseshell cats female?

c. The genetics service of a large veterinary hospital gets
referrals for three or four male tortoiseshell cats every
year. These cats are invariably sterile and have
underdeveloped testes. How are these tortoiseshell male cats
produced? Why do you think they are sterile?


18. The gene causing Coffin-Lowry syndrome (OMIM
303600) was recently identified and mapped on the human
X chromosome. Coffin-Lowry syndrome is a rare
disorder affecting brain morphology and development.
It also produces skeletal and growth abnormalities, as
well as abnormalities of motor control. Coffin-Lowry
syndrome affects males who inherit a mutation of the
X-linked gene. Most carrier females show no symptoms
of the disease but a few carriers do. These carrier females
are always less severely affected than males. Offer an
explanation for this finding.


19. Four eye-color mutants in Drosophila (apricot, brown,
carnation, and purple) are inherited as recessive traits.
Red is the dominant wild-type color of fruit-fly eyes. Eight
crosses (A to H) are made between parents from
pure-breeding lines.


         Parents                  F1 Progeny
Cross    Female       Male        Female    Male
A        Apricot      Red         Red       Apricot
B        Brown        Red         Red       Red
C        Red          Purple      Red       Red
D        Red          Apricot     Red       Red
E        Carnation    Red         Red       Carnation
F        Purple       Red         Red       Red
G        Red          Brown       Red       Red
H        Red          Carnation   Red       Red


a. Which of these eye-color mutants are X-linked recessive
and which are autosomal recessive? Explain how
you distinguish X-linked from autosomal heredity.

b. Predict F2 phenotype ratios of Crosses A, B, D, and G.


20. For each pedigree shown,

a. Identify which simple pattern of hereditary transmission
(autosomal dominant, autosomal recessive,
X-linked dominant, or X-linked recessive) is most likely
to have occurred. Give genotypes for individuals
involved in transmitting the trait.

b. Determine which other pattern(s) of transmission is/are
possible. For each possible mode of transmission, specify
the genotypes necessary for transmission to occur.

c. Identify which pattern(s) of transmission is/are
impossible. Specify why transmission is impossible.


[Figure: Pedigrees A, B, C, and D]


21. Use the blank pedigrees provided to depict transmission
of (a) an X-linked recessive trait and (b) an X-linked dominant
trait, by filling in circles and squares to represent
individuals with the trait of interest. Give genotypes for
each person in each pedigree. Carefully design each
transmission pattern so that pedigree (a) cannot be confused
with autosomal recessive transmission and pedigree (b)
cannot be confused with autosomal dominant transmission.
Identify the transmission events that eliminate the
possibility of autosomal transmission for each pedigree.


[Figure: blank pedigrees (a) and (b)]
22. Figure 3.22 (page 90) illustrates reciprocal crosses involving
chickens with sex-linked dominant barred mutation.
For Cross A and for Cross B, cross the F1 roosters and
hens and predict the feather patterns of roosters and
hens in the F2.


23. In fruit flies, yellow body (y) is recessive to gray body (y+),
and the trait of body color is inherited on the X chromosome.
Vestigial wing (v) is recessive to full-sized wing (v+),
and the trait has autosomal inheritance. A cross of a male
with yellow body and full wings to a female with gray body
and full wings is made. Based on an analysis of the progeny
of the cross shown below, determine the genotypes of
parental and progeny flies.


Phenotype                      Number of Males    Number of Females
Yellow body, full wing         296                301
Yellow body, vestigial wing    101                98
Gray body, full wing           302                298
Gray body, vestigial wing      101                103
Total                          800                800


24. In a species of fish, a black spot on the dorsal fin is
observed in males and females. A fish breeder carries out
a pair of reciprocal crosses and observes the following
results.

Cross I   Parents: black-spot male × nonspotted female
          Progeny: 22 black-spot males
                   24 black-spot females
                   25 nonspotted males
                   21 nonspotted females
Cross II  Parents: nonspotted male × black-spot female
          Progeny: 45 black-spot males
                   53 nonspotted females

a. Why does this evidence support the hypothesis that a
black spot is sex linked?

b. Identify which sex is homogametic and which is
heterogametic. Give genotypes for the parents in each cross,
and explain the progeny proportions in each cross.


25. Lesch-Nyhan syndrome (OMIM 300322) is a rare X-linked
recessive disorder that produces severe mental retardation,
spastic cerebral palsy, and self-mutilation.

a. What is the probability that the first son of a woman
whose brother has Lesch-Nyhan syndrome will be
affected?

b. If the first son of the woman described in (a) is affected,
what is the probability that her second son is affected?

c. What is the probability that the first son of a man
whose brother has Lesch-Nyhan syndrome will be
affected?


26. In humans, SRY is located near a pseudoautosomal region
(PAR) of the Y chromosome, a region of homology between
the X and Y chromosomes that allows them to synapse
during meiosis in males and is a region of crossover
between the chromosomes. The diagram below shows SRY
in relation to the pseudoautosomal region.


[Figure: diagram of the Y chromosome showing SRY in relation to the pseudoautosomal region]


About 1 in every 25,000 newborn infants is born with
sex reversal; the infant is either an apparent male, but
with two X chromosomes, or an apparent female, but
with an X and a Y chromosome. Explain the origin of
sex reversal in human males and females involving the
SRY gene. (Hint: See Experimental Insight 3.1 for a clue
about the mutational mechanism.)


27. In an 1889 book titled Natural Inheritance (Macmillan,
New York), Francis Galton, who investigated the inheritance
of measurable (quantitative) traits, formulated a
law of “ancestral inheritance.” The law stated that each
person inherits approximately one-half of his or her
genetic traits from each parent, about one-quarter of
the traits from each grandparent, one-eighth from each
great grandparent, and so on. In light of the chromosome
theory of heredity, argue either in favor of Galton’s law
or against it.


28. Drosophila has a diploid chromosome number of 2n = 8,
which includes one pair of sex chromosomes (XX in females
and XY in males) and three pairs of autosomes.
Consider a Drosophila male that has a copy of the A1 allele
on its X chromosome (the Y chromosome is the homolog)
and is heterozygous for alleles B1 and B2, C1 and C2, and
D1 and D2 of genes that are each on a different autosomal
pair. In the diagrams requested below, indicate the
alleles carried on each chromosome and sister chromatid.
Assume that no crossover occurs between homologous
chromosomes.


a. What is the genotype of cells produced by mitotic
division in this male?

b. Diagram any correct alignment of chromosomes at
mitotic metaphase.

c. Diagram any correct alignment of chromosomes at
metaphase I of meiosis.

d. For the metaphase I alignment shown in (c), what
gamete genotypes are produced at the end of
meiosis?

e. How many different metaphase I chromosome alignments
are possible in this male? How many genetically
different gametes can this male produce? Explain your
reasoning for each answer.


29. A wild-type Drosophila male and a female with wild-type
phenotype are crossed, producing 324 female
progeny and 161 male progeny. All their progeny are
wild type.

a. Propose a genetic hypothesis to explain these data.

b. Design an experiment that will test your hypothesis,
using the wild-type progeny identified above.
Describe the results you expect if your hypothesis
is true.
30. In Drosophila, the X-linked echinus eye phenotype
disrupts formation of facets and is recessive to wild-type
eye. Autosomal recessive traits vestigial wing
and ebony body assort independently of one another.
Examine the progeny from the three crosses shown
below, and identify the genotype of parents in
each cross.


       Parental Phenotype                                   Proportion
Cross  Female       Male         Progeny Phenotype          Female   Male
a.     Wild type    Echinus      Wild type                  ?        ?
                                 Echinus                    ?        ?
                                 Vestigial                  ?        ?
                                 Echinus, vestigial         ?        ?
b.     Wild type    Wild type    Vestigial, ebony           ?        ?
                                 Vestigial                  ?        ?
                                 Ebony                      ?        ?
                                 Wild type                  18/32    9/32
                                 Echinus, vestigial, ebony  0        1/32
                                 Echinus, vestigial         0        ?
                                 Echinus, ebony             0        3/32
                                 Echinus                    0        ?
c.     Ebony        Echinus      Echinus, vestigial, ebony  1/32     1/32
                                 Echinus, vestigial         ?        ?
                                 Echinus, ebony             ?        ?
                                 Echinus                    ?        ?
                                 Vestigial, ebony           ?        ?
                                 Vestigial                  ?        ?
                                 Ebony                      ?        ?
                                 Wild type                  ?        ?

[A question mark indicates a proportion that is illegible in the source.]


31. While examining a young tortoiseshell cat, you and the
veterinarian you are interning with get a surprise: the cat
is male, not female! From your undergraduate genetics
course, you recall that tortoiseshell coats are produced by
the random X-inactivation that takes place in mammalian
females. The veterinarian orders a chromosome analysis
of the cat and finds that he is XXY: He has two
X chromosomes and one Y chromosome. Help the
veterinarian figure out how a tortoiseshell cat could be male.
(Hint: Think about X-inactivation in mammals with two
X chromosomes.)


32. Red-green color blindness in humans is inherited as an
X-linked recessive condition. Consider reciprocal crosses
between a color-blind parent and a parent with normal
color vision in which the dominant allele is identified as
C and the recessive allele as c. Cross 1 is Cc × cY, and
Cross 2 is cc × CY. Determine the phenotypes and their
proportions in progeny produced by each cross. Explain
why the reciprocal cross results are consistent with an
X-linked recessive inheritance but not with an autosomal
recessive inheritance of color blindness.
Inheritance Patterns of Single
Genes and Gene Interaction 


CHAPTER OUTLINE 


4.1 Interactions between 
Alleles Produce Dominance 
Relationships 

4.2 Some Genes Produce Variable 
Phenotypes 

4.3 Gene Interaction Modifies 
Mendelian Ratios 

4.4 Complementation Analysis 
Distinguishes Mutations in the 
Same Gene from Mutations in 
Different Genes 


The shape and the color of summer squash are traits that are determined 
by gene interaction. 


Mendel’s laws of segregation and independent assortment
encapsulate the basic rules of genetic transmission
in diploid organisms. We see the results of these rules in
the relative proportions of progeny with different phenotypes 
from crosses. By assessing the molecular basis for the pheno- 
typic variation, we can also glimpse the connection between 
hereditary transmission of phenotypic traits and DNA, RNA, or 
protein sequence variability. Lastly, on the mechanical level 
explored in Chapter 3, we find the physical basis of these rules 
in the movement and segregation of homologous chromo- 
somes and sister chromatids during meiosis. 

Mendel’s success in identifying and describing these 
two hereditary laws was partly due to his use of traits whose 




phenotypic characteristics are determined exclusively 
by inheritance of alleles for single genes. In interpret- 
ing the inheritance of these traits, he did not have 

to contend with phenotypic variation introduced by 
other genes or by environmental (nongenetic) factors. 

In Mendel’s experiments, each trait was decided 
by a single pair of alleles, one fully dominant and one 
fully recessive, at each of seven genes. Furthermore, 
environmental factors played a minimal role in the 
phenotypic variation Mendel observed. The simple 
case in which just two alleles influence a trait and 
environment plays no meaningful role is, however, 
quite rare in nature. Although a diploid organism 
can have no more than two alleles at a locus (be- 
cause such individuals have just two copies of each 
chromosome), there may be many alleles for a single 
locus within a population. 
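The distinction between alleles per individual and alleles per population can be made concrete with a short enumeration. The sketch below assumes a hypothetical locus with three alleles, labeled A1, A2, and A3 purely for illustration, and lists every diploid genotype that two-allele combinations allow. With n alleles in a population there are n(n + 1)/2 possible genotypes, of which n are homozygous.

```python
from itertools import combinations_with_replacement


def diploid_genotypes(alleles):
    """List every unordered pair of alleles a diploid individual can carry."""
    return list(combinations_with_replacement(alleles, 2))


alleles = ["A1", "A2", "A3"]  # hypothetical allele names
genotypes = diploid_genotypes(alleles)
homozygous = [g for g in genotypes if g[0] == g[1]]
heterozygous = [g for g in genotypes if g[0] != g[1]]

n = len(alleles)
print(len(genotypes), "genotypes; formula gives", n * (n + 1) // 2)
print("homozygous:", homozygous)
print("heterozygous:", heterozygous)
```

Every genotype in the list contains at most two different alleles, even though three are present in the population.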

In most cases, phenotype determination is more 
complex than portrayed by Mendel’s examples be- 
cause one or more additional circumstances affect 
the phenotypic outcome. Together, these circum- 
stances are thought of as “extensions of Mendelian 
inheritance,” a phrase that includes two distinct kinds 
of influences on the phenotype ratios produced by 
crosses. The first category comprises relationships between
alleles of a single gene that are other than completely
dominant and completely recessive. The second
category is the heredity of traits that are influenced
by alleles of two or more genes. These influences are
categorized as gene interactions, a phrase that refers
to any of several ways different genes
can collaborate or interact with one another or with 
nongenetic (environmental) factors to influence the 
expression of a phenotypic character. In this chapter, 
we examine several examples of allele interactions 
with patterns of dominance that are different from 
those described by Mendel, as well as examples of 
interactions between genes and between genes and 
environmental factors that include the following: 


There may be more than two alleles for a given 
locus within the population. 


Dominance of one allele over another may not be 
complete. 


Two or more genes may affect a single trait. 


The expression of a trait may be dependent on the 
interaction of two or more genes, on the interac- 
tion of genes with nongenetic factors, or both. 


Our examination of these extensions of 
Mendelian inheritance focuses on patterns of phe- 
notypic variation that result from the occurrence of 
allelic, gene-gene, or gene-environment interaction. 
Our discussions demonstrate that while traits arising 
through these interactions do not always exhibit the 
classic Mendelian ratios (described in Chapter 2), the 
observed ratios can nevertheless be explained by 
the operation of Mendelian principles, overlaid by pat- 
terns of interaction between alleles or between genes 
that are different from those encountered by Mendel. 


4.1 Interactions between Alleles 
Produce Dominance Relationships 


Mendel wisely chose to examine traits presenting in one 
of two alternative forms. One form of each trait he stud- 
ied displayed complete dominance over the other form. 
Complete dominance makes the phenotype of a heterozy- 
gous organism indistinguishable from that of an organism 
homozygous for the dominant allele; thus, only organisms 
homozygous for the recessive allele display the recessive 
phenotype. The complete dominance of one allele also results
in the exclusive expression of the dominant phenotype
among the heterozygous F1 progeny of a cross between
pure-breeding homozygous parents, while the F2 progeny
display a 3:1 ratio of dominant to recessive phenotypes.
We now know that the phenotypes of the seven traits that 
Mendel studied are controlled by two alternative alleles at 
seven different genes. In the cases that have been examined 
at the molecular level, the dominant alleles reflect the wild- 
type function of the gene, while the recessive alleles encode 
gene products with reduced or no functional activity. 
Questions concerning the molecular basis of dominant 
and recessive alleles drove genetic research in the early and 
mid-20th century, including questions of how dominance 
of an allele could be ascertained, why certain mutations 
are recessive whereas others are dominant, and whether 
mutations always cause genes to lose function or whether 
mutations can impart new or additional functions to alleles. 


The Molecular Basis of Dominance 


A character is called dominant if it is seen in organ- 
isms with the homozygous and heterozygous genotypes, 
and it is called recessive if it is observed only in a 
single homozygous genotype. In this sense, dominance
and recessiveness have a phenotypic basis. The pheno- 
types are, however, a consequence of the activities of 
proteins produced by the alleles of a gene. In this sense, 
dominance and recessiveness also have a molecular basis. 
The dominance of one allele over another is determined 
by the activity of the protein products of the allele—by the 
manner in which the protein products of alleles work to 
produce the phenotype. 

Let’s compare two examples to illustrate the mo- 
lecular basis of dominance and recessiveness. In both ex- 
amples, a wild-type allele produces an active enzyme and 
a mutant allele produces either very little enzyme or none 
at all. In the first example the mutant allele is recessive, 
but in the second example the mutant allele is dominant. 


Haplosufficient Wild-Type Allele Is Dominant In the
first example, gene R has a dominant wild-type allele R⁺
and a recessive mutant allele r. Gene R produces an enzyme
that must generate 40 or more units of catalytic activity to
drive a critical reaction step. Successful completion of this
step produces the wild-type phenotype, whereas failure
to complete the step generates a mutant phenotype. Each
copy of allele R⁺ produces 50 units of enzyme activity. The
mutant allele r produces no functional enzyme and has
0 units of activity. Homozygous R⁺R⁺ organisms produce
100 units of enzyme activity (50 units from each copy
of R⁺), far exceeding the minimum required to achieve
the wild-type phenotype. Heterozygous organisms
(R⁺r) produce a total of 50 units of enzyme activity,
which is sufficient to produce the wild-type phenotype.
Homozygous rr organisms produce no enzyme activity,
however, and display the mutant phenotype. Based on its
ability to catalyze the critical reaction step and produce
the wild-type phenotype in either a homozygous (R⁺R⁺)
or heterozygous (R⁺r) genotype, R⁺ is dominant over r.
Dominant wild-type alleles of this kind are identified as
haplosufficient since one (haplo) copy is sufficient to
produce the wild-type phenotype in the heterozygous
genotype.


Haploinsufficient Wild-Type Allele Is Recessive The
second example involves gene T, for which the wild-type
allele is recessive to a mutant allele. Gene T produces
an enzyme required to catalyze a critical reaction step
that produces a wild-type phenotype if it is completed.
The inability to complete the reaction step results in a
mutant phenotype. For the reaction step in question,
18 units of enzyme activity are required. The wild-type
allele T₁ produces 10 units of activity. A mutant allele, T₂,
generates 5 units of enzyme activity. Homozygous T₁T₁
organisms generate 20 units of catalytic enzyme activity,
enough to catalyze the critical reaction step and produce
the wild-type phenotype. Heterozygous organisms, on the
other hand, produce only 15 units of enzymatic activity
and have the mutant phenotype because they fall short
of the 18 units required to catalyze the reaction step.
Similarly, homozygous T₂T₂ organisms, which produce
10 units of enzyme activity, also have a mutant phenotype.
In this case, the mutant allele T₂ is dominant over the
wild-type allele T₁ since both the heterozygous (T₁T₂) and
homozygous (T₂T₂) organisms have a mutant phenotype.
In cases like this, the wild-type allele is identified as
haploinsufficient because a single copy is not sufficient
to produce the wild-type phenotype in the heterozygous
genotype.
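
The two enzyme examples above reduce to the same arithmetic: add the activity contributed by each allele and compare the total with the threshold the reaction step requires. The following is a minimal sketch of that comparison, using the unit values given in the text. The function names and the ASCII allele labels ("R+", "T1", "T2") are illustrative choices, not standard notation, and the additive model is a simplifying assumption.

```python
def phenotype(genotype, activity, threshold):
    """Return 'wild type' if the summed allele activity meets the threshold."""
    total = sum(activity[allele] for allele in genotype)
    return "wild type" if total >= threshold else "mutant"


def is_haplosufficient(wild, mutant, activity, threshold):
    """A wild-type allele is haplosufficient if one copy suffices in a heterozygote."""
    return phenotype((wild, mutant), activity, threshold) == "wild type"


# Gene R: R+ contributes 50 units, r contributes 0; the step needs 40 units.
R_ACTIVITY = {"R+": 50, "r": 0}
R_THRESHOLD = 40

# Gene T: T1 (wild type) contributes 10 units, T2 (mutant) contributes 5;
# the step needs 18 units.
T_ACTIVITY = {"T1": 10, "T2": 5}
T_THRESHOLD = 18

if __name__ == "__main__":
    for genotype in [("R+", "R+"), ("R+", "r"), ("r", "r")]:
        print(genotype, phenotype(genotype, R_ACTIVITY, R_THRESHOLD))
    for genotype in [("T1", "T1"), ("T1", "T2"), ("T2", "T2")]:
        print(genotype, phenotype(genotype, T_ACTIVITY, T_THRESHOLD))
```

In this sketch the only difference between a dominant and a recessive wild-type allele is whether the heterozygote total lands above or below the threshold.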


Functional Effects of Mutation 


Genetic analysis often focuses on rare mutations and 
other infrequent phenomena. In many instances, the 
study of these rare events provides clues to the underly- 
ing causes of commonly occurring events that are not yet 
understood. In the case of any genetic mutation, a central 
question concerns the precise mechanism through which 
the mutation disrupts normal gene function. 

From a functional perspective, organisms with two 
copies of the wild-type allele have the wild-type pheno- 
type (Figure 4.1a). The same would be true if an organism 
had a single copy of a fully dominant wild-type allele. 
Using the level of activity of the protein products of the 
wild-type allele as the basis for comparison, mutant al- 
leles can often be placed into either a loss-of-function or a 
gain-of-function category. A loss-of-function mutation 
results in a significant decrease or in the complete loss 
of the functional activity of a gene product. This com- 
mon mutational category contains mutations like those 
described in the R-gene and T-gene examples. Loss-of- 
function mutant alleles are usually recessive, but under 
certain circumstances, they may be dominant, depend- 
ing on whether the wild-type allele is haplosufficient or 
haploinsufficient. 

Gain-of-function mutations identify alleles that 
have acquired a new function or have their expression al- 
tered in a way that gives them substantially more activity 
than the wild-type allele. Gain-of-function mutations are 
almost always dominant and usually produce dominant 
mutant phenotypes in heterozygous organisms. As a con- 
sequence of their newly acquired functions, certain gain- 
of-function mutations are lethal in a homozygous state. 


Loss-of-Function Mutations As the previous discussion 
suggests, mutations resulting in a loss of function vary 
in the extent of loss of normal activity of the gene 
product. A loss-of-function mutation that results in a 
complete loss of gene function in comparison to the 
wild-type gene product is identified as a null mutation, 
also known as an amorphic mutation (Figure 4.1b). 
The word null means “zero” or “nothing,” and the word 
amorphic means “without form.” These mutant alleles 
produce no functional gene product and are often lethal 
in a homozygous genotype. The elimination of functional 
gene products can result from various types of mutational
events, including those that block transcription, produce
a gene product that lacks activity, or result in deletion of
all or part of the gene.

Figure 4.1 The functional consequences of mutation. (a) Wild type. (b), (c), and (d) Loss-of-function
mutations. (e) and (f) Gain-of-function mutations.

(a) Wild type. The expression of the products of wild-type alleles
produces the wild-type phenotype. See Figure 4.5 for an example.

(b) Loss of function: null/amorphic mutation. Null alleles produce no
functional product. Homozygous null organisms have a mutant (amorphic)
phenotype due to absence of the gene product. See Figure 4.5 for an example.

(c) Loss of function: leaky/hypomorphic mutation. Leaky mutant alleles
produce a small amount of wild-type gene product. Homozygous organisms
have a mutant (hypomorphic) phenotype. See Figure 4.5 for an example.

(d) Loss of function: dominant negative mutation. The formation of
multimeric proteins is altered by dominant negative mutants whose products
interact abnormally with the protein products of other genes, leading to
malformed multimeric proteins. Example: osteogenesis imperfecta.

(e) Gain of function: hypermorphic mutation. Excessive expression of the
gene product leads to excessive gene action. The mutant phenotype may be
more severe or lethal in the homozygous genotype than in the heterozygous
genotype. See Figure 4.10 for an example.

(f) Gain of function: neomorphic mutation. The mutant allele has a novel
function that produces a mutant phenotype in homozygous and heterozygous
organisms, and may be more severe in homozygous organisms. See Figure
16.20 for an example.

Alternatively, a mutation resulting in partial loss of 
gene function may be identified as a leaky mutation, 
also known as a hypomorphic mutation (Figure 4.1c). 
Hypomorphic means “reduced form”; like the term leaky, 
it implies that a small percentage of normal functional 
capability is retained by the mutant allele but at a lower 
level than is found for the wild-type allele. The severity of 
the phenotypic abnormality depends on the residual level 
of activity from the leaky mutant allele. A greater percent- 
age of activity from a leaky allele results in a less severely 
affected phenotype than when the mutation incurs a more 
substantial loss of function. Both null and hypomorphic 
loss-of-function mutations are often recessive and homo- 
zygous lethal. Dominant loss-of-function mutations are 
also known to occur. 

Certain loss-of-function mutations produce dominant 
mutant phenotypes through alterations in the function of 
a multimeric protein of which the mutant polypeptide 
forms a part (Figure 4.1d). Multimeric proteins, composed 
of two or more polypeptides that join together to form a 
functional protein, are particularly subject to dominant 
negative mutations as a consequence of some change 
that prevents the polypeptides from interacting normally 
to produce a functional protein. A multimeric protein that 
contains an abnormal polypeptide may suffer a reduction 
or total loss of functional capacity. Mutations of this kind 
are dominant due to the substantial loss of function of the 
multimeric protein. These mutations are characterized as 
“negative” due to the spoiler effect of the abnormal poly- 
peptide on the multimeric protein. 

An example of dominant negative mutation is seen
in the human hereditary disorder osteogenesis imperfecta
(OMIM 166200, 166210, and 166220), which is
caused by defects in the bone protein collagen and has
multiple forms with different severity. Collagen protein is
composed of three interwoven polypeptide strands: two
polypeptides from the COL1A1 gene and one polypeptide
from the COL1A2 gene. The trimeric collagen protein is
subject to dominant negative mutation as a consequence
of COL1A1 mutations that produce a defective polypeptide.
The trimeric structure of collagen and the 2:1 ratio
of incorporation of COL1A1 polypeptide over COL1A2
polypeptide mean that in individuals who are homozygous
wild type for COL1A2 and heterozygous for a COL1A1
mutation, most collagen protein contains one or two mutant
COL1A1 polypeptides. As a result, most collagen protein
is defective, and osteogenesis imperfecta develops.
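
The claim that most collagen is defective in a COL1A1 heterozygote can be checked with a short probability calculation. The sketch below assumes that the mutant and wild-type COL1A1 alleles are expressed equally and that the two COL1A1 chains in each trimer are drawn independently; neither assumption is stated in the text, and the function name is illustrative.

```python
from fractions import Fraction


def fraction_normal_trimers(mutant_fraction, chains_per_trimer=2):
    """Probability that every COL1A1 chain in a collagen trimer is wild type.

    Assumes chains are incorporated independently and at random.
    """
    return (1 - mutant_fraction) ** chains_per_trimer


# Heterozygote: half of the COL1A1 polypeptides are assumed to be mutant.
normal = fraction_normal_trimers(Fraction(1, 2))
defective = 1 - normal

if __name__ == "__main__":
    print("normal trimers:", normal)        # 1/4
    print("defective trimers:", defective)  # 3/4
```

Under these assumptions only one quarter of the trimers are fully wild type, even though half of the COL1A1 polypeptides are normal, which is the spoiler effect that makes the mutation dominant.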


Gain-of-Function Mutations Mutations resulting in a 
gain of function fall into two categories that depend on the 
functional behavior of the new mutation. Hypermorphic 
(“greater than wild-type form”) mutations produce more 


gene activity per allele than the wild type (Figure 4.1e) and 
are usually dominant. The gene product of a hypermorphic 
allele is indistinguishable from that of the wild-type allele, 
but it is present in a greater amount and thus induces 
a higher level of activity. The excess concentration is 
the functional equivalent of overdrive, pushing processes 
forward more rapidly, at the wrong time, in the wrong 
place, or for a longer time than normal. Hypermorphic 
mutants often result from regulatory mutations that 
increase gene transcription, block the normal response to 
regulatory signals that silence transcription, or increase 
the number of gene copies by gene duplication. The 
severity of phenotypic effect may coincide with the 
genotype such that mutation homozygotes display a 
more severely affected phenotype than is observed in 
heterozygotes. 

Neomorphic (“new form”) mutations are gain-of-function
mutations that confer novel gene activities
not found in the wild type (Figure 4.1f) and are usually
dominant. The gene products of neomorphic mutants are
functional, but have structures that differ from the wild- 
type gene product. The altered structures lead the mutant 
protein to function differently than the wild-type protein. 
Homozygotes for a neomorphic allele may exhibit a more 
severely affected phenotype than do heterozygotes. 
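
The loss-of-function and gain-of-function categories described above can be summarized as a rough decision rule based on allele activity relative to wild type. The sketch below is a simplification for illustration only: the text speaks of "significant" decreases and "substantially" more activity rather than exact cutoffs, and dominant negative mutations are not captured because they depend on interactions within a multimeric protein rather than on the activity of one allele. The function name and the `novel_function` flag are hypothetical.

```python
def classify_allele(relative_activity, novel_function=False):
    """Classify a mutant allele by activity relative to wild type (1.0).

    Simplified rule: any value below 1.0 counts as reduced and any value
    above 1.0 counts as increased.
    """
    if novel_function:
        return "neomorphic (gain of function)"
    if relative_activity == 0:
        return "null/amorphic (loss of function)"
    if relative_activity < 1:
        return "leaky/hypomorphic (loss of function)"
    if relative_activity > 1:
        return "hypermorphic (gain of function)"
    return "wild type"


if __name__ == "__main__":
    for activity in (0, 0.2, 1, 3):
        print(activity, "->", classify_allele(activity))
    print("novel ->", classify_allele(1, novel_function=True))
```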

Our description of the molecular basis of dominance 
and of loss-of-function and gain-of-function mutations 
provides a conceptual basis for understanding how different
patterns of dominance relationships can develop
among alleles of a gene. These concepts apply to all dip- 
loid organisms, but the various notational systems used 
to identify genes and alleles in different species do not all 
depict these relationships in the same ways. These dif- 
ferent notational systems developed in the early years of 
genetics research when genetic experiments were carried 
out by experts in widely divergent fields of biology with 
little intercommunication. Geneticists studying fruit flies 
developed one notation system for identifying wild-type 
and mutant alleles, geneticists studying yeast developed 
another, and geneticists studying plants developed an- 
other. As the table inside the front cover illustrates, each 
model organism has its own unique style of gene descrip- 
tion and nomenclature. The different notation systems 
cause confusion for students of genetics because they fol- 
low different rules for naming and identifying genes and 
alleles. The table inside the back cover contains the rule 
systems we follow throughout this book. 


Incomplete Dominance 


Mendel’s description of inheritance of traits controlled 
by a dominant and a recessive allele of single genes is a 
simple hereditary process that is relatively rare in nature. 
More commonly with single-gene traits, the dominance 
of one allele over another is not complete. Incomplete 
dominance, also known as partial dominance, identifies 
such circumstances. When incomplete dominance exists
among alleles, the phenotype of the heterozygous organ- 
ism is distinctive; it falls between the phenotypes of the 
homozygotes on a continuum of some kind and is typi- 
cally more similar to one homozygous phenotype than 
the other. When traits display incomplete dominance, 
two pure-breeding parents with different phenotypes produce
F₁ heterozygotes having a phenotype different from
that of either parent. The F₁ phenotype is intermediate
between the parental forms, although it may more closely 
resemble one parental phenotype than the other. 

In previous discussions we used a notational system 
in which an uppercase letter—for example, A—indicates 
a dominant allele, and the same letter in lowercase—a— 
designates a recessive allele. In incomplete dominance 
systems, the relationship between alleles is different, so 
a different notational system—one that avoids implying 
dominance or recessiveness—is used. In the nomenclature 
system for incomplete dominance, alleles are symbolized 
with either upper- or lowercase letters plus a suffix that 
may be a number or a letter. Examples of how pairs of 
alleles with incomplete dominance can be designated are 
A1 and A2, B¹ and B², d₁ and d₂, and wᵃ and wᵇ.

Genetic research has identified innumerable exam- 
ples of incomplete dominance in animals and plants; 
one example is the trait described as flowering time in 
Mendel’s pea plants (Pisum sativum). In peas, the first 
appearance of flowers is under the genetic control of a 
locus that we will call T (for flowering time). The
earliest-flowering strain of pea plants has the homozygous
genotype T₁T₁; the flowering time of this strain is described
as day 0.0. The latest-flowering strain is homozygous
T₂T₂, and it flowers 5.2 days later on average than T₁T₁
plants. A cross of pure-breeding early-flowering and
late-flowering strains produces T₁T₂ heterozygous progeny
that begin to flower 3.7 days later on average than the
earliest-flowering strain (Figure 4.2a).

Genetic crosses show that flowering time is controlled
by a single locus. Self-fertilization of T₁T₂ plants
produces a 1:2:1 ratio of early-, intermediate-, and
late-flowering progeny (Figure 4.2b). We say the T₂ allele
is partially dominant, but not completely dominant, to
T₁ because the heterozygous phenotype is distinct from
either homozygous phenotype but more closely resembles
the late-flowering strain.
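
Both claims in the flowering-time example can be checked numerically: the 1:2:1 ratio from self-fertilizing a heterozygote, and where the heterozygote falls between the two homozygotes. The sketch below uses the day values given in the text, with "T1" standing for the early-flowering allele and "T2" for the late-flowering allele. The function names are illustrative, and the position measure (0 means identical to the early homozygote, 1 means identical to the late homozygote) is just one convenient way to express partial dominance.

```python
from collections import Counter
from itertools import product

# Average flowering delay in days relative to the earliest strain.
FLOWERING_DELAY = {("T1", "T1"): 0.0, ("T1", "T2"): 3.7, ("T2", "T2"): 5.2}


def self_cross(genotype):
    """Count offspring genotypes from self-fertilization of one genotype."""
    return Counter(tuple(sorted(pair)) for pair in product(genotype, repeat=2))


def heterozygote_position(delays):
    """Place the heterozygote on a 0-to-1 scale between the two homozygotes."""
    early = delays[("T1", "T1")]
    late = delays[("T2", "T2")]
    het = delays[("T1", "T2")]
    return (het - early) / (late - early)


ratio = self_cross(("T1", "T2"))
position = heterozygote_position(FLOWERING_DELAY)

if __name__ == "__main__":
    print(dict(ratio))          # one T1T1 : two T1T2 : one T2T2
    print(round(position, 2))   # closer to 1 means closer to late flowering
```

A position above 0.5 reflects the statement that the heterozygote more closely resembles the late-flowering strain.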


Codominance


Codominance, like incomplete dominance, leads to a 
heterozygous phenotype different from the phenotype 
of either homozygous parent. Unlike incomplete domi- 
nance, however, codominance is characterized by the 
detectable expression of both alleles in heterozygotes. 
Codominance is most clearly identified when the protein 
products of both alleles are detectable in heterozygous 
organisms, typically by means of some sort of molecular 
analysis such as gel electrophoresis or a biochemical assay 
that can distinguish between the different proteins. We 
explore the details of these types of molecular analysis in a 
later discussion (see Chapter 10). 


Dominance Relationships of ABO Alleles 


More than one pattern of dominance between the alleles of 
a gene can occur under certain circumstances. Here we ex- 
amine the codominance of two alleles and the recessiveness 
of a third allele of the gene determining human blood type. 

One physiological attribute many of us know about 
ourselves is our blood type, which is type A, type B, type AB, 
or type O. All of us have one of these four common blood 
types that result from alleles at the ABO blood group gene 
located on chromosome 9 (OMIM 110300). Three alleles of
this gene occur in all human populations, and each person
carries two of them. Most combinations of different ABO
alleles result in complete dominance of one allele, but one
combination results in codominance.

The three alleles of the ABO gene are identified as
Iᴬ, Iᴮ, and i, and the four blood groups are phenotypes
produced by different combinations of these alleles. On
the basis of genotype-phenotype (i.e., blood type) correlation,
geneticists have concluded that Iᴬ and Iᴮ have complete
dominance over i, and that Iᴬ and Iᴮ are codominant
to one another. The complete dominance of Iᴬ and Iᴮ
to i is indicated by the identification of blood type A in
individuals whose genotype is IᴬIᴬ or Iᴬi, and of blood
type B in individuals whose genotype is IᴮIᴮ or Iᴮi. The
completely recessive nature of the i allele is confirmed by
the observation that only ii homozygotes have blood type
O. Lastly, codominance of Iᴬ and Iᴮ to one another is
confirmed by the observation that blood type AB occurs only
in individuals who have the heterozygous genotype IᴬIᴮ.
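
The dominance relationships among the three ABO alleles amount to a small lookup rule from genotype to blood type. The following is a minimal sketch of that rule. The ASCII labels "IA", "IB", and "i" stand in for the three allele symbols, and the function name is illustrative.

```python
VALID_ALLELES = {"IA", "IB", "i"}


def abo_blood_type(allele1, allele2):
    """Return the ABO blood type produced by a pair of alleles."""
    alleles = {allele1, allele2}
    if not alleles <= VALID_ALLELES:
        raise ValueError(f"unknown allele in {sorted(alleles)}")
    has_a = "IA" in alleles
    has_b = "IB" in alleles
    if has_a and has_b:
        return "AB"   # codominance: both alleles are expressed
    if has_a:
        return "A"    # IA is completely dominant over i
    if has_b:
        return "B"    # IB is completely dominant over i
    return "O"        # only ii homozygotes are type O


if __name__ == "__main__":
    for genotype in [("IA", "IA"), ("IA", "i"), ("IB", "IB"),
                     ("IB", "i"), ("IA", "IB"), ("i", "i")]:
        print(genotype, "->", abo_blood_type(*genotype))
```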


Figure 4.2 Incomplete dominance in flowering time
of pea plants. (a) Allele T₂ is incompletely dominant over
allele T₁, as indicated by the late flowering time of T₁T₂
plants. (b) Segregation of ...
Testing for ABO Blood Type Blood typing for ABO
blood type makes use of an antigen-antibody reaction to 
determine if a specific antigen—identified by a sugar moiety 
embedded on the surface of red blood cells—is present in a 
given person’s blood. An antibody is a molecule, produced 
by the immune system, that binds to a specific antigen. 
A positive reaction occurs when the antibody detects its 
antigen target. The antibody binds the antigen and also 
attaches to other antigen-bound antibodies, causing red 
blood cells to form visible clumps. Clumping indicates that 
the antibody has detected its antigen target, whereas an 
absence of clumping indicates that blood does not contain 
the antigen target of the antibody. 

To test for ABO blood type, two antisera—one called 
“anti-A antiserum” and containing purified anti-A anti- 
body, the other called “anti-B antiserum” and containing 
purified anti-B antibody—are placed in separate depres- 
sions on a microscope slide, and a drop of the blood to be 
typed is added to each depression. A person with blood 
type A shows clumping with anti-A antiserum but not with 
anti-B (Figure 4.3). Conversely, blood type B is identified 
when clumping occurs with anti-B but not with anti-A. If 
clumping occurs with both antisera, the blood type is AB. 
Clumping with neither antiserum identifies blood type O. 
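
The interpretation of the two-antiserum test described above can be written as a short function of the two clumping observations. This is an illustrative sketch; the function and parameter names are not part of any laboratory protocol.

```python
def blood_type_from_clumping(clumps_with_anti_a, clumps_with_anti_b):
    """Infer ABO blood type from clumping with anti-A and anti-B antisera."""
    if clumps_with_anti_a and clumps_with_anti_b:
        return "AB"
    if clumps_with_anti_a:
        return "A"
    if clumps_with_anti_b:
        return "B"
    return "O"


if __name__ == "__main__":
    for anti_a in (True, False):
        for anti_b in (True, False):
            print(anti_a, anti_b, "->", blood_type_from_clumping(anti_a, anti_b))
```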

Proper cross-matching of blood type is essential for 
safe blood transfusion. In reality, several antigens pro- 
duced by different genes determine the suitability of do- 
nor and recipient blood for transfusion, and hospitals and 
clinics must carefully compare donor and recipient blood 
to identify the possibility of adverse reactions before 
transfusion takes place. The general rule for safe blood
transfusion is that the recipient blood must not contain
an antibody that reacts with an antigen in the donated
blood. When such a reaction occurs, hemolysis can result,
and clots of clumped blood cells can form at the site of
transfusion. These adverse reactions can potentially
cause life-threatening complications.

Figure 4.3 ABO blood type. Blood type is determined by
mixing a drop of blood with a drop of anti-A or anti-B antiserum.

Blood type    Clumping with           Possible genotypes
A             anti-A only             IᴬIᴬ or Iᴬi
B             anti-B only             IᴮIᴮ or Iᴮi
AB            anti-A and anti-B       IᴬIᴮ
O             neither                 ii

The antibodies anti-A and anti-B develop in humans 
from birth, but people do not carry an antibody if they also 
carry the corresponding antigen. Thus people with blood 
type A, who have the A antigen, also carry the anti-B 
antibody. People with blood type B have the B antigen and 
the anti-A antibody. Those with blood type AB have both 
antigens and neither anti-A nor anti-B antibody. Finally, 
people with blood type O have neither A nor B antigen 
and have both anti-A and anti-B antibody. 
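
The general transfusion rule and the antibody pattern just described can be combined into a simple compatibility check. The sketch below considers only the ABO antigens on donated red blood cells and the anti-A and anti-B antibodies of the recipient. As the text notes, real cross-matching involves additional antigens produced by other genes, so this is an illustration of the rule and not a clinical tool. The names are illustrative.

```python
# Antigens carried on red blood cells for each ABO blood type.
ANTIGENS = {"A": {"A"}, "B": {"B"}, "AB": {"A", "B"}, "O": set()}


def antibodies(blood_type):
    """A person carries antibodies against the ABO antigens they lack."""
    return {"A", "B"} - ANTIGENS[blood_type]


def abo_compatible(donor, recipient):
    """True if no recipient antibody targets an antigen on the donor cells."""
    return not (ANTIGENS[donor] & antibodies(recipient))


if __name__ == "__main__":
    types = ["A", "B", "AB", "O"]
    for donor in types:
        accepted_by = [r for r in types if abo_compatible(donor, r)]
        print(f"donor {donor} -> recipients {accepted_by}")
```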


The Molecular Basis of Dominance and Codominance 
of ABO Alleles The two ABO blood group antigens 
on the surfaces of red blood cells each have a slightly 
different molecular structure. The antigens are glycolipids 
that contain a lipid component and an oligosaccharide 
component. The lipid portion of the antigen is anchored in 
the red blood cell membrane, and the segment protruding 
outside the cell contains the oligosaccharide. Initially, the 
oligosaccharide is composed of five sugar molecules and is 
called the H antigen. It results from the activity of an enzyme 
produced by the H gene (Figure 4.4). The H antigen is 
present on the surfaces of all red blood cells; it can be further 
modified, in two alternative ways, by the addition of a sixth 
sugar, or it can be left unmodified. The final modification 
of the H antigen depends on the enzymatic activity of the 
protein product of the ABO blood group locus. 

Two alternative sugars can be added to the H antigen
by the gene products of the Iᴬ or Iᴮ alleles, respectively.
If the Iᴬ allele is present in the genotype, it produces the
gene product α-3-N-acetyl-D-galactosaminyltransferase,
or simply, “A-transferase.” A-transferase catalyzes the
addition of the sugar N-acetylgalactosamine to the H antigen,
producing a six-sugar oligosaccharide known as the
A antigen. The Iᴮ allele, on the other hand, produces
α-3-D-galactosyltransferase, commonly called “B-transferase,”
which catalyzes the addition of a different sugar, galactose,
and produces a six-sugar oligosaccharide known as
the B antigen. The molecular basis of the differences between
the A and B alleles is several nucleotide differences
that change four amino acids of the resulting transferase
enzymes and alter enzymatic activity. In contrast, the i allele
is due to a single base-pair deletion and is a null allele
that does not produce a functional gene product capable
of adding a sixth sugar to the H antigen.

At the cellular level, anti-A antibody recognizes the
N-acetylgalactosamine addition mediated by Iᴬ, and anti-B
antibody identifies the galactose addition produced by the
action of Iᴮ. Neither of these antibodies has any reactivity with
the unmodified H antigen, so unmodified H antigen, present
in individuals with blood type O, is not recognized by either
antibody. Either one or two copies of the Iᴬ or the Iᴮ allele in
a genotype is sufficient to produce an ABO antigen detectable
by anti-A or anti-B antibodies. Both Iᴬ and Iᴮ are dominant
to i, since Iᴬ and Iᴮ produce enzymes that modify the H
antigen but i does not. On the other hand, the IᴬIᴮ genotype
leads to production of both A-transferase and B-transferase,
resulting in the addition of N-acetylgalactosamine to some
H antigens and the addition of galactose to other H antigens.
In the IᴬIᴮ genotype, all red blood cells carry both types of
H-antigen modifications; about half of the red cell surface
antigens are A antigens, and the rest are B antigens. In the
heterozygous IᴬIᴮ genotype, therefore, the action of both alleles
is detected in the phenotype, leading to the conclusion
that Iᴬ and Iᴮ are codominant to one another.

Figure 4.4 Production of ABO blood group antigens.

Many nonhuman primates have a blood group sys- 
tem that is essentially identical to the human ABO blood 
group system. ABO blood groups have been identified 
in the great apes (chimpanzee, gorilla, and orangutan) 
as well as in numerous Old World monkey species, in- 
cluding macaques (genus Macaca) and baboons (genus 
Papio). Two important evolutionary observations derive 
from this finding. First, the ABO blood group is a long- 
standing feature of the immune system genetics in pri- 
mates, one that evolved early in the ancestral history of 
primates and was retained over tens of millions of years 
as primates diversified. Second, the retention of the ABO 


blood group system in primates demonstrates the im- 
portance of this immune system response in protecting 
primates from infectious and foreign antigens. Natural 
selection has played a preeminent role in maintaining 
this system. The ABO blood group genes are one example 
of the shared evolutionary history that can be identified 
through the examination of the taxonomic distribution 
of genes in lineages. Genetic Analysis 4.1 examines the in- 
heritance of blood group phenotypes, where alleles have a 
variety of dominance relationships. 


Allelic Series 


Diploid genomes contain pairs of homologous chromo- 
somes; thus, each individual organism can possess at most 
two alleles at a locus. In populations, however, the number 
of alleles is theoretically unlimited, and some genes have 
scores of alleles. At the population level, a locus possessing 
three or more alleles is said to have multiple alleles. The 
ABO blood group locus, with its three alleles, is one example 
of multiple alleles. Like the ABO gene, other multiple-allelic 
loci display a variety of dominance relationships among the 
alleles. Commonly, an order of dominance emerges among 
the alleles, based on the activity of each allele’s protein prod- 
uct, forming a sequential series known as an allelic series. 
Alleles in an allelic series can be completely dominant or
completely recessive, or they can display various forms of
incomplete dominance or codominance.


GENETIC ANALYSIS 4.1


PROBLEM The MN blood group in humans is an autosomal codominant system with two alleles, M
and N. Its three blood group phenotypes, M, MN, and N, correspond to the genotypes MM, MN, and NN.
The ABO blood group assorts independently of the MN blood group.

A male with blood type O and blood type MN has a female partner with blood type AB and blood type N.
Identify the blood types that might be found in their children, and state the proportion for each type.

BREAK IT DOWN: Alleles of the ABO system have both dominant-recessive and codominant relationships
(p. 113).

BREAK IT DOWN: The discussion on page 110 about the relationships among ABO alleles will help you
to identify the parental genotypes from the phenotypes given here.


Solution Strategies and Solution Steps


Evaluate

1. Strategy: Identify the topic of this problem and the kind of information the answer should contain.
   Step: The problem concerns the inheritance of two blood types. The gene determining ABO blood type
   carries three alleles: Iᴬ and Iᴮ are codominant to one another and dominant to i. The MN blood group
   gene carries two alleles that are codominant. The answer requires finding the possible blood types, and
   their expected proportions, of the children of parents whose blood types are given.

2. Strategy: Identify the critical information given in the problem.
   Step: The blood types of the parents are given.
   TIP: Blood type O is the recessive phenotype, and blood type MN is due to codominance of alleles.

Deduce

3. Strategy: Deduce the blood group genotypes of the male parent.
   Step: The male has blood types O and MN. Type O results from homozygosity for the recessive i allele,
   whereas MN is produced in heterozygotes carrying both alleles. The male genotype is ii MN.

4. Strategy: Deduce the blood group genotypes of the female parent.
   Step: The female has blood groups AB and N. The AB blood type is found in heterozygotes, and blood
   type N in homozygotes. The female blood group genotype is IᴬIᴮ NN.
   TIP: Blood type AB is due to codominance, and blood type N is due to homozygosity.

Solve

5. Strategy: Identify the gamete genotypes and their frequencies for the male.
   Step: Independent assortment predicts two gamete genotypes for the male: All gametes contain i,
   half carry M, and half carry N.

6. Strategy: Identify the female gamete genotypes and their frequencies.
   Step: Independent assortment predicts two gamete genotypes for the female: All gametes contain N,
   half contain Iᴬ, and half contain Iᴮ.

7. Strategy: Predict the progeny genotypes and phenotypes.
   TIP: Use a Punnett square to evaluate this cross.

                        Male gamete Mi             Male gamete Ni
   Female gamete NIᴬ    MN Iᴬi                     NN Iᴬi
                        Blood types: MN and A      Blood types: N and A
   Female gamete NIᴮ    MN Iᴮi                     NN Iᴮi
                        Blood types: MN and B      Blood types: N and B


For more practice, see Problems 6, 9, and 31.
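
The cross in the Genetic Analysis problem above (male ii MN, female IᴬIᴮ NN) can also be worked by enumerating gametes, which makes the expected proportions explicit. The sketch below assumes independent assortment of the two genes, as stated in the problem, and equal frequencies for every gamete. The ASCII labels "IA", "IB", and "i" and all function names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def gametes(abo_genotype, mn_genotype):
    """All equally likely gametes: one ABO allele paired with one MN allele."""
    return list(product(abo_genotype, mn_genotype))


def abo_type(allele1, allele2):
    """ABO phenotype: IA and IB are codominant, and both are dominant to i."""
    alleles = {allele1, allele2}
    if {"IA", "IB"} <= alleles:
        return "AB"
    if "IA" in alleles:
        return "A"
    if "IB" in alleles:
        return "B"
    return "O"


def mn_type(allele1, allele2):
    """MN phenotype: M and N are codominant, so the phenotype lists both."""
    return "".join(sorted({allele1, allele2}))


def offspring_phenotypes(father, mother):
    """Return each (ABO type, MN type) phenotype with its expected proportion."""
    pairs = list(product(gametes(*father), gametes(*mother)))
    counts = Counter()
    for (abo_f, mn_f), (abo_m, mn_m) in pairs:
        counts[(abo_type(abo_f, abo_m), mn_type(mn_f, mn_m))] += 1
    return {pheno: Fraction(n, len(pairs)) for pheno, n in counts.items()}


father = (("i", "i"), ("M", "N"))      # blood types O and MN
mother = (("IA", "IB"), ("N", "N"))    # blood types AB and N
result = offspring_phenotypes(father, mother)

if __name__ == "__main__":
    for phenotype, proportion in sorted(result.items()):
        print(phenotype, proportion)
```

Under these assumptions the four phenotype combinations in the Punnett square each occur in one quarter of the children.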


The C-Gene System for Mammalian Coat Color Genetic 
analysis of coat color in mammals reveals that many genes 
are required to produce and distribute pigment to the 
hair follicles or skin cells, where they are displayed as coat 
color or skin color. While various interactions among 
these genes can modify color expression, we focus here 
on just one gene, the C (color) gene that is responsible 
for coat color in mammals such as cats, rabbits, and mice. 
This gene has dozens of alleles that have been identified 
in more than 80 years of genetic analysis, but we limit our 
discussion to just four alleles that form an allelic series. The 
C gene produces the enzyme tyrosinase, which is active in 
the first two steps of a multistep biochemical pathway that 
synthesizes the pigment melanin, which imparts coat color
in furred mammals and skin color in humans. In the initial 
melanin pathway steps, tyrosinase is responsible for the 
breakdown (catabolism) of the amino acid tyrosine. 

The C-gene alleles form an allelic series that is revealed 
by the phenotypes of offspring of various matings. Allele C 
is dominant to all other alleles of the gene, and any geno- 
type with at least one copy of C produces wild-type coat 
color. These genotypes are written as C- to indicate that 
regardless of the second allele in the genotype, the phenotype
is dominant. Three other alleles, producing tyrosinase
enzymes with reduced or no activity, form an
allelic series with C (Figure 4.5). The allele c^ch produces a
phenotype called chinchilla, a diluted coat color. This allele
is hypomorphic and generates reduced coat color as a result
of the reduced level of activity of the gene product. The c^h


Figure 4.5 Allelic series for coat-color determination in
mammals. Alleles, phenotypes, and types of mutation: C, full color
(genotype C-), wild type; c^ch, chinchilla, hypomorphic (leaky);
c^h, Himalayan, hypomorphic (temperature-sensitive); c, albino
(genotype cc), null (amorphic).
allele produces the Himalayan phenotype, characterized by
fully pigmented extremities (paws, tail, nose, and ears) but 
virtually absent pigmentation on other parts of the body. 
The Himalayan phenotype is the “Siamese” coat-color pat- 
tern often seen in cats, rabbits, and mice. This allele is 
temperature sensitive, as we describe momentarily. Finally, 
the c allele produces a protein product with no enzymatic 
activity. This is a fully recessive null (amorphic) allele that 
does not produce a functional gene product. Homozygosity 
for this allele produces an albino phenotype. 

Crosses between animals with different genotypes 
at the C gene indicate the dominance relations of the al- 
leles. For example, in Crosses A, B, and C in Figure 4.6, 
complete dominance of C over other alleles in the series 
is demonstrated by the finding that all of the progeny of 
an animal with the genotype CC have full color, regard- 
less of the genotype of the mate. The dominance order of 
alleles in the series is revealed by the pattern of 3:1 ratios 
obtained from crosses of various heterozygous genotypes 
shown in Figure 4.6. In Cross D, chinchilla is shown to 
be partially dominant over Himalayan. Most of the coat 
of these animals has diluted (chinchilla) color, and the 
Himalayan pattern has darker color of paws, face, and 
tail. Cross E shows that chinchilla is completely dominant 
over albino. Himalayan, too, is completely dominant over 
albino (Cross F). The dominance relationships within this
allelic series can be expressed as C > c^ch > c^h > c.


The Molecular Basis of the C-Gene Allelic Series 
Tyrosinase enzymes produced by different C-gene alleles 
have distinctive levels of catabolic activity that are the basis 
for the dominance relationships between the alleles. The 
allele C is a dominant wild-type allele producing fully active 
tyrosinase that is defined as 100% activity. The percentage 
of wild-type tyrosinase activity produced by each allele 
explains the order observed for the allelic series. Biochemical 
examination reveals that the enzyme produced by the c^ch
hypomorphic allele has less than 20% of the activity of the
wild-type enzyme. In the homozygous c^ch c^ch genotype or the
heterozygous genotypes c^ch c^h or c^ch c, only a small amount of
melanin is synthesized. This leads to a decreased amount of 
pigment, and it has the effect of muting the coat color. 

The tyrosinase enzyme produced by the hypomorphic
c^h (Himalayan) allele is unstable and is inactivated at a
temperature very near the normal body temperature of 
most mammals. This type of gene product is an example 
of a temperature-sensitive allele. Cats with the Siamese 
coat-color pattern are familiar examples of the action of 
this temperature-sensitive allele. The parts of cats that are 
farthest away from the core of the body (the paws, ears, 
tail, and tip of the nose) at most times tend to be slightly 
cooler than the trunk. At these cooler extremities, the 
temperature-sensitive tyrosinase produced by the c^h allele
remains active, producing pigment in the hairs there.
However, in the warmer central portion of the body, the
slightly higher temperature is enough to cause the tyrosinase
produced by the c^h allele to denature, or unravel. This
inactivates the enzyme and leads to an absence of pigment
in the central portion of the body. Animals that are c^h c^h or
c^h c have the Himalayan phenotype. The final allele in the
series, c, is a null allele that does not produce functional 
tyrosinase. Homozygotes for this allele are unable to initi- 
ate the catabolism of tyrosine. This leads to an absence of 
melanin and produces the condition known as albinism. 
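The dominance relationships of the allelic series can be summarized as a simple lookup. The sketch below is illustrative only: the allele labels "cch" and "ch" stand in for c^ch and c^h, the function name is hypothetical, and the intermediate chinchilla/Himalayan phenotype is handled as a special case because chinchilla is only partially dominant over Himalayan.

```python
# Alleles listed from most to least dominant: C > c^ch > c^h > c.
DOMINANCE_ORDER = ["C", "cch", "ch", "c"]

PHENOTYPE = {
    "C": "full color",
    "cch": "chinchilla",
    "ch": "Himalayan",
    "c": "albino",
}

def coat_phenotype(allele1, allele2):
    """Predict the coat-color phenotype for a C-gene genotype."""
    # Chinchilla is only partially dominant over Himalayan (Cross D).
    if {allele1, allele2} == {"cch", "ch"}:
        return "chinchilla with darker (Himalayan) extremities"
    # Otherwise the more dominant allele determines the phenotype.
    top = min((allele1, allele2), key=DOMINANCE_ORDER.index)
    return PHENOTYPE[top]

for genotype in [("C", "c"), ("cch", "c"), ("cch", "ch"), ("ch", "c"), ("c", "c")]:
    print(genotype, "->", coat_phenotype(*genotype))
```

Ranking alleles by position in a list keeps the series easy to extend, but it models only complete dominance; any partial-dominance pair has to be listed explicitly, as is done here for c^ch c^h.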


Lethal Alleles 


Certain single-gene mutations are so detrimental that 
they cause death early in life or terminate gestational 
development. These life-ending mutations affect genes 
whose products are essential to life. Homozygosity for 
mutation of these essential genes is lethal, and the muta- 
tions are identified as lethal alleles. As a rule, recessive 
lethal alleles have low frequencies in populations, al- 
though they may persist in some populations over a long 
period of time. Natural selection can eliminate copies 
of the allele when they occur in homozygous genotypes; 
however, recessive lethal alleles are “hidden” by dominant 
Figure 4.6 The genetics of C-gene dominance. (a)-(f) Crosses A to F illustrate the complete
dominance of C and the complete recessiveness of c, and establish the allelic series as C > c^ch > c^h > c.


wild-type alleles in heterozygous genotypes, thus evading 
natural selection. Under certain circumstances, hetero- 
zygous carriers of a recessive lethal allele have a natural 
selection advantage (see Chapter 10). 

Lethal alleles are often detected as distortions in 
segregation ratios, where one or more classes of expected 


progeny are missing. For example, in plant and animal
crosses between two organisms heterozygous for a recessive
lethal allele, the phenotypic ratio among the progeny is 3:1
(viable:dead). The dead offspring are homozygous for a
recessive lethal mutation. These progeny might not be 
seen at all, due to embryonic lethality, or they may be 
Figure 4.7 Evidence of lethal mutations in plants. Panels: wild
type; embryo lethal (RPN1a/rpn1a × RPN1a/rpn1a), 3:1; gametophyte
lethal (FER/fer × male), 1:1. Gametophytic lethality is detected
by observing a 1:1 ratio of living to dead seeds. Arrows indicate
undeveloped seeds.


stillborn or die very young. Of the viable offspring, two- 
thirds are expected to be heterozygous for the lethal allele 
and one-third are expected to be homozygous for the 
dominant wild-type allele (Figure 4.7). 


Detection in Plants In flowering plants, the effects
of lethal alleles can be observed directly. For example,
the RPN1a gene encodes a subunit of the 26S proteasome,
a multi-protein complex involved in protein degradation.
A loss-of-function null mutation of this gene (rpn1a)
results in embryonic lethality
in Arabidopsis thaliana and other plant species. In an
RPN1a/rpn1a × RPN1a/rpn1a cross, a 3:1 segregation
ratio of living seeds (RPN1a/-) to dead seeds (rpn1a/rpn1a)
can be observed in the fruit. When the living seeds
are planted, approximately two-thirds are heterozygous
for the lethal allele (RPN1a/rpn1a) and one-third are
homozygous for the wild-type allele (RPN1a/RPN1a).
Lethal mutations that result in female gametophytic 
lethality are also detectable in flowering plants. Consider 
a plant heterozygous for a female gametophytic allele, 
FER/fer, in which the wild-type FER allele was derived from 
its mother, and the mutant fer allele came from its father. 
During megasporogenesis, half of all megaspores will inherit 
the FER allele and the other half will inherit the fer allele. 
Embryo sacs derived from megaspores inheriting the fer al- 
lele will die, so that only half of all ovules develop into seeds. 
The alleles segregate in a 1:1 ratio that is observed among 
the developing seeds in a fruit. Note that the 1:1 ratio is a 


direct observation of Mendelian ratios in the gametes of 
a heterozygous organism. Thus a 1:1 ratio distinguishes 
female gametophytic lethality from embryonic lethality, 
which results in a 3:1 ratio among seeds. Plants usually 
produce pollen in excess, similar to the excess of sperm 
production relative to egg production in animals; thus, male 
gametophytic lethality is not observable by looking at devel- 
oping seeds in the fruit. It can be detected, however, by look- 
ing for plants in which half of all the pollen grains are dead. 


Detection in Animals In contrast, lethal alleles in animals 
are usually detected by a distortion in segregation ratios. 
The first case of a lethal allele was identified in 1905 by
Lucien Cuénot, who studied a lethal mutation in mice
carrying a dominant mutation for yellow coat color. In 
mice, wild-type coat color is a brown color, called “agouti” 
(a-GOO-tee), produced by the presence of yellow and 
black pigments in each hair shaft (Figure 4.8a). Agouti 
hairs are black at the base and tip, with yellow pigment 
in the central portion of the shaft. Yellow coat color is 
seen when yellow pigment is deposited along the entire 
length of the hair shaft, not just in the middle portion as 
it is in agouti (Figure 4.8b). The Agouti gene is one of the 
pigment-producing genes found in mammals with furry 
coats. It produces a yellow pigment called pheomelanin that 
is found in the hairs of mammalian coats. An independently 
assorting gene produces the black pigment that is part of 


Figure 4.8 Coat color in mice. (a) Wild-type agouti coat
color is a mixture of black and yellow pigment in hair shafts.
(b) Yellow coat occurs when yellow pigment produced by the
overly active mutant allele A^Y displaces black pigment.
this example. The wild-type allele for agouti coat color is
designated A, and its normal activity leads to the production
of a moderate amount of yellow pigment. The mutant allele,
designated A^Y, is a hypermorphic allele. It is a dominant
gain-of-function mutation that produces substantially more
yellow pigment than does the wild-type allele.

The A^Y mutation is dominant, but true-breeding yellow
mice cannot be produced. From a genetic perspective,
this means that mice with yellow coat color are heterozygous
(AA^Y) and that the A^Y A^Y genotype is lethal in embryonic
development due to its interference with an essential gene,
as we explain momentarily. From this information, two
important observations about the genetics of the yellow
allele can be made. First, mating an agouti mouse and a
yellow mouse is expected to produce a 1:1 ratio of agouti to
yellow among progeny (Figure 4.9a). Second, crosses between
two yellow mice (both of which are necessarily heterozygous)
produce evidence of the recessive lethal nature
of the A^Y allele (Figure 4.9b). The outcome of these crosses is
a 2:1 ratio of yellow to agouti, rather than the 3:1 ratio that
is anticipated when heterozygotes expressing a dominant
allele are crossed. The genetic interpretation of this observation
is that alleles of heterozygous yellow mice segregate
normally in gamete formation and unite at random to
produce a 1:2:1 genotypic ratio at conception, but that A^Y A^Y
zygotes do not survive gestation. Recessive lethality of A^Y prevents
embryonic development of homozygotes, eliminating that
class among progeny and resulting in the 2:1 ratio seen
among progeny of heterozygous parents.
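The way a recessive lethal allele turns an expected 1:2:1 genotypic ratio into an observed 2:1 ratio can be checked by enumerating zygotes and discarding the lethal class. The following is a minimal sketch; the allele label "AY" and the function name `surviving_ratio` are illustrative assumptions, and equal gamete frequencies and random union of gametes are assumed.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def surviving_ratio(parent1, parent2, lethal_genotypes):
    """Genotype frequencies among surviving progeny of a monohybrid cross."""
    # Every pairing of one allele from each parent is equally likely.
    zygotes = Counter(tuple(sorted(pair)) for pair in product(parent1, parent2))
    # Remove genotypes that do not survive gestation.
    survivors = {g: n for g, n in zygotes.items() if g not in lethal_genotypes}
    total = sum(survivors.values())
    return {g: Fraction(n, total) for g, n in survivors.items()}

agouti = ("A", "A")
yellow = ("A", "AY")
lethal = {("AY", "AY")}

yellow_x_yellow = surviving_ratio(yellow, yellow, lethal)
agouti_x_yellow = surviving_ratio(agouti, yellow, lethal)

print(yellow_x_yellow)  # AA (agouti) 1/3, AA^Y (yellow) 2/3
print(agouti_x_yellow)  # AA (agouti) 1/2, AA^Y (yellow) 1/2
```

The yellow × yellow cross gives two-thirds yellow and one-third agouti among survivors, which is the 2:1 ratio, while the agouti × yellow cross is unaffected by lethality and gives 1:1.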

Nearly a century after Cuénot first identified homozygous
lethality of the mutant A^Y allele, the molecular
basis of the lethality was identified. Much to the surprise
of geneticists, the lethality had little to do with yellow coat
color itself; instead, yellow coat was an almost inadvertent
consequence of a mutation that deleted part of a gene near
the coat-color gene.

The mutation producing the A^Y allele results from
a deletion that affects two genes, the Agouti gene and a
neighboring gene identified as Raly. Raly produces a protein
that is essential for mouse embryo development. Each
gene has its own promoter. The wild-type Raly promoter
drives a high level of transcription, whereas the Agouti
gene promoter drives considerably less transcription
(Figure 4.10). The dominant mutation producing yellow
coat color comes about by a deletion of approximately
120,000 bp that removes the entire Raly gene and the Agouti
gene promoter, thus bringing the Agouti gene under the
control of the Raly promoter and creating a mutant
hypermorphic agouti allele. The Raly promoter drives a high
level of Agouti gene transcription that results in excess yellow
pigment that displaces black pigment in hair shafts and
leads to the mutant yellow phenotype. The deletion thus
affects both the Agouti and Raly genes, which happen to be
side by side on the mouse chromosome: Agouti transcription
is substantially increased, and the Raly gene is lost. Heterozygotes
with the AA^Y genotype have yellow coats and survive due
to haplosufficiency of the single copy of Raly. Homozygous
A^Y A^Y mice are unable to produce the essential protein
product from the Raly gene and fail to develop, resulting
in the skewed 2:1 ratio that characterizes the
progeny of two heterozygous yellow-coated mice.


An Allele That Is Both Dominant and Recessive The A^Y
allele is a rare example of an allele that can be classified as
both dominant and recessive. This may sound confusing 


Figure 4.9 Dominance and lethality of A^Y. (a) A 1:1 ratio
identifies A^Y as a dominant mutant allele: AA (agouti) × AA^Y
(yellow) produces 1/2 AA (agouti) and 1/2 AA^Y (yellow). (b) The
lethality of A^Y in the homozygous genotype results in a 2:1 ratio
of yellow to agouti in the cross of yellow-coated heterozygous mice:
AA^Y (yellow) × AA^Y (yellow) produces AA (agouti), AA^Y (yellow),
and A^Y A^Y (lethal), leaving 1/3 AA (agouti) and 2/3 AA^Y (yellow)
among surviving progeny.
Figure 4.10 Mutation of Raly and Agouti producing
yellow coat. Chromosomes carrying the wild-type A allele support
mouse embryonic development and produce a moderate amount of
yellow pigment. Chromosomes carrying the mutant A^Y allele, in
which Raly and the Agouti promoter are deleted by mutation,
produce no Raly protein and a very high level of yellow pigment
due to the hypermorphic mutation.


and contradictory, but it is based on the phenotypes 
produced by genotypes of the Agouti gene. We refer to 
the mutant allele as dominant or as recessive depending 
on the particular phenotype we happen to be examining. 

When we look at the ratio of agouti versus yellow 
coat color among the progeny produced by a yellow 
mouse mating with an agouti mouse, we see a 1:1 ratio 
that indicates dominance of the mutant allele over the 
wild-type allele. Dominance in this instance is due to the 
gain-of-function of yellow pigment by the mutant allele. 
If, on the other hand, we look at the ratio of progeny with 
yellow versus agouti coat color in the cross of two yellow 
mice, we see a 2:1 ratio that is the result of the homozy- 
gous lethality of the mutant allele. In this context, lethality 
only affects homozygotes, and the mutant allele is reces- 
sive to the wild-type. This relationship is due to the loss 
of function of the Raly gene caused by its deletion. We 
have, therefore, the odd circumstance of one mutant allele 
that is both dominant and recessive, depending on how its 
phenotypic effect is examined. 


Sex-Limited Traits 


The sex of an organism can exert an influence on its gene 
expression. This effect is often due to the hormonal envi- 
ronment (i.e., in a male or in a female) in which the gene 
is located. As such, the differential expression of a gene is 
sex-dependent. One consequence of such influence is the 
potential limitation of gene expression to one sex but not 
the other in a pattern called sex-limited gene expres- 
sion. Differences in gene expression between the sexes 
can result in the appearance of these sex-limited traits. 
Both sexes typically carry the genes for sex-limited traits, 
but the genes are expressed in just one sex. 

In mammals, for example, the development of breasts 
and the ability to produce milk are traits limited to fe- 
males. Horn development is a trait limited to males in 


some species of sheep, cows, and other hoofed animals. 
Behavioral traits in some species, particularly traits re- 
lated to mating, are also strongly influenced by sex. For 
example, the courtship behavior of crowned cranes in- 
cludes an elaborate display of body positioning, neck in- 
tertwining, and vocalization that is performed differently 
by males and females of the species. 

The mechanism that limits the expression of a trait 
to just one sex is most often the differential influence 
of hormones acting as intercellular regulators of gene 
expression. In the case of male canary vocalization, for 
example, changes in male singing patterns are initiated in 
late winter by an increase in male hormones released by 
the brain in response to increased day length and warmer 
temperatures. These hormones stimulate enlargement 
of the testes and increased production of testosterone, 
which in turn stimulates the development of neurons in 
the brain that elaborate the song center, induce the devel- 
opment of muscles in the vocalization area of the throat, 
and allow males to produce sex-limited vocalization to 
attract mates. 


Sex-Influenced Traits 


Sex-influenced traits are those in which the phenotype 
corresponding to a particular genotype differs depend- 
ing on the sex of the organism carrying the genotype. 
Hormones are thought to influence the differential ex- 
pression of genotypes in the sexes. 

The appearance of a chin beard versus the absence of
a beard, the beardless phenotype, in certain goat breeds is
an example of a sex-influenced trait. Bearding is inherited
as an autosomal trait determined by two alleles, B1 and
B2, which are present in three genotypes in each sex. In
both sexes, B1B1 homozygotes are beardless, and homozygotes
of either sex with the B2B2 genotype are bearded.
It is thought that androgenic hormones are a principal
factor influencing the bearded phenotype. The effect of
different levels of androgenic hormones on bearding in
the sexes is seen by comparing females and males with
the heterozygous genotype (B1B2). Heterozygous males
have a beard, whereas heterozygous females are beardless.
Figure 4.11 illustrates the results of a cross between two
heterozygotes that produces different ratios of bearded to
beardless males and females. Mendelian inheritance occurs,
but as a consequence of sex-influenced expression,
the cross yields a 3:1 ratio of bearded to beardless males
and a 3:1 ratio of beardless to bearded females. The dominance
relationship of these alleles varies with sex. Allele
B1 is dominant to B2 in females since females that are
heterozygous B1B2 have the same beardless phenotype as
do B1B1 females. On the other hand, allele B2 is dominant
over B1 in males since heterozygotes are bearded just like
B2B2 homozygotes. Analogous to the classification of the
A^Y allele we discussed earlier, the B1 and B2 alleles exhibit
flexibility of dominance, in this case depending on the sex
of the bearer.
Figure 4.11 Sex-influenced inheritance of beard appearance
in goats. Dominance of the B1 and B2 alleles is expressed
differently in males and females.
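The sex-dependent dominance of the bearding alleles can be expressed as a small rule and applied to the B1B2 × B1B2 cross. This is a sketch under the assumptions stated in the text (an autosomal gene with two alleles, heterozygotes bearded only in males); the function names are hypothetical.

```python
from collections import Counter
from itertools import product

def beard_phenotype(genotype, sex):
    """Phenotype for a bearding genotype, given the sex of the goat."""
    if sex not in ("male", "female"):
        raise ValueError("sex must be 'male' or 'female'")
    alleles = sorted(genotype)
    if alleles == ["B1", "B1"]:
        return "beardless"
    if alleles == ["B2", "B2"]:
        return "bearded"
    # Heterozygotes: B2 acts as dominant in males, B1 in females.
    return "bearded" if sex == "male" else "beardless"

def heterozygote_cross(sex):
    """Phenotype counts among offspring of one sex from B1B2 x B1B2."""
    het = ("B1", "B2")
    return Counter(beard_phenotype(pair, sex) for pair in product(het, het))

males = heterozygote_cross("male")
females = heterozygote_cross("female")
print("males:", dict(males))      # 3 bearded : 1 beardless
print("females:", dict(females))  # 3 beardless : 1 bearded
```

The genotypic ratio is the same 1:2:1 in both sexes; only the mapping from the heterozygous genotype to phenotype differs, which is what produces the reciprocal 3:1 ratios.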


Delayed Age of Onset 


From an evolutionary perspective, it is easy to understand 
that a dominant lethal allele can be efficiently eliminated 
by the action of natural selection. Even so, there are 
numerous examples of dominant lethal hereditary con- 
ditions, and a pertinent evolutionary genetic question 
concerns how these mutations persist in populations. 
One answer is that some dominant lethal alleles sidestep 
natural selection by having a delayed age of onset; the 
abnormalities they produce do not appear until after af- 
fected organisms have had an opportunity to reproduce 
and transmit the mutation to the next generation. 

One well-characterized example of delayed age of 
onset of a dominant lethal allele in humans is the condi- 
tion called Huntington disease (HD). This progressive 
neuromuscular disorder, usually fatal within 10 to 15 years 
of diagnosis, is caused by mutation of a gene near one end 
of chromosome 4. (We have more to say about the symp- 
toms and progression of HD in Chapter 5, where we also 
discuss the mapping of the HD gene, and in Chapter 16, 
where we discuss the cloning of the HD gene.) The HD 
mutant allele persists in the population because symptoms 
do not begin in about half of all cases until the person’s 
late thirties or early forties, well after most people have 
begun having children (Figure 4.12). 

Functionally, the onset of symptoms of HD is delayed 
because the symptoms are due to neuron death, which 
usually takes place over an extended period of time that 
often stretches over several decades. 


4.2 Some Genes Produce 
Variable Phenotypes 


To interpret phenotype ratios and identify the distribution 
of genotypes among phenotypic classes, geneticists make 
the assumption that phenotypes differ because their un- 
derlying genotypes differ. This assumption is valid only to 
the extent that a particular genotype always produces the 
Figure 4.12 The age-of-onset curve for Huntington
disease (HD). 


same phenotype. If the correspondence between genotype
and phenotype holds true in every case, the trait is identified
as having complete penetrance. If the correspondence
between genotype and phenotype does not consistently hold
true, and the same genotype can instead produce different
phenotypes, the usual reasons are gene-environment
interaction or interactions with alleles of other genes in the
genome.

In this section, we describe two phenomena, referred 
to as incomplete penetrance and variable expressivity, 
in which phenotypic variation occurs among organisms 
with the same genotype. In addition, we look at specific 
instances of environmental influence on gene expression 
that is often associated with incomplete penetrance or 
variable expressivity. 


Incomplete Penetrance 


When the phenotype of an organism is consistent with the 
organism’s genotype, the organism is said to be penetrant 
for the trait. In such a case, if the organism carries a domi- 
nant allele for the trait in question, the dominant pheno- 
type is displayed. Sometimes an organism with a particular 
genotype fails to produce the corresponding phenotype, in 
which case the organism is nonpenetrant for the trait. 

Traits for which nonpenetrant individuals occasion- 
ally or routinely occur are identified as displaying in- 
complete penetrance. The human condition known as 
polydactyly (“many digits”) is an autosomal dominant con- 
dition that displays incomplete penetrance. Individuals 
with polydactyly have more than five fingers and toes—the 
most common alternative number is six (Figure 4.13). 
Polydactyly occurs in hundreds of families around the 
world, and in these families the dominant allele is nonpen- 
etrant in about 25-30% of individuals who carry it. Most 
people who carry the dominant mutant polydactyly allele 
have extra digits; but at least one in four people with the 
mutant allele do not have extra digits and instead express 
the normal five digits. The gene mutated to produce poly- 
dactyly was recently identified (see Chapter 20). 


Figure 4.13 Polydactyly, an autosomal dominant trait with 
incomplete penetrance. 


Figure 4.14 shows a family in which polydactyly segregates
as a dominant mutation. Nine individuals in the
family carry a copy of the polydactyly allele. Six of them
are penetrant for the phenotype (meaning that they express
the phenotype), but at least three family members
(II-6, II-10, and III-10) are nonpenetrant. Each of these
individuals has a child or grandchild with polydactyly;
thus, each carries the dominant allele for polydactyly
but is nonpenetrant for the condition. When nonpenetrant
individuals are relatively common, the
frequency of penetrance can be quantified. Penetrance
values vary among different families, but for the family
shown in Figure 4.14, the penetrance of polydactyly is 6/9,
or 66.7%, which is about the average seen worldwide
among hundreds of families with polydactyly.
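Penetrance is simply the fraction of individuals carrying the allele who express the phenotype. The short sketch below reproduces the calculation for this family; the function name is illustrative.

```python
from fractions import Fraction

def penetrance(carriers, affected):
    """Fraction of allele carriers that express the phenotype."""
    if carriers <= 0 or not 0 <= affected <= carriers:
        raise ValueError("need 0 <= affected <= carriers and carriers > 0")
    return Fraction(affected, carriers)

# Family in Figure 4.14: nine carriers, six of whom have polydactyly.
p = penetrance(carriers=9, affected=6)
print(p, f"= {float(p):.1%}")  # 2/3 = 66.7%
```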


Variable Expressivity 


Sometimes the discrepancy between genotype and phe- 
notype is a matter of the degree or specific manifestation 
of expression of a trait rather than presence or absence 
of the trait altogether. In the phenomenon of variable 
expressivity, the same genotype produces phenotypes that
vary in the degree or form of expression of the allele of 
interest. 

Waardenburg syndrome is a human autosomal domi- 
nant disorder displaying variable expressivity. Individuals 
with Waardenburg syndrome may have any or all of four 
principal features of the syndrome: (1) hearing loss, (2) differ- 
ently colored eyes, (3) a white forelock of hair, and (4) prema- 
ture graying of hair. In the Waardenburg pedigree shown in 
Figure 4.15, notice that the circles and squares representing 
family members with Waardenburg syndrome may be en- 
tirely or only partly colored. Each quadrant of the symbols 
represents one of the principal features of the syndrome. 
The diversity of symbol darkening demonstrates the varia- 
tion in expressivity of Waardenburg syndrome in this family. 
Molecular genetic analysis tells us that each family member 
with Waardenburg syndrome carries exactly the same domi- 
nant allele, yet among the eight affected members of the fam- 
ily, there are six different patterns of phenotypic expression. 

It is often difficult to pinpoint the cause of incom- 
plete penetrance or variable expressivity. Three kinds 
of interactions may be responsible: (1) other genes that 
act in ways that modify the expression of the mutant al- 
lele, (2) environmental or developmental (i.e., nongenetic) 
factors that interact with the mutant allele to modify its 
expression, or (3) some combination of other genes and 
environmental factors interacting to modify expression 
of the mutation. In inbred laboratory strains of model genetic
organisms, variation in genetic factors can be eliminated
experimentally to allow separation of gene-gene
and gene-environment variability, something that cannot
be done in organisms such as humans.


Gene-Environment Interactions 


Genes control virtually all of the differences observed be- 
tween species. The genome of an organism lays out the 
body plan and biochemical pathways of the organism, and 
it controls the progress of development from conception to 
Figure 4.15 Variable expressivity
death. But genes alone are not responsible for all the variation
seen between organisms. The environment, the myriad
of physical substances, events, and conditions an organism 
encounters at different stages of life, is the other essen- 
tial contributor to observable variation between organisms. 
Gene-environment interaction is the result of the influ- 
ence of environmental factors (i.e., nongenetic factors) on 
the expression of genes and on the phenotypes of organisms. 

As an example, consider the tall and short pure- 
breeding lines of pea plants studied by Mendel. Inherited 
genetic variation dictates that one line will produce tall 
plants and the other line will produce short plants, but the 
environment in which the individual plants are grown also 
has a significant influence on plant height. Environmental 
factors such as variations in water, light, soil nutrients, and 
temperature each influence plant growth. It is not hard to 
imagine that genetically identical plants of a type adapted 
to temperate zones might grow to different heights if one 
plant has an ideal growth environment while the other 
faces a hot, arid environment with poor soil. 

Phenotypic expression of genotypes can also depend 
on the interaction of genetically controlled developmental 
programs and external factors operating on organisms. 
For example, the seasonal change in coat color observed 
in arctic mammals that are nearly white in winter but 
have darker coats in spring and summer results from an 
interaction between numerous genes and external en- 
vironmental cues such as day length and temperature. 
Similarly, environmental cues that induce plants to bloom 
in the spring trigger changes in gene expression that 
stimulate the growth and development of multiple plant 
structures, including flowers and reproductive structures. 
Such capacities to make seasonal changes evolved by aiding
the survival of these organisms, and they suggest that
gene-environment interaction is pivotal in understanding
and interpreting phenotypic variation.


Environmental Modification to Prevent Hereditary 
Disease A prime example of gene-environment
interaction in humans is actually a case of environmental
intervention that is commonly practiced to prevent the 
development of the human autosomal recessive condition 


known as phenylketonuria (PKU). This case illustrates 
that the same alleles may produce different phenotypes in 
different environments. PKU is caused by the absence of 
the enzyme phenylalanine hydroxylase, which catalyzes the 
first step of the pathway that breaks down the amino acid 
phenylalanine, a common component of dietary protein. 

At one time, PKU accounted for thousands of cases of 
severe mental retardation every year. PKU occurred in 1 
out of 10,000 to 1 out of 20,000 newborns in most popula- 
tions around the world. Infants with PKU are normal at 
birth, but over the first several months of life the body’s 
inability to carry out the normal breakdown of phenylala- 
nine leads to the buildup of a compound that is toxic to 
developing neurons. As neurons die, mental and motor 
capacities are irretrievably lost, making full manifesta- 
tion of PKU inevitable. In the 1960s, a simple blood test 
became available to detect PKU in the first days of life. 
The test identifies the disease before the disease has had 
a chance to manifest itself and begin to damage the body. 
PKU was among the first, and is now one of dozens of rare 
hereditary disorders for which newborn infants are rou- 
tinely screened in U.S. hospitals. 

Given early detection, the key to preventing PKU is 
the severe restriction of phenylalanine in the diet. Because 
phenylalanine is an amino acid and is a component of al- 
most all proteins, babies with PKU are given a diet consist- 
ing of specially selected and processed proteins that have 
had phenylalanine removed. An infant who is started on 
the phenylalanine-free diet soon after birth and kept on it 
through adolescence avoids the complications of PKU and 
will develop and function normally despite having PKU. 
Thousands of people with PKU are living fully normal and 
productive lives today, thanks to this simple environmental 
modification that prevents the expression of the devastating 
PKU phenotype. In this case, people who are homozygous 
recessive for the mutant PKU allele do not express the trait if 
they are raised in a largely phenylalanine-free environment. 

Dietary hazards abound for children and young adults 
with PKU, particularly in the form of the artificial sweet- 
ener known as aspartame. This sweetener is made by a 
chemical reaction that fuses the amino acids phenylalanine 
and aspartic acid to form a compound we perceive to taste 


sweet. Once consumed, aspartame is quickly broken down 
into its two constituent amino acids, and phenylalanine 
is released. Regular intake of aspartame is dangerous for 
those with PKU; for this reason, a dietary caution reading 
“Phenylketonurics: Contains phenylalanine” appears on the 
packaging of food products containing aspartame. Look for 
it on the next artificially sweetened product you pick up! 


Pleiotropic Genes 


Pleiotropy is the alteration of multiple, distinct traits of 
an organism by a mutation in a single gene. The impact of 
such mutations is, in reality, a reflection of the fact that all 
genes interact in one way or another with other genes. No 
gene acts alone in producing a phenotype. Rather, genes act 
in concert, each producing its own product and having its 
own effect, to produce a phenotype. Since all genes inter- 
act, it comes as no surprise that mutation of one gene has 
consequences for the expression of other genes and that the 
mutation of a single gene can have a large impact on phe- 
notype. Most mutations displaying pleiotropy do so either 
by altering the development of phenotypic features through 
the direct action of the mutant protein or as a secondary re- 
sult of a cascade of problems stemming from the mutation. 

Mendel unknowingly encountered a case of pleiotropy 
in his examination of pea plants. Two of the traits he con- 
sidered for his studies were the inheritance of purple versus 
white flower color (see Table 2.1) and the inheritance of 
a gray versus a white seed coat. Upon noticing that plants 
with white flowers invariably also have white seed coats, 
whereas purple-flowered plants always have gray seed coats, 
he correctly surmised that the inheritance of these traits had 
the same genetic basis. Today we know that flower color, 
seed-coat color, and the appearance of color at leaf axils 
(where the leaf attaches to the stem) result from the pro- 
duction of the purple pigment anthocyanin. Mutations that 
block anthocyanin production are pleiotropic because they 
leave several plant structures without color and produce 
mutant white phenotypes for multiple traits. 

Pleiotropy through the direct action of a mutant pro- 
tein product is frequently encountered in studies of de- 
velopment. One example is the activity of the Drosophila 
hormone called juvenile hormone (JH), which is active 
throughout the Drosophila life cycle and influences numer- 
ous attributes of development and reproduction. Increased 
production or increased activity of JH has been shown to 
prolong developmental time, decrease adult body size, pro- 
mote early sexual maturity, raise fecundity (the ability to 
produce offspring), and decrease life span. An evolutionary 
tradeoff is associated with changes in JH level or activity. 
On the one hand, producing more JH can lead to produc- 
tion of more offspring through earlier sexual maturity and 
higher fecundity. On the other hand, body size decreases 
and life span is shortened by increased JH activity. 

Pleiotropy in sickle cell disease (SCD) is an example 
of the phenotypically diverse secondary effects that can 
occur due to a mutant allele. SCD (OMIM 603903) is 
an autosomal recessive condition caused by mutation of 
the β-globin gene that, in turn, affects the structure and 
function of hemoglobin, the main oxygen-carrying mol- 
ecule in red blood cells (see Chapter 10). Many of the red 
blood cells of people with SCD take on a sickle shape and 
cause numerous physical problems and complications 
(Figure 4.16). 


4.3 Gene Interaction Modifies 
Mendelian Ratios 


No gene operates alone to produce a phenotypic trait. 
Rather, genes work together to build the complex struc- 
tures and organ systems of plants and animals. What we 
see as a phenotype is the physical manifestation of the 
action of many genes that have each played a role and 
have worked in complex but coordinated ways to produce 
a trait or structure. At the cellular and molecular levels, 
the mutual reliance of genes on one another requires each 
gene to carry out its activity in the right place, at the right 
time, and at the appropriate level. 

Think of this process as analogous to a symphony 
orchestra playing a piece of classical music. The orchestra 
has many instruments and players, each with their own 
notes, tones, keys, and volume. If the players use their 
instruments as directed by the sheet music, the result will 
be smooth and harmonious. If, however, one player is off 
time or off key, the error might disrupt the entire perfor- 
mance. The same can be said of genes: Each must play its 
part correctly—that is, give a wild-type performance—or 
the integrity of the trait will be at risk. For example, the 
products of several genes interact in biosynthetic pathways 
to produce pigments that are responsible for flower color. 
Similarly, a complex phenotypic attribute like the ability to 
hear requires many genes to produce the various structures 
of the ear that convert acoustical vibrations into the electri- 
cal impulses that are transmitted to the brain and converted 
into what we perceive as sound. 

In this section, we look in detail at gene interaction, 
the collaboration of multiple genes in the production of a 
single phenotypic character or a group of related charac- 
teristics. First, however, let’s examine the genetic control 
of phenotypes from a perspective we have not yet explored. 


Gene Interaction in Pathways 


Genes commonly work together in pathways, multistep 
biochemical processes that operate either as biosyn- 
thetic pathways, synthesizing complex compounds such 
as amino acids, or as degradation pathways, breaking 
complex compounds down into simpler or elemental 
constituents. Biosynthetic pathways result from the ex- 
pression of genes whose products help build complex 
compounds or molecules that are the end product of the 
pathway. Through successive reaction steps that produce 
Figure 4.16 Pleiotropy in sickle cell disease. The sickling of red blood cells has a range of phenotypic consequences. 


a series of intermediate compounds, these pathways— 
known broadly as anabolic pathways—lead ultimately 
to the production of an end product such as a pig- 
ment, amino acid, hormone, or nucleotide. The opposite 
process, the breakdown of compounds into intermedi- 
ate compounds and often into elemental constituents, is 
undertaken by catabolic pathways. 

Figure 4.17 gives an example of each type of pathway 
and shows that the expression of multiple genes is re- 
quired for the completion of any pathway. The anabolic 
pathway that synthesizes the amino acid methionine is 
shown in Figure 4.17a. Completion of this pathway, and 
thus the production of methionine, requires the expres- 
sion of four genes that each produce an enzyme catalyzing 
a distinct step of the pathway. Homozygosity for a mutant 


allele of any of these genes can block the pathway and 
would prevent methionine synthesis. 

The catabolic pathway that breaks down the amino 
acid phenylalanine is shown in Figure 4.17b. It, too, uti- 
lizes the enzymes produced by multiple genes. The figure 
identifies several steps of the pathway that are blocked 
by mutations of certain genes. Each of these mutations 
causes a distinct human hereditary disorder, including 
PKU, which we just described. 

It is common for biologists to describe phenotypic 
characters or hereditary disorders such as those identified 
in Figure 4.17b as single-gene traits. This designation means 
that different forms of a trait can be transmitted to offspring 
by the segregation of alleles of a single gene. Phenotypic 
characteristics such as pea flower color and pea shape are 

Figure 4.17 Gene action in pathways. (a) In anabolic pathways the sequential action of gene 
products catalyzes steps of a biosynthetic pathway. (b) The action of gene products in catabolic 
pathways breaks down complex compounds into simpler compounds. 


examples of single-gene traits inherited as the result of 
allelic variation of a single gene, just as PKU is caused by 
inherited variation of the gene producing phenylalanine 
hydroxylase. 

The term single-gene trait conveniently summarizes the 
observation that inherited variation for one gene can pro- 
duce a mutant phenotype rather than a wild-type phenotype. 
The term is not, however, an accurate depiction of genetic 
reality. The anabolic and catabolic pathways illustrated in 
Figure 4.17 are representative of common forms of gene 
interaction. They reveal the necessity for several genes to 
work together to produce the wild-type phenotype for a trait. 
At the same time, they also show that the mutation of any 
of the participating genes could block or alter the wild-type 
phenotype. The mutant and wild-type phenotypes would 
segregate as single-gene traits, despite the involvement of 
multiple genes in producing those phenotypes. Similarly, the 


following example of Drosophila eye color illustrates that 
genes with a variety of functions contribute to production of 
the wild-type red eye color of Drosophila. 

Geneticists have identified many distinct mutant eye- 
color phenotypes in fruit flies, and these variants have 
been mapped to different genes. We will consider just 
three of these genes, two that produce different eye-color 
pigments, and a third that transports pigments to eye 
cells. The brown gene produces an enzyme that operates 
in a pathway synthesizing a vermilion-colored (bright red) 
pigment. The gene carries a dominant wild-type allele 
bw+ and a recessive null mutant allele bw, and flies that 
are bwbw have brown-colored eyes. The gene is named 
after the mutant phenotype it is associated with. The ver- 
milion gene produces an enzyme that is active in a path- 
way synthesizing a brown pigment. The wild-type allele 
v+ is dominant over the null mutant allele v. Flies that are 
vv have vermilion-colored eyes. The white gene produces 
a pigment-transporting protein from the dominant allele 
w+ that carries pigments to the eye. A mutant protein 
from the w allele is incapable of pigment transportation, 
and flies that do not produce the protein have white eyes. 
This is the X-linked w gene we discussed in Section 3.3. 
Production of wild-type proteins from all three genes is 
necessary to produce wild-type eye color, and hereditary eye 
color mutations result from the mutation of one or more of 
the genes (Figure 4.18). Wild-type eye color is the result of 
synthesis of brown and vermilion pigments and the transpor- 
tation of both pigments to eye cells, where they are blended. 
Mutation of any one or more of these genes results in a mu- 
tant phenotype. This example demonstrates that multiple 
genes are active in pathways determining different biological 
properties. Inherited variation of one gene can block a seg- 
ment of a pathway and produce a mutation attributable to 
a single gene, but such a finding does not negate the impor- 
tance of the action of multiple genes affecting each trait. 
Drosophila. (a) Wild-type (red) eye color requires activity of 
three genes. (b) Mutation of any gene produces a distinctive 
mutant phenotype. (c) Double mutation of brown and vermilion 
produces white eyes. 
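
The three-gene logic described for Drosophila eye color can be written out as a small decision function. This is a minimal sketch, assuming each gene is reduced to a single yes/no value (a functional product is made or it is not); the function name `eye_color` and the boolean encoding are illustrative choices and do not come from the text.

```python
def eye_color(bw_ok: bool, v_ok: bool, w_ok: bool) -> str:
    """Predict eye color from three interacting genes (illustrative model).

    bw_ok: a bw+ allele is present, so the vermilion (bright red) pigment is made
    v_ok:  a v+ allele is present, so the brown pigment is made
    w_ok:  a w+ allele is present, so pigments are transported to the eye
    """
    if not w_ok:
        return "white"            # no pigment reaches the eye cells
    if bw_ok and v_ok:
        return "wild-type red"    # both pigments are made and blended
    if v_ok:
        return "brown"            # bw bw: only the brown pigment remains
    if bw_ok:
        return "vermilion"        # v v: only the vermilion pigment remains
    return "white"                # bw bw; v v double mutant: no pigment is made


print(eye_color(True, True, True))    # wild-type red
print(eye_color(False, True, True))   # brown
print(eye_color(False, False, True))  # white
```

Each single mutation segregates as a single-gene trait in this model, yet the wild-type outcome depends on all three inputs at once.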


In addition to biosynthetic (anabolic) pathways and 
catabolic pathways, two additional types of pathways are 
frequently cited as examples featuring the interaction of 
multiple genes in the production of a trait or character- 
istic. Signal transduction pathways are responsible for 
reception of chemical signals, such as hormones, that are 
generated outside a cell and initiate a response inside a 
cell. Signal transduction operates through the release of a 
signaling molecule that is part of a sequence of steps cul- 
minating in the activation or repression of gene expres- 
sion in response to an intracellular or extracellular signal. 

Second, developmental pathways are made up of genes whose 
products direct normal growth, development, and 
differentiation of body parts and structures. Numerous 
developmental pathways have been identified in organ- 
isms, and the functions of their genes have been deter- 
mined by experimental analyses of mutant phenotypes. 
Geneticists use this analytic approach, known as genetic 
dissection, to identify the step-by-step events making up a 
genetic pathway. The use of genetic dissection to analyze 
a biosynthetic pathway is explored in the next section. 
Examples of signal transduction and developmental path- 
ways are presented in later discussions (see Chapter 20). 


The One Gene-One Enzyme Hypothesis 


The concept of pathways requiring gene action originated 
with Archibald Garrod’s suggestion in 1908 that the in- 
ability to produce the enzyme homogentisic acid oxidase 
is the cause of the human hereditary condition known 
as alkaptonuria (see Figure 4.17b). It was not until the 
middle of the 20th century, however, that comprehen- 
sive details of specific genetic pathways began to emerge. 
George Beadle and Edward Tatum were among the first 
to investigate biosynthetic pathways, in research that laid 
the groundwork for the later definition and examination 
of signal transduction and developmental pathways. 
Beadle and Tatum’s experiment studied growth vari- 
ants of the fungus Neurospora crassa, and its details are 
described in Experimental Insight 4.1. The idea behind their 
experiments was simple—to generate single-gene growth 
mutations in Neurospora and interpret the normal function 
of genes by observing the phenotypic consequences of their 
mutation. The famous hereditary proposal known as the 
one gene-one enzyme hypothesis came out of these 
experiments. It says that each gene produces an enzyme, and 
each enzyme has a specific functional role in a biosynthetic 
pathway. Beadle and Tatum observed that single-gene mu- 
tations block the completion of biosynthetic pathways and 
lead to the production of mutant fungi that are deficient 
in their ability to grow without specific nutritional supple- 
mentation. Their hypothesis proposed that each mutant 
phenotype was attributable to the loss or defective function 
of a specific enzyme. The consequence of these mutants was 
the blockage of a biosynthetic pathway and the absence of 
the end product of the pathway. Since each enzyme defect 
was inherited as a single-gene defect, the one gene-one 
enzyme hypothesis identifies the direct connection 
between genes, proteins, and phenotypes. Two new terms that 
are used multiple times in this section are introduced in 
Experimental Insight 4.1. The term prototroph or 
prototrophic means “wild-type.” The word’s meaning derives from 
prototype, meaning “the original version.” In contrast, the 
term auxotroph or auxotrophic means “mutant.” 

The one gene-one enzyme concept has undergone 
adjustments since its proposal, to account for three ob- 
servations: (1) Some protein-producing genes do not 
produce enzymes, but produce transport proteins, struc- 
tural proteins, regulatory proteins, or other nonenzyme 
proteins; (2) some genes produce RNAs rather than 
proteins; and (3) some proteins (e.g., β-globin) must join 
with other proteins to acquire a function. Despite these 
modifications, Beadle and Tatum’s fundamental conclu- 
sion linking each gene to a particular product is valid and 
forms the basis for understanding of gene function. 


Experimental Insight 4.1 


The One Gene-One Enzyme Hypothesis 


George Beadle and Edward Tatum’s experiments had the goal of 
describing gene function. Their work took place at about the time 
DNA was being identified as the hereditary molecule, and more 
than a decade before DNA structure was identified. To provide 
information for analysis, Beadle and Tatum devised an experi- 
ment that would induce single-gene mutations in the filamen- 
tous fungus Neurospora crassa and then studied the mutants to 
determine how mutations altered Neurospora growth. Recall that 
Neurospora can grow as a haploid, or two haploid cells can fuse to 
form and grow as diploids that undergo meiosis (see Chapter 2). 


MUTATION PREPARATION 


To begin, Beadle and Tatum grew numerous genetically identical 
cultures of haploid wild-type fungi that were irradiated to induce 
random mutations ①. The irradiated conidia (asexually produced 
fungal spores) were mated with wild-type haploids. The resulting 
diploids underwent meiosis to produce haploid spores that were 
grown in a two-step process to identify mutants. The diploids 
could also be tested to confirm the presence of a single-gene 
mutation by observation of a 3:1 ratio in their progeny. Irradiated 
haploid spores were grown first on a complete growth medium that 
contains a rich mixture of nutrients and supplements and is 
capable of supporting the growth of wild-type and mutant fungi ②. 
Next, growing fungi were picked from colonies on the com- 
plete medium and transferred to a minimal growth medium that 
supplies only the minimal constituents needed to support the 
growth of wild-type fungi ③. Mutant fungi are identified because 
they grow on complete medium containing many nutritional 
and other supplements that support the growth of wild-type as 
well as mutant fungi, but they are unable to grow on a minimal 
growth medium, which supplies only elemental constituents and 
supports the growth of wild-type fungi only. 
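
The two-step screen described above amounts to a simple decision rule. The following is a minimal sketch, assuming growth is recorded as a yes/no observation on each medium; the function name `classify_strain` and the "no growth" fallback label are illustrative and are not part of the original experiment.

```python
def classify_strain(grows_on_complete: bool, grows_on_minimal: bool) -> str:
    """Classify a strain from its growth on complete and minimal medium."""
    if grows_on_minimal:
        return "prototroph"   # wild type: needs no supplementation
    if grows_on_complete:
        return "auxotroph"    # mutant: grows only when supplements are provided
    return "no growth"        # grows on neither medium; not informative here


print(classify_strain(True, True))    # prototroph
print(classify_strain(True, False))   # auxotroph
```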


MUTATION ANALYSIS 
With numerous mutants in hand, Beadle and Tatum were 
able to address questions of which genes were mutated by 
first identifying the chemical category of the compound that 
cannot be produced and then determining the specific missing 
compound. An example of this analysis is illustrated in 
steps ④ and ⑤, where growth analysis tests a mutant for its 
ability to grow on various kinds of supplemented minimal 
media. These are growth media that have had one or more 
compounds added to them to support the growth of specific 
kinds of mutants. Step ④ shows one mutant that grows only 
on medium that has been supplemented with all 20 of the 
common amino acids; this result indicates that the strain lacks 
the ability to synthesize one or more amino acids. The specific 
defect in this mutant strain is tested in step ⑤ using 20 different 
supplemented minimal media, each supplemented with 
one amino acid. One mutant grows on minimal medium 
supplemented with methionine (met), thus identifying the strain 
as one that is unable to synthesize methionine. This strain is 
described as being met- (“met minus” or “methionine minus”), 
to identify the defective pathway as the one synthesizing 
methionine. The wild type is able to synthesize methionine and is 
identified as met+ (“met plus” or “methionine plus”). 


HYPOTHESIS OF GENE FUNCTION 


By testing hundreds of independent mutants in this way, 
Beadle and Tatum discovered that most mutants carried single 
mutations that could be overcome by supplementing minimal 
growth media with one particular compound. In the above 
case, supplementing a minimal medium with methionine 
supports the growth of met- fungi. This finding led them to 
propose that single mutations prevented mutants from 
completing a specific step of a biochemical pathway. Based on this 
outcome, they proposed that single-gene mutations altered 
the ability of mutants to produce one enzyme critical in a 
particular biosynthetic pathway. The correlation between 
single-gene mutations and single defects in biosynthetic pathways is 
the basis of the one gene-one enzyme hypothesis. 


Genetic Dissection to Investigate Gene Action 


Beadle and Tatum’s experiments opened the way to 
investigation of the roles of individual-gene mutations in 
biosynthetic pathways. These investigations began with three 
assumptions about biosynthetic pathways that have proven 
to be correct: (1) Biosynthetic pathways consist of sequential 
steps, (2) completion of one step generates the substrate for 
the next step in the pathway, and (3) completion of every 
step is necessary for production of the end product of the 
pathway. These assumptions support the conclusion that 
wild-type strains are able to complete each pathway step, 
and that mutant strains are unable to complete a pathway 
because one or more pathway steps are blocked by mutation. 

Genetic dissection in this context is an experimental 
approach that separately tests the ability of a mutant 
to execute each step of a biosynthetic pathway and 
assembles the steps of a pathway by determining the point 
at which the pathway is blocked in each mutant. The 
strategy of genetic dissection is illustrated for a met- 
strain in Figure 4.19 using experimental data collected in 
1947 by Norman Horowitz on four independently isolated 
Neurospora crassa met- mutants. 

The goals of Horowitz’s genetic dissection analysis 
were to (1) determine the number of intermediate steps 
within the methionine biosynthetic pathway, (2) deter- 
mine the order of steps in the pathway, and (3) identify 
the step affected by each mutation. In designing his ex- 
periment, Horowitz relied on previous biochemical work 
identifying homoserine as the first compound in the me- 
thionine biosynthetic pathway and identifying cysteine, 
homocysteine, and cystathionine as later intermediates 
in the pathway. Horowitz tested the control prototroph 
(met+) and four methionine-requiring auxotrophs (Met 1 
to Met 4) for their ability to grow on (1) minimal medium, 
(2) minimal medium plus cysteine only, (3) minimal 
medium plus cystathionine only, (4) minimal medium 
plus homocysteine only, and (5) minimal medium plus 
methionine only. Figure 4.19a shows growth (+) or no 
growth (-) of the four met- mutants and the wild-type 
strain (met+) on each of the experimental media. The 
wild-type strain grows on all media, since supplementa- 
tion of minimal medium with any of the intermediates has 
no effect on its growth. Each methionine mutant grows 
on minimal medium plus methionine, the end product of 
the biosynthetic pathway, but they show different growth 
patterns with other supplemented media. The following is 
an analysis of each mutant: 


1. Met 1 grows only on minimal medium plus methionine, 
thus indicating that a mutation in the last step of the 
pathway prevents conversion of the final intermediate 
product to methionine. Only the addition of methionine 
to minimal medium bypasses the pathway block. 

2. Met 2 exhibits growth with supplementation by ei- 
ther methionine or homocysteine, thus indicating a 
block at the step that produces homocysteine. This 
result also tells us that homocysteine is the substrate 
converted to methionine in the biosynthetic pathway. 


3. Met 3 grows on minimal medium supplemented with 
either methionine, homocysteine, or cystathionine, 
but not on minimal medium plus cysteine. This tells 
us that Met 3 is blocked at the step that produces 
cystathionine and that cystathionine precedes homo- 
cysteine in the pathway. 

4. Met 4 grows with any supplementation of minimal 
medium. This tells us that Met 4 is defective at a step 
that precedes the production of cysteine. 


Figure 4.19b shows the steps of the biosynthetic pathway 
for methionine as determined by analysis of these mutants. 
The pathway step that is blocked in the mutant is identified 
based on the logic that supplementation by a compound 
needed after the blockage will permit growth, whereas add- 
ing a compound used before the blockage will not aid growth. 
The blocked step is also identified by the substance that 
accumulates in the auxotroph: In each mutant, a different 
intermediate substance builds up because the step that would 
convert it to the next intermediate in the pathway is defec- 
tive. Accumulation of cysteine by Met 3, cystathionine by 
Met 2, and homocysteine by Met 1 supports the assignment 
of these mutants to specific steps in the pathway. Genetic 
Analysis 4.2 illustrates genetic dissection of a biosynthetic 
pathway by assessment of the growth habits of auxotrophs. 
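
The reasoning applied to Met 1 through Met 4 can be sketched in code. The growth values below are transcribed from the four-point analysis above; the function names `order_compounds` and `blocked_step` are hypothetical. The sketch assumes a strictly linear pathway, a single blocked step per mutant, and no ties in the number of mutants each supplement rescues.

```python
# Growth (True) or no growth (False) on minimal medium plus one supplement.
GROWTH = {
    "Met 1": {"cysteine": False, "cystathionine": False,
              "homocysteine": False, "methionine": True},
    "Met 2": {"cysteine": False, "cystathionine": False,
              "homocysteine": True, "methionine": True},
    "Met 3": {"cysteine": False, "cystathionine": True,
              "homocysteine": True, "methionine": True},
    "Met 4": {"cysteine": True, "cystathionine": True,
              "homocysteine": True, "methionine": True},
}


def order_compounds(growth):
    """Order supplements from earliest to latest in the pathway.

    A supplement that rescues more mutants lies later in the pathway.
    """
    compounds = list(next(iter(growth.values())))
    rescued = {c: sum(row[c] for row in growth.values()) for c in compounds}
    return sorted(compounds, key=lambda c: rescued[c])


def blocked_step(growth, order, first="homoserine"):
    """Map each mutant to the (substrate, product) step it cannot complete."""
    full = [first] + order
    steps = {}
    for mutant, row in growth.items():
        # The earliest supplement that permits growth is the product
        # of the blocked step; its predecessor is the substrate.
        product = next(c for c in order if row[c])
        steps[mutant] = (full[full.index(product) - 1], product)
    return steps


order = order_compounds(GROWTH)
print(order)
# ['cysteine', 'cystathionine', 'homocysteine', 'methionine']
print(blocked_step(GROWTH, order)["Met 3"])
# ('cysteine', 'cystathionine')
```

The substrate reported for each mutant is also the compound expected to accumulate in that mutant, which agrees with the accumulation data cited in the text.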


Epistasis and Its Results 


Genes contributing to different steps of a multistep anabolic 
or catabolic pathway or to a signal transduction or develop- 
mental pathway work together to produce the end product 
or outcome of the pathway. Because of this interaction, 
mutation of one gene may prevent completion of the path- 
way and production of the end product. In other words, gene 
interaction can result in one gene influencing whether and 
how other pathway genes are expressed or how they function. 

In this section, we describe simple gene interactions that 
occur in various ways to produce distinctive progeny pheno- 
type ratios as a result of the specific interaction mechanisms. 
These altered ratios of wild-type and mutant phenotypes 
are caused by epistasis or epistatic interactions, the name 


Figure 4.19 Genetic dissection of methionine biosynthesis pathway. (a) Growth of a wild-type strain and 
four met- mutants on each growth medium. (b) Order of intermediates in the methionine biosynthesis 
pathway and the step blocked in each met- mutant strain. 

Homoserine --(Met 4)--> Cysteine --(Met 3)--> Cystathionine --(Met 2)--> Homocysteine --(Met 1)--> Methionine 


GENETIC ANALYSIS 4.2 


PROBLEM Four zmt bacterial mutants (zmt-1 to zmt-4), each with a single-gene mutation, are 
available for study. Five intermediates in the zmt-synthesis pathway have been identified (D, F, M, R, 
and S), but their order in the pathway is not known. Each mutant is tested for its ability to grow on 
minimal medium supplemented with one of the intermediate compounds. All mutants grow when zmt 
is added to minimal medium, and the wild-type strain grows under all growth conditions tested. 
Find the order of intermediates in the zmt-synthesis pathway, and identify the step that is blocked 
in each mutant strain. In the growth table below, “+” indicates growth and “-” indicates no growth. 

                 Added to Minimal Medium 
Mutant Strain    D    F    M    R    S    Nothing    zmt 
Wild type        +    +    +    +    +    +          + 
zmt-1            -    -    -    -    +    -          + 
zmt-2            -    +    +    +    +    -          + 
zmt-3            -    +    -    -    +    -          + 
zmt-4            -    +    +    -    +    -          + 

BREAK IT DOWN: Growth on a supplemented minimal medium occurs when the medium provides a 
compound the mutant is unable to produce (p. 127). 

BREAK IT DOWN: zmt is the pathway end product, and compounds D, F, M, R, S are intermediate 
compounds that precede zmt (p. 127). 


Solution Strategies and Solution Steps 


Evaluate 

1. Solution Strategy: Identify the topic of this problem and the kind of information the answer 
   should contain. 
   Solution Step: This problem deals with mutants of the zmt-synthesis pathway and requires an 
   analysis of the defect in each mutant as well as ordering of the intermediates in the 
   zmt-synthesis pathway. 

2. Solution Strategy: Identify the critical information given in the problem. 
   Solution Step: The problem provides growth information for wild-type zmt+ bacteria as well as 
   four zmt mutant strains when plated on minimal medium and media individually supplemented 
   with zmt or one of five intermediates in the zmt-synthesis pathway. 

Deduce 

3. Solution Strategy: Compare and evaluate the patterns of growth supported by the supplements. 
   TIP: A supplement that supports growth of all or most mutants is likely to be near the end of 
   the pathway. 
   Solution Step: All mutants grow with zmt supplementation and with supplementation by 
   compound S. None grows without any supplementation, and none obtains growth support from 
   compound D. Compounds F, M, and R each support growth of one or more mutants. 

4. Solution Strategy: Identify the final product of the pathway and next-latest pathway 
   intermediate compound. 
   TIP: A supplement supporting growth of the fewest mutants is likely at the beginning of the 
   pathway. 
   Solution Step: zmt is the last compound synthesized. Compound S also supports the growth of 
   all mutants and is likely the immediate precursor of zmt. 

Solve 

5. Solution Strategy: Identify the first compound synthesized in the pathway. 
   Solution Step: Compound D does not support growth of any of the zmt mutants and likely 
   occurs before any of the synthesis steps affected by mutations. Compound D is the first 
   compound shown in the pathway. 

6. Solution Strategy: Identify the second, third, and fourth compounds synthesized in the pathway. 
   TIP: Medium supplemented with an intermediate compound that occurs after the pathway step 
   that is blocked by a mutation will support growth. 
   Solution Step: Compound R supports the growth of only one mutant, zmt-2, indicating the 
   compound bypasses the step blocked in zmt-2. Compound R likely follows compound D in the 
   pathway, and zmt-2 is defective in its ability to convert D to R. zmt-2 grows on intermediate 
   compounds that occur after its point of pathway blockage, but not on compound D that comes 
   before the zmt-2 blockage. 

   Compound M supports growth of zmt-2 and zmt-4, bypassing the blockage in both mutants. 
   Growth of zmt-4 is not supported by compounds D or R that occur before the conversion step 
   blocked in zmt-4. The conclusion is that compound M follows R and that zmt-4 is unable to 
   convert R to M. Compounds F, M, and S each support growth of zmt-4, so each bypasses the 
   blockage. 

   Compound F supports growth of zmt-3 and follows compound M in the pathway. zmt-3 is unable 
   to convert M to F. Compound S supports new growth of zmt-1, indicating that it follows 
   compound F in the pathway and that zmt-1 fails to convert compound F to S. 

   TIP: To confirm this solution, verify that growth of each mutant is supported by 
   supplementation with compounds that follow the blockage but not by supplementation with 
   compounds that precede the blockage. 

7. Solution Strategy: Assemble the zmt-synthesis pathway, and identify the mutants at each 
   pathway step. 
   Solution Step: 

   D --(zmt-2)--> R --(zmt-4)--> M --(zmt-3)--> F --(zmt-1)--> S --> zmt 


For more practice, see Problems 4, 18, and 19. 
Figure 4.20 Patterns resulting from epistatic gene interaction.


Epistasis is the name given to gene interactions in which an allele of one gene
modifies or prevents the expression of alleles at another 
gene. A minimum of two genes are required for epistasis, 
and for the sake of simplicity, we limit the descriptions in 
this discussion to epistatic interactions between two genes. 
The genes that interact through epistasis are involved in 
producing a particular phenotypic characteristic, and they 
participate in the same pathway. For two interacting genes, 
epistasis is most readily detected among progeny of dihy- 
brid crosses where both genes carry dominant and recessive 
alleles. In these cases, independent assortment predicts a 
9:3:3:1 ratio of four phenotypes in the F2 progeny, but epistasis
results in fewer than four phenotypes. This reduction in
the number of F2 phenotype classes occurs because different
genotype classes have the same phenotype. In other words, 
the hallmark of epistatic interaction in a dihybrid cross is 
modification of the 9:3:3:1 ratio due to the combining of two 
or more genotype classes into a single phenotypic class. 

Epistasis results from mutation in pathways that require 
a specific activity from every gene in the pathway for the wild- 
type phenotype to be produced. Given the possible outcomes 
of dihybrid crosses, there are six ways the F2 phenotype
proportions can be rearranged by epistasis. All six altered 
ratios have been seen in plants or animals. Figure 4.20 gives 
an overview of these patterns, showing the modification of 
dihybrid ratios that characterizes each form of epistasis. The 
remainder of this discussion provides a brief description and 
example of each of the epistatic patterns. First, however, we 
describe a dihybrid cross involving two genes contributing 
to feather color in budgerigar parakeets, popularly known as 
“budgies,” in which there is no interaction between the genes 
to alter the resulting 9:3:3:1 phenotypic ratio. 


No Interaction (9:3:3:1 ratio) Epistasis is most easily 
identified through specific deviations from the expected 
9:3:3:1 ratio among the F2 progeny of a dihybrid cross
involving dominant and recessive alleles. This expected F2
ratio results from the action of two independently assorting 
genes in the absence of epistasis—that is, when the genes do 
not interact to change the expression of one or the other. 


The analysis begins with the mating of a pure-breeding 
blue budgie (BByy) to a pure-breeding yellow budgie (bbYY).
The F1 progeny have wild-type green feather color and are
dihybrid (BbYy), and they are shown at the left in Figure 4.21.
Progeny in the F2 generation shown in Figure 4.21 have four
feather-color phenotypes, as predicted by independent assortment.
Green feather color (wild type) is observed in 9/16 of
the progeny, blue feathers and yellow feathers are each seen
in 3/16 of the F2, and the white-feather phenotype appears in
1/16 of the F2 progeny. The 9:3:3:1 phenotypic ratio provides
evidence that two independently assorting genes contribute 
to the feather-color phenotype. This ratio indicates that 
the genes are not undergoing epistatic interaction with one 
another. 
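The 9:3:3:1 expectation can be checked by enumerating the Punnett square directly. The following Python sketch is illustrative only: the function names are hypothetical, and the genotype-to-color mapping (B-Y- green, B-yy blue, bbY- yellow, bbyy white) is inferred from the parental and F1 genotypes given in the text.

```python
from collections import Counter
from itertools import product

def gametes(genotype):
    """List the four gametes of a two-gene genotype such as 'BbYy'."""
    first_gene, second_gene = genotype[:2], genotype[2:]
    return [a + b for a, b in product(first_gene, second_gene)]

def feather_color(alleles):
    """Assign a phenotype assuming no interaction between the B and Y genes."""
    has_B = "B" in alleles
    has_Y = "Y" in alleles
    if has_B and has_Y:
        return "green"
    if has_B:
        return "blue"
    if has_Y:
        return "yellow"
    return "white"

def f2_counts(parent="BbYy"):
    """Count phenotypes over the 16 equally likely gamete combinations."""
    counts = Counter()
    for egg, sperm in product(gametes(parent), repeat=2):
        counts[feather_color(egg + sperm)] += 1
    return counts

counts = f2_counts()
for color in ("green", "blue", "yellow", "white"):
    print(f"{color}: {counts[color]}/16")
```

Because each of the four phenotypes corresponds to its own genotype class, none of the classes merge, and the counts stay at 9, 3, 3, and 1.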

Six examples of epistatic interactions between two
genes, each with a dominant and a recessive allele, are
shown in Foundation Figure 4.22. As we describe these
patterns here, and as you examine Figure 4.22, notice that 
Figure 4.21 No gene interaction in the production of feather color in budgerigar parakeets. A 9:3:3:1 ratio results from the independent assortment of alleles in a dihybrid cross of green-feathered budgies with the dihybrid genotype BbYy.


FOUNDATION FIGURE 4.22

Complementary gene interaction (9:7). Example: sweet pea flower color.

Duplicate gene action (15:1). Example: bean flower color.

Dominant gene interaction (9:6:1). Example: squash fruit shape. Dominant gene interaction occurs between genes that each contribute to a phenotype, producing one phenotype if dominant alleles are present at each gene, a second phenotype if recessive alleles are homozygous for either gene, and a third phenotype if recessive homozygosity occurs at both genes.

Recessive epistasis (9:3:4). Example: Labrador retriever coat color. Recessive epistasis occurs when recessive alleles at one gene mask or reduce the expression of alleles at the interacting locus.

Dominant epistasis (12:3:1). In dominant epistasis, a dominant allele of one gene masks or reduces the expression of alleles of a second gene.

Dominant suppression (13:3). Dominant suppression occurs when the dominant allele of one gene suppresses the expression of a dominant

the phenotypic ratios observed for each trait result from
the combining of the 9:3:3:1 genotype categories. (Refer to 
Figure 4.20 for an overview of these epistatic patterns.) 


Complementary Gene Interaction (9:7 ratio) William
Bateson (the enthusiastic proponent of Mendelism)
and Reginald Punnett (of Punnett square fame) were
the first biologists to document a deviation from the
expected 9:3:3:1 F2 progeny ratio of a dihybrid cross
resulting from the epistatic interaction of two genes.
In experiments conducted on sweet peas (Lathyrus
odoratus), an ornamental plant different from Mendel’s
edible pea (Pisum sativum), Bateson and Punnett began
by crossing two pure-breeding white-flowered lines. The
F1 generation yielded a surprise: all of the progeny plants
had purple flowers. When Bateson and Punnett crossed
F1 plants, the F2 produced a ratio of 9/16 purple-flowered
plants to 7/16 white-flowered plants.

Bateson and Punnett recognized that their results could 
be explained if two genes interacted with one another to 
produce sweet pea flower color. Assuming two genes are 
responsible for a single pigment that gives the sweet pea 
flower its purple color, each parental line—represented by 
the genotypes ccPP and CCpp—is pure-breeding for white 
flowers as a result of homozygosity for recessive alleles at one 
of the genes. The cross of these two lines of pure-breeding 
white parents produces dihybrid purple-flowered F1 plants
(genotype CcPp) because the dominant allele at each locus
enables completion of each step of the pathway leading to
the synthesis of purple pigment. Independent assortment of
alleles results in four genotypic classes, C-P-, ccP-, C-pp,
and ccpp, produced in the 9:3:3:1 ratio that is expected from
a dihybrid cross. Among the F2, however, only 9/16 carry
the C-P- genotype that confers the ability to produce purple
pigment. The remaining 7/16 of the F2 are homozygous either
for one of the recessive alleles c and p or for both sets of al- 
leles. None of these plants are able to synthesize pigment, 
due to the absence of functional gene products from one or 
both loci, and they all have the same mutant phenotype. 

A 9:7 phenotypic ratio results from complementary 
gene interaction that requires genes to work in tandem 
to produce a single product. Figure 4.22 shows that at
the molecular level, purple flower color in sweet peas
is produced when the pigment anthocyanin is deposited
in petals. The production of the purple-flowered
F1 progeny and the 9:7 F2 ratio is explained by the
independent assortment of two genes, C and P, that 
produce gene products controlling different steps of 
the anthocyanin-synthesis pathway. Since anthocyanin 
production requires the action of the product of C as 
well as the product of P, both steps must be successfully 
completed for anthocyanin production and deposition 
in flower petals. On the other hand, any recessive ho- 
mozygous genotype at the C locus, the P locus, or both 
loci results in blockage of the pathway and production 
of white flowers containing no pigment. 


The ability of two mutants with the same mutant 
phenotype to produce progeny with the wild-type phe- 
notype is called genetic complementation, and it indicates 
that more than one gene is involved in determining the 
phenotype. We discuss the details of genetic complemen- 
tation in the last section of this chapter. 


Duplicate Gene Action (15:1 ratio) Two genes that 
duplicate one another’s activity constitute a redundant 
genetic system in which any genotype possessing at least 
one copy of a dominant allele at either locus will produce 
the dominant phenotype. Only when homozygous 
recessive mutant alleles are present at both loci does the 
recessive phenotype appear. The genes in a redundant 
system are said to have duplicate gene action; they 
either encode the same gene product, or they encode gene 
products that have the same effect in a single pathway or 
compensatory pathways. 

Figure 4.22 provides an illustration and explanation
of duplicate gene action identified inadvertently by
Gregor Mendel in an experiment involving flower color
in bean plants. Near the end of his famous 1866 paper
describing inheritance in peas, Mendel described an experiment
with beans that began with the cross of a pure-breeding
purple-flowered bean plant to a pure-breeding
white-flowered bean plant. The F1 plants all had purple
flowers, and Mendel probably assumed that flower color
determination in beans would follow the same pattern as
in peas. Among the 32 F2 plants Mendel produced, however,
31 had purple flowers and only 1 had white flowers,
an outcome close to the 30:2 expected under a 15:1 ratio.
Among the F2 plants, 15/16 have a genotype containing
at least one copy of either P or R, and only 1/16 have the
genotype pprr and the white-flowered phenotype.

Figure 4.22 shows that a dominant allele at either
locus is capable of catalyzing the conversion of a
precursor to anthocyanin and producing the dominant 
phenotype. Conversely, if homozygous recessive alleles 
are present at both loci, no functional gene product is 
produced, and the synthesis pathway is not completed. 
White flowers result from the absence of pigment in the 
1/16 of the F2 progeny that are homozygous recessive for alleles
of both genes.


Dominant Gene Interaction (9:6:1 ratio) Fruit shape in 
summer squash is classified as either long, spherical, or disk 
shaped. Plants that bear long fruit are consistently pure- 
breeding, indicating that these plants are homozygous for 
genes controlling fruit shape. On the other hand, plants 
producing disk-shaped fruit or spherical fruit are sometimes 
pure-breeding and sometimes not, indicating that plants 
producing disk-shaped or spherical fruit can be either 
homozygous or heterozygous for the genes controlling 
the trait. Figure 4.22 illustrates and describes dominant
interaction between two genes controlling squash fruit 
shape. Dominant interaction is characterized by a 9:6:1 ratio 
of phenotypes in the progeny of a dihybrid cross. 


A cross of two pure-breeding plants producing spherical
fruit can generate F1 that have disk-shaped fruit. This
result indicates an interaction between genes controlling
fruit shape and suggests that the F1 disk-shape-producing
plants are dihybrid. The F2 progeny, which display
the phenotypic proportions 9/16 disk, 6/16 spherical, and 1/16 long,
confirm that hypothesis. Which of the three phenotypes
occurs depends on whether a dominant allele is present
for both genes, one gene, or neither gene. In the F2 generation,
plants with at least one dominant allele at each locus
(A-B-) have disk-shaped fruit, plants with recessive alleles
at each locus (aabb) produce long fruit, and plants that are
homozygous recessive at only one of the loci (A-bb or aaB-)
produce spherical fruit.

The molecular model of the events underlying 
dominant interaction assumes that each gene produces 
a different protein that contributes to fruit shape. When 
dominant allelic action produces both proteins, disk- 
shaped fruit is generated. If only one of the proteins is 
produced, spherical fruit results, as for the genotypic 
classes aaB- and A-bb. Plants that are homozygous for 
recessive alleles of both genes (aabb) produce neither 
protein, and long fruit is the result. 


Recessive Epistasis (9:3:4 ratio) Black, chocolate, and 
yellow coat colors in Labrador retrievers result from the 
interaction of two genes, one that produces pigment and 
another that distributes the pigment to hair follicles. This 
form of gene interaction, in which homozygosity for a 
recessive allele at one locus can mask the phenotypic 
expression of a second gene, is called recessive epistasis 
and has the characteristic 9:3:4 ratio of phenotypes 
illustrated by Figure 4.22.

Crossing pure-breeding chocolate parents to pure-breeding
yellow ones produces F1 progeny with black
coats. That the F1 progeny are dihybrid is revealed by
the F2 generation, in which 9/16 of the progeny carry the
genotypes in the B-E- class and have black coats, 3/16 have
a genotype that is bbE-, resulting in chocolate-colored
coats, and 4/16 carry genotypes that are either B-ee or bbee
and have yellow coats.

The molecular explanation for this genetic system is
tied to production of the hair pigment melanin. Dogs can
produce eumelanin that gives hair a black or brown color
and pheomelanin that gives hair a reddish or yellowish
tone. The E gene is MC1R, which controls eumelanin
distribution. The wild-type allele E yields full eumelanin
deposition, but allele e blocks deposition. Gene B is
TYRP1, which controls eumelanin synthesis, with B producing
a large amount of eumelanin that overwhelms
the pheomelanin present to produce a black coat color.
The alternative allele b produces a reduced amount of
eumelanin. When mixed with pheomelanin in the coat,
the resulting color is brown, sometimes called “chocolate.”
Dogs that are B-E- produce, transport, and deposit
large amounts of eumelanin and have black coats. Dogs
that are bbE- produce less eumelanin due to their bb
genotype and have chocolate (brown) coats. Dogs that
are homozygous ee are unable to transport and deposit
eumelanin and instead deposit only pheomelanin. These
dogs have yellow coat color.


Dominant Epistasis (12:3:1 ratio) Determination of
fruit color in summer squashes provides an example
of dominant epistasis, where a dominant allele of one
gene blocks the expression of an allele of a second gene.
Summer squash occur in three colors: white, yellow,
and green. In Figure 4.22, the cross of dihybrid WwYy
(white) plants yields a 12:3:1 ratio of white:yellow:green
plants. Plants with one or two copies of W, that is, W-Y-
(9/16) and W-yy (3/16), produce white squash due to
the inhibition of conversion of the colorless precursor
compound to green pigment. Plants that are homozygous
ww are able to convert the colorless precursor to green
pigment, and the dominant allele of the Y gene produces
an enzyme that converts green pigment to yellow pigment.
Homozygosity for the recessive allele (yy) leaves the green
pigment unaltered and green squash are produced. Notice
that in ww plants, segregation of Y-gene alleles in a cross
of Yy monohybrids produces a 3:1 ratio of Y- (yellow) and
yy (green) squash. This ratio can be seen by looking at
plants that are wwY- (3/16) and wwyy (1/16).

At the molecular level, summer squash color pro- 
duction is a two-step biochemical process in which a 
colorless precursor is converted to a green intermediate 
by an enzyme produced in plants that are ww. In plants 
that are W-, however, the enzyme is not produced, and 
conversion of the precursor is blocked. Plants that are 
Y- produce a second enzyme to convert green pigment 
to yellow pigment, but those that are yy do not pro- 
duce the enzyme. If no green pigment is available, the 
squashes remain white, regardless of the genotype of 
the Y gene. 


Dominant Suppression (13:3 ratio) Our final example
of epistatic gene interaction is dominant suppression,
illustrated in Figure 4.22. Dominant suppression is
similar to dominant epistasis but occurs when a dominant
allele of one gene completely suppresses the phenotypic
expression of alleles of another gene. In chickens, for
example, feather color requires a dominant allele C.
Chickens that are homozygous for a recessive allele c
have white feathers. The C allele can have its color-producing
action suppressed by a dominant suppressor
allele, I. The recessive allele i does not exert suppression.
Crosses between pure-breeding colored chickens (CCii)
and pure-breeding white chickens (ccII) produce white-feathered
F1 that are dihybrid (CcIi). Production of the F2
results in a 13:3 ratio that is characteristic of dominant
suppression. Chickens carrying a cc genotype are unable
to produce feather color, and those carrying C- along
with I- have feather color production suppressed. Only
chickens with the C-ii genotype are able to produce
colored feathers.

Figure 4.22 shows that the product of allele C
converts a colorless precursor into pigment, whereas
the allele c product is inactive and fails to convert the
precursor, resulting in white feather color for cc genotypes.
Dominant suppression of C by the product of
I prevents pigment production in chickens with the
C-I- genotype. The homozygous genotype ii is unable
to suppress color in C- chickens. Genetic Analysis 4.3 tests
your ability to analyze crosses involving epistatic gene
interaction.
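All six modified ratios follow from merging the four 9:3:3:1 genotype classes in different ways. The Python sketch below makes that bookkeeping explicit. It is an illustration under stated assumptions: the generic symbols A and B stand in for the gene pairs used in the text (for example, A = W and B = Y for squash color, A = C and B = I for chicken feathers, and A = B and B = E for Labrador coats), and the dictionary and function names are hypothetical.

```python
# The four dihybrid F2 genotype classes, in sixteenths.
CLASSES = {"A-B-": 9, "A-bb": 3, "aaB-": 3, "aabb": 1}

# For each pattern, the genotype classes that share one phenotype are
# grouped together, following the descriptions in the text.
PATTERNS = {
    "complementary gene interaction": [["A-B-"], ["A-bb", "aaB-", "aabb"]],
    "duplicate gene action": [["A-B-", "A-bb", "aaB-"], ["aabb"]],
    "dominant gene interaction": [["A-B-"], ["A-bb", "aaB-"], ["aabb"]],
    "recessive epistasis": [["A-B-"], ["aaB-"], ["A-bb", "aabb"]],
    "dominant epistasis": [["A-B-", "A-bb"], ["aaB-"], ["aabb"]],
    "dominant suppression": [["A-B-", "aaB-", "aabb"], ["A-bb"]],
}

def phenotype_ratio(groups):
    """Sum the sixteenths in each merged group and format them as a ratio."""
    return ":".join(str(sum(CLASSES[c] for c in group)) for group in groups)

ratios = {name: phenotype_ratio(groups) for name, groups in PATTERNS.items()}
for name, ratio in ratios.items():
    print(f"{name}: {ratio}")
```

Every grouping uses each genotype class exactly once, so the parts of each ratio always sum to 16.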


4.4 Complementation Analysis 
Distinguishes Mutations in the Same 
Gene from Mutations in Different 
Genes 


Suppose you are a geneticist working in California, and 
you have identified a recessive mutation causing petu- 
nia flowers to be white rather than the wild-type purple 
color. A friend of yours, also a geneticist, is working on 
petunias in the Netherlands and contacts you because 
she has also identified a recessive mutation resulting in 
white-flowered petunias. Since there has been no contact
between California petunias and Netherlands petunias,
the mutations have arisen independently. When
geneticists encounter organisms with the same mutant 
phenotype, two initial questions are (1) do these organ- 
isms have mutations of the same gene or of different 
genes, and (2) how many genes are responsible for the 
mutations observed? 

We have already seen that mutations of different 
genes can produce the same, or very similar, abnor- 
mal phenotypes. This phenomenon is known as genetic 
heterogeneity. We have also seen that a mating of two 
organisms with the same or a similar abnormal pheno- 
type can sometimes produce offspring with the wild-type 
phenotype. This phenomenon is called genetic comple- 
mentation, and it occurs when mutant organisms carry 
mutations of different genes that produce the same ab- 
normal phenotype. In contrast, if the two mutations are 
in the same gene, offspring of a cross between the two 
mutants will have a mutant phenotype. This is the way 
pure-breeding mutants are perpetuated, since the parents 
and the offspring are all homozygous for a mutant allele 
of a gene. In the context of our discussion in this section, 
however, crossing two mutants and producing only mu- 
tant progeny is identified as a failure of genetic comple- 
mentation. In this section, we describe how to distinguish 
whether two independent mutations are in the same gene 
or in different genes. 


An analytic approach called genetic complementa- 
tion testing examines the relation between two or more 
recessive mutations affecting one phenotypic attribute. 
Researchers use it to determine whether two recessive 
mutations are in the same gene or in different genes. 
It also provides information on the number of differ- 
ent genes that can produce the mutant phenotype. Here 
we limit our discussion to testing eukaryotic genomes, 
using eye color in Drosophila as an example. Strategies 
for complementation testing in bacteria and bacterial 
viruses (bacteriophage) differ somewhat from those used 
in plants and animals (see discussion in Section 6.6). 

Genetic complementation testing crosses pure-breeding
mutants for recessive mutations and examines
the phenotype of the heterozygous F1
progeny. If wild-type progeny are produced,
genetic complementation has occurred, and the
conclusion is that the mutant alleles are of different genes. 
On the other hand, if the mutant alleles are of the same 
gene, the progeny of two pure-breeding mutants will have 
a mutant phenotype. This result indicates that no genetic 
complementation has taken place. 

As an example, we examine genetic complementation 
testing using two genes affecting Drosophila eye color, 
both of which we have discussed previously: the vermilion 
gene, whose product produces eye-color pigment, and the 
white gene, whose product produces the eye-color pig- 
ment transport protein. Both genes are located on the X 
chromosome in Drosophila. The sequential action of the 
gene products in eye-color production is illustrated in 
Figure 4.23a. Genetic complementation is illustrated by 
the production of wild-type (red) female progeny from the 
cross of a pure-breeding female with vermilion eyes to a 
pure-breeding male with white eyes. No genetic comple- 
mentation occurs when a pure-breeding apricot female 
and a pure-breeding buff male are crossed. All progeny 
have mutant eye colors. 

Genetic complementation analysis utilizes numer- 
ous crosses of different pure-breeding mutants to one 
another to determine if the progeny are mutant (no 
genetic complementation) or wild type (genetic comple- 
mentation). A table of genetic complementation testing 
data shown in Figure 4.23b indicates whether the cross of 
parental mutant phenotypes produces wild-type progeny 
(indicated in the table by plus symbols: +), or mutant 
progeny (indicated in the table by minus symbols: -). Any 
given pair of mutants that complement one another by 
producing wild-type progeny are mutations of different 
genes. (Recall the results of complementary gene action 
illustrated in Figure 4.22.) In contrast, the cross of
mutant parents produces only the mutant phenotype in 
progeny when the mutations fail to complement one an- 
other and are mutations of the same gene. 


GENETIC ANALYSIS 


PROBLEM Dr. Ara B. Dopsis, a famous plant geneticist, decides to try his hand at iris propagation. He
selects two pure-breeding irises, one red and the other blue, and crosses them. To his surprise, all F1
plants have purple flowers. He decides to create more purple irises by self-fertilizing the F1 irises.
Dr. Dopsis produces 320 F2 plants consisting of 182 with purple flowers, 59 with blue flowers,
and 79 with red flowers.

BREAK IT DOWN: Neither red nor blue is dominant (p. 134).

BREAK IT DOWN: Examine the ratio of progeny phenotypes carefully to propose a mechanism of inheritance (p. 133).

a. From the information available, describe the genetic phenomenon that produces the phenotypic ratio observed in the F2 plants. Identify the number of genes that are involved in this trait.
b. Using clearly defined symbols of your own choosing, identify the genotypes of parental and F1 plants.


Solution Strategies and Solution Steps


Evaluate

Solution Strategy 1. Identify the topic this problem addresses and describe the nature of the required answer.
Solution Step 1. This problem concerns the interpretation of F1 and F2 results; it requires identification of the genetic mechanism responsible for the observed results, and the assignment of genotypes to parental and F1 plants in a manner consistent with the genetic mechanism.

Solution Strategy 2. Identify the critical information given in the problem.
Solution Step 2. The problem states that the blue- and red-flowered parents are pure-breeding and that their F1 are exclusively purple flowered. Among the F2, purple is predominant, but red and, to a lesser extent, blue are also observed.

Deduce

Solution Strategy 3. Deduce the potential genetic mechanisms that could account for producing purple-flowered F1 plants from the pure-breeding red and blue parental plants.
Solution Step 3. Two potential mechanisms are suggested by these data. First, a single gene with incomplete dominance might generate a phenotype in F1 heterozygous plants that is different from that of either homozygous parent. Second, two genes displaying an epistatic interaction might account for a phenotype in an F1 dihybrid that is distinct from either pure-breeding parent.

Solution Strategy 4. Determine the relative phenotype proportions predicted by the possible genetic mechanisms and compare them to the observed phenotype ratio.
Solution Step 4. A single-gene model predicts that the self-fertilization of an F1 heterozygote will result in a 1:2:1 (25%:50%:25%) ratio in the F2. A two-gene epistasis model producing three F2 phenotypes could be dominant gene interaction (9:6:1 ratio), dominant epistasis (12:3:1 ratio), or recessive epistasis (9:4:3 ratio). Recessive epistasis predictions are a closer match to the observations than dominant epistasis predictions. Recessive epistasis predicts phenotype percentages of approximately 56%:25%:19%. The observed ratio of F2 phenotypes is 182/320 = 56.9% purple, 79/320 = 24.7% red, and 59/320 = 18.4% blue.

TIP: Compare the relative percentages of each phenotype to see which genetic model most closely predicts the observed percentages.

Solve

Solution Strategy 5. Identify the genetic mechanism most likely to account for the outcomes of these crosses.
Solution Step 5 (Answer a). Comparison of the F2 predictions of the single-gene incomplete dominance model and the two-gene recessive epistasis model determines that recessive epistasis is a better match with the relative progeny proportions. The likely genetic model explaining these data is recessive epistasis. (Note that the number of F2 observed in each category can be compared to the number expected by chi-square analysis.)

TIP: See Foundation Figure 4.22 for the phenotype ratios characteristic of each type of epistatic interaction.

Solution Strategy 6. Assign genotypes to parental and F1 plants.
Solution Step 6. Using symbols A and a for one gene and B and b for the second gene, the genotypes of plants are
Parents: aaBB (red) and AAbb (blue)
F1: AaBb (purple)

TIP: Foundation Figure 4.22 identifies genotypes associated with each phenotype.


For more practice, see Problems 5, 10, 22, and 31. Visit the Study Area to access study tools. MasteringGenetics™ 
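The chi-square comparison mentioned in Solution Step 5 can be sketched as follows. This is an illustration, not part of the worked solution. It assumes that the most common phenotype is assigned to the largest class of each candidate ratio, and that purple is the heterozygote class in the 1:2:1 model. The function and variable names are hypothetical. The value 5.991 is the standard chi-square critical value for 2 degrees of freedom at P = 0.05.

```python
observed = {"purple": 182, "red": 79, "blue": 59}

# Candidate models, expressed as ratio parts per phenotype.
# The phenotype-to-class assignments are assumptions made for illustration.
models = {
    "recessive epistasis (9:4:3)": {"purple": 9, "red": 4, "blue": 3},
    "dominant gene interaction (9:6:1)": {"purple": 9, "red": 6, "blue": 1},
    "dominant epistasis (12:3:1)": {"purple": 12, "red": 3, "blue": 1},
    "incomplete dominance (1:2:1)": {"purple": 2, "red": 1, "blue": 1},
}

CRITICAL_VALUE = 5.991  # chi-square, df = 2, P = 0.05

def chi_square(observed, parts):
    """Sum (observed - expected)^2 / expected over all phenotype classes."""
    total = sum(observed.values())
    part_sum = sum(parts.values())
    statistic = 0.0
    for phenotype, count in observed.items():
        expected = total * parts[phenotype] / part_sum
        statistic += (count - expected) ** 2 / expected
    return statistic

results = {name: chi_square(observed, parts) for name, parts in models.items()}
best_model = min(results, key=results.get)

for name, statistic in sorted(results.items(), key=lambda item: item[1]):
    verdict = "not rejected" if statistic < CRITICAL_VALUE else "rejected"
    print(f"{name}: chi-square = {statistic:.2f} ({verdict})")
```

Under these assumptions only the recessive epistasis model yields a statistic below the critical value, which agrees with the conclusion reached in Answer a.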
Figure 4.23 Genetic complementation and no genetic complementation involving the Drosophila eye color genes vermilion and white. (a) The cross of pure-breeding vermilion to pure-breeding white shows genetic complementation by production of wild-type eye color in the F1. The cross between pure-breeding apricot and pure-breeding buff produces no genetic complementation in the F1 that have mutant eye color. (b) Genetic complementation testing among nine distinct Drosophila eye color mutants reveals five complementation groups corresponding to five genes. Five mutant alleles of white mutually fail to complement and are assigned to the same gene. The other four mutants each complement one another and the white gene mutants, and each is assigned to its own gene.

(a) Pathway: Precursor --(vermilion gene product)--> Pigment --(white gene product)--> Eye color

Genetic complementation
P: vermilion female x white male
F1: wild-type (red) females and vermilion males

No genetic complementation
P: apricot female x buff male
F1: light apricot females and apricot males

(b)
Mutation    Apricot  Brown  Buff  Carnation  Cherry  Claret  Coral  Vermilion  White
Apricot        -       +     -        +        -       +       -        +        -
Brown                  -     +        +        +       +       +        +        +
Buff                         -        +        -       +       -        +        -
Carnation                             -        +       +       +        +        +
Cherry                                         -       +       -        +        -
Claret                                                 -       +        +        +
Coral                                                          -        +        -
Vermilion                                                               -        +
White                                                                            -

Complementation group    Mutant (allele)
I                        Apricot, buff, cherry, coral, white (alleles of w)
II                       Carnation (c)
III                      Claret (cl)
IV                       Brown (b)
V                        Vermilion (v)

Complementation analysis of the Drosophila eye- 
color mutation results displayed in Figure 4.23b focuses 
on crosses that fail to complement as these are the result 
of mutations that are in the same gene. Mutations that 
mutually fail to complement one another are identi- 
fied as a complementation group, consisting of one 
or more mutant alleles of a single gene. A complemen- 
tation group consists of mutants whose phenotypes 
consistently fail to complement one another and that 
complement mutants in other complementation groups. 
In the genetic context, a “complementation group” is 
synonymous with a “gene” because the mutant alleles of 
each complementation group all affect the same pheno- 
typic characteristic. Thus, in genetic complementation 
analysis, the number of complementation groups equals 
the number of genes. 

In the complementation testing data in Figure 4.23b, 
apricot, buff, cherry, coral, and white all exhibit a mutual 
failure to complement. This result identifies the five mu- 
tants as occurring in the same gene. (Historically, white 
was the first mutation identified and is the name the 


gene has become known by.) Geneticists conclude that 
apricot, buff, cherry, coral, and white are mutant alleles of 
the white (w) gene in Drosophila. These mutations form 
complementation group I. In contrast, the mutations 
brown, carnation, claret, and vermilion each complement 
all other mutants. This observation tells investigators that 
they are not alleles of another mutant, but that instead 
each mutant represents a separate gene. Each of these 
mutants forms its own complementation group (i.e., com- 
plementation groups II through V). Therefore, among the 
nine Drosophila eye-color mutants examined, five genes 
(five complementation groups) are identified. One gene is 
represented by five mutants, and the other four genes are 
represented by one mutation each. 
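The grouping rule described here (mutants that mutually fail to complement belong to one gene) can be expressed as a short Python sketch. It is illustrative only: the plus and minus entries are transcribed from the results described for Figure 4.23b, and the names MUTANTS, ROWS, and complementation_groups are hypothetical.

```python
MUTANTS = ["apricot", "brown", "buff", "carnation", "cherry",
           "claret", "coral", "vermilion", "white"]

# Upper-triangular complementation results. Row i lists the crosses of
# MUTANTS[i] with MUTANTS[i], MUTANTS[i + 1], and so on.
# "+" = wild-type progeny (complementation), "-" = mutant progeny.
ROWS = [
    "-+-+-+-+-",
    "-+++++++",
    "-+-+-+-",
    "-+++++",
    "-+-+-",
    "-+++",
    "-+-",
    "-+",
    "-",
]

def complementation_groups(mutants, rows):
    """Merge mutants that fail to complement into the same group."""
    parent = {m: m for m in mutants}

    def find(m):
        # Follow parent links until reaching the representative of the group.
        while parent[m] != m:
            m = parent[m]
        return m

    for i, row in enumerate(rows):
        for offset, result in enumerate(row):
            if result == "-":
                a, b = find(mutants[i]), find(mutants[i + offset])
                if a != b:
                    parent[b] = a

    grouped = {}
    for m in mutants:
        grouped.setdefault(find(m), []).append(m)
    return list(grouped.values())

groups = complementation_groups(MUTANTS, ROWS)
for number, members in enumerate(groups, start=1):
    print(f"group {number}: {', '.join(members)}")
```

The printed group numbers follow the order in which mutants are listed, so they do not correspond to the Roman numerals assigned in the figure.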

Genetic complementation analysis is an important 
tool of genetic analysis. The rare human cancer-prone 
disorder xeroderma pigmentosum (various OMIM designations)
can result from the inheritance of mutations
in any of seven genes that were originally identified by
genetic complementation analysis. The following Case 
Study outlines this analysis. 


CASE STUDY 
Complementation Groups in a Human Cancer-Prone Disorder


In this case study, we examine the use of genetic complemen- 
tation analysis to identify the number of genes involved in a 
rare but genetically heterogeneous human condition called 
xeroderma pigmentosum (XP). XP is characterized by severe 
sensitivity to ultraviolet (UV) irradiation from sunlight and by 
up to a thousandfold increase in the rate of sun-induced skin 
cancer. While the experimental approaches to complementa- 
tion testing in humans are necessarily different from those 
employed for laboratory organisms, the interpretations of 
“crosses” follow the same processes. 

People with XP are deficient in a type of DNA repair called 
nucleotide excision repair (NER) that would otherwise protect 
their skin from the UV-induced damage that leads to cancer. In 
NER, a short section of DNA containing a UV-induced lesion is 
removed, and the gap is filled by new DNA (see Section 12.5). 


COMPLEMENTATION GROUPS Research work that be- 
gan in the late 1970s identified seven complementation 
groups representing seven different genes (each has its own 
OMIM designation) that are mutated in different forms of XP. 
Two approaches were successful in revealing some or all of 
these groups. Anthony Andrews and his colleagues obtained 
cultured skin cells from XP patients and from normal con- 
trols and tested the ability of the cells to grow after exposure 
to measured doses of UV irradiation (Figure 4.24). The cells 
were exposed to UV light at a wavelength of 254 nm for dif- 
ferent amounts of time, and their growth was measured as 
the percentage of original cells able to form colonies after 
UV exposure. These researchers identified five distinct pat- 
terns of response to UV exposure that are designated as 
complementation groups A to E. 

Other researchers measured the response of cultured XP 
cells to UV exposure by determining the level of NER taking 
place in XP cell cultures taken from different XP individuals in 
comparison to normal cells. The results showed that XP cell 
lines vary in their levels of NER from less than 5% of normal to 
about 50% of normal. These results could be due to the muta- 
tions being in different genes or, alternatively, to different 
hypomorphic alleles of the same gene. 

Genetic complementation analysis was used in the study
of XP cell cultures with low NER to identify cell lineages
carrying different XP gene mutations. To do this, two cells from
lineages with low NER were fused to form a heterokaryon, a
hybrid cell with two nuclei. A heterokaryon contains all the
genetic information from both contributing cells. The
experimental rationale is that if the two cells contain mutations
of different genes, the heterokaryon will experience genetic
complementation that would be detected as normal or
near-normal levels of NER; but if the mutations are in the same
gene, NER will be about the same in the heterokaryon as
in the individual cell lines. This analysis of NER levels in XP
heterokaryons ultimately indicated seven complementation
groups of XP genes.


ASSOCIATED GENE FUNCTIONS Each of the seven
XP-associated genes has had its function identified and its
position mapped in the human genome in the last decade or so.
Four of the genes produce proteins that are required to remove
a segment of the strand of DNA damaged by UV irradiation as
part of the DNA repair process. Proteins from two other
XP-associated genes are required to recognize UV-induced DNA
damage, and the seventh gene produces a protein that binds
to the DNA lesion once it is located. The knowledge of the
identity of the seven XP-associated genes has led to the finding that
other cancer-associated hereditary diseases also involve
mutations of one or another of the XP-associated genes.


Figure 4.24 Growth of cultured cells from patients with
xeroderma pigmentosum (XP). Five XP complementation
groups are identified based on growth ability. [Plot axes:
UV dose (J/m²) on the x-axis, 0 to 7; y-axis on a log scale,
0.01 to 100.]


SUMMARY

MasteringGenetics™ For activities, animations, and review quizzes, go to the Study Area.


4.1 Interactions between Alleles Produce
Dominance Relationships


Loss-of-function mutations decrease or eliminate gene
activity. Gain-of-function mutations can cause
overexpression or result in new functions.

Incomplete dominance produces heterozygotes with
phenotypes that differ from those of either homozygote but are
closer to one homozygous phenotype than the other.

Codominant alleles are both equally detected in the
heterozygous phenotype.

The interaction of allelic products determines the
dominance relationship between alleles.

ABO blood types are produced by alleles whose protein
products produce dominance or codominance depending
on the genotype.

Multiple alleles of a single gene can display a variety
of dominance relationships that establish an allelic
series.

Lethal alleles can kill gametes, can prevent the gestational
development of certain classes of progeny, or can have their
lethal effect later in life.

In sex-limited and sex-influenced traits, alleles are
manifested differently in each sex.


4.2 Some Genes Produce Variable Phenotypes 


In incomplete penetrance, an allele does not always have the
expected effect on the phenotype.

In variable expressivity, organisms with the same genotype
have different degrees of phenotypic expression.

Pleiotropic mutations affect two or more distinct and
seemingly independent attributes of the phenotype.


4.3 Gene Interaction Modifies Mendelian Ratios 


Epistasis is revealed by six alternative ratios that are
modifications of the 9:3:3:1 ratio expected among the progeny of a
dihybrid cross.

Epistasis types and their ratios are complementary gene
interaction (9:7), duplicate gene action (15:1), dominant
gene interaction (9:6:1), recessive epistasis (9:3:4), dominant
epistasis (12:3:1), and dominant suppression (13:3).
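
As a rough illustration of how each of these ratios arises from the 9:3:3:1 classes, the following Python sketch enumerates the 16 gamete combinations of a dihybrid cross and collapses the four genotype classes into phenotype classes. The gene symbols A and B, the choice of which gene masks or suppresses the other, and the names `f2_classes`, `MODELS`, and `ratio` are assumptions made for this sketch only.

```python
from collections import Counter
from itertools import product


def f2_classes():
    """Count F2 genotype classes (has dominant A?, has dominant B?) for AaBb x AaBb."""
    counts = Counter()
    for a1, a2, b1, b2 in product("Aa", "Aa", "Bb", "Bb"):
        counts[("A" in (a1, a2), "B" in (b1, b2))] += 1
    return counts  # 9 A-B-, 3 A-bb, 3 aaB-, 1 aabb


# Each model maps a genotype class to a phenotype index (0, 1, 2).
# Which gene is treated as epistatic is an arbitrary choice for illustration.
MODELS = {
    "complementary gene interaction": lambda A, B: 0 if (A and B) else 1,
    "duplicate gene action": lambda A, B: 0 if (A or B) else 1,
    "dominant gene interaction": lambda A, B: 0 if (A and B) else (1 if (A or B) else 2),
    "recessive epistasis": lambda A, B: 2 if not A else (0 if B else 1),
    "dominant epistasis": lambda A, B: 0 if A else (1 if B else 2),
    "dominant suppression": lambda A, B: 1 if (B and not A) else 0,
}


def ratio(model):
    """Sum the 16ths falling into each phenotype class under one model."""
    totals = Counter()
    for (A, B), n in f2_classes().items():
        totals[model(A, B)] += n
    return tuple(totals[i] for i in sorted(totals))


for name, model in MODELS.items():
    print(f"{name}: {':'.join(str(n) for n in ratio(model))}")
```

Swapping the roles of A and B in any model changes which genotypes fall into each class but not the resulting ratio.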


4.4 Complementation Analysis Distinguishes 
Mutations in the Same Gene from Mutations in 
Different Genes 


In genetic heterogeneity, mutations in different genes can
produce the same phenotype.

Genetic complementation produces progeny with the
wild-type phenotype from parents that are pure-breeding for
similar mutant phenotypes. The detection of genetic
complementation means the mutations occur in different genes.

The failure to detect genetic complementation from the
cross of two similar mutant organisms identifies the mutant
alleles as being carried by the same gene.


PROBLEMS

MasteringGenetics™ Visit for instructor-assigned tutorials and problems.

Chapter Concepts

For answers to selected even-numbered problems, see Appendix: Answers.


1. Define and distinguish incomplete penetrance and variable 
expressivity. 


2. Define and distinguish epistasis and pleiotropy. 


3. When working on barley plants, two researchers independ- 
ently identify a short-plant mutation and develop homozy- 
gous recessive lines of short plants. Careful measurements 
of the height of mutant short plants versus normal tall 
plants indicate that the two mutant lines have the same
height. How would you determine if these two mutant lines
carry mutations of the same gene or of different genes?


4. Fifteen bacterial colonies growing on a complete medium 
are replica-plated to a minimal medium. Twelve of the 
colonies grow on minimal medium. 

a. Using terminology from the chapter, characterize the 12 
colonies that grow on minimal medium and the 3 colo- 
nies that do not. 


b. The three colonies that do not grow on minimal me- 
dium are replica-plated to minimal medium plus the 
amino acid serine (min + Ser), and all three colonies 
grow. Characterize these three colonies. 

c. The serine biosynthetic pathway is a three-step pathway 
in which each step is catalyzed by the enzyme product 
of a different gene, identified as enzymes A, B, and C in 
the diagram below. 


3-Phosphoglycerate --(Enzyme A)--> 3-Phospho-hydroxypyruvate (3-PHP)
--(Enzyme B)--> 3-Phosphoserine (3-PS) --(Enzyme C)--> Serine (Ser)


Mutant 1 grows only on min + Ser. In addition to growth
on min + Ser, mutant 2 also grows on min + 3-PHP and
min + 3-PS. Mutant 3 grows on min + 3-PS and min +
Ser. Identify the step of the serine biosynthesis pathway at
which each mutant is defective.


5. In a type of parakeet known as a “budgie,” feather color is
controlled by two genes. A yellow pigment is synthesized
under the control of a dominant allele Y. Budgies that are
homozygous for the recessive y allele do not synthesize yel- 
low pigment. At an independently assorting gene, the dom- 
inant allele B directs synthesis of a blue pigment. Recessive 
homozygotes with the bb genotype do not produce blue 
pigment. Budgies that produce both yellow and blue pig- 
ments have green feathers; those that produce only yellow 
pigment or only blue pigment have yellow or blue feathers, 
respectively; and budgies that produce neither pigment are 
white (albino). 

a. List the genotypes for green, yellow, blue, and albino 
budgies. 

b. A cross is made between a pure-breeding green budgie 
and a pure-breeding albino budgie. What are the geno- 
types of the parent birds? 

c. What are the genotype(s) and phenotype(s) of the F1
progeny of the cross described in part (b)?

d. If F1 males and females are mated, what phenotypes are
expected in the F2 and in what proportions?

e. The cross of a green budgie and a yellow budgie pro- 
duces offspring that are 12 green, 4 blue, 13 yellow, and 
3 albino. What are the genotypes of the parents? 


6. The ABO and MN blood groups are given below for four
sets of parents (1 to 4) and four children (a to d). Recall that
the ABO blood group has three alleles: I^A, I^B, and i. The
MN blood group has two codominant alleles, M and N.
Using your knowledge of these genetic systems, match
each child with every set of parents who might have
conceived the child, and exclude any parental set that
could not have conceived the child.


       Mother          Father
     ABO    MN       ABO    MN
1    O      M        B      M
2    B      N        B      N
3    AB     MN       B      MN
4    A      N        B      MN


     Children
     ABO    MN
a    B      M
b    O      M
c    AB     MN
d    B      N

7. The wild-type color of horned beetles is black, although
other colors are known. A black horned beetle from a
pure-breeding strain is crossed to a pure-breeding green
female beetle. All of their F1 progeny are black. These F1 are
allowed to mate at random with one another, and 320 F2
beetles are produced. The F2 consists of 179 black, 81 green,
and 60 brown. Use these data to explain the genetics of
horned beetle color.


8. Two genes interact to produce various phenotypic ratios
among F2 progeny of a dihybrid cross. Design a different
pathway explaining each of the F2 ratios below, using
hypothetical genes R and T and assuming that the dominant
allele at each locus catalyzes a different reaction or performs
an action leading to pigment production. The recessive
allele at each locus is null (loss-of-function). Begin each
pathway with a colorless precursor that produces a white
or albino phenotype if it is unmodified. The ratios are for
F2 progeny produced by crossing wild-type F1 organisms
with the genotype RrTt.

a. 9/16 dark blue : 6/16 light blue : 1/16 white
b. 12/16 white : 3/16 green : 1/16 yellow
c. 9/16 green : 3/16 yellow : 3/16 blue : 1/16 white
d. 9/16 red : 7/16 white
e. 15/16 black : 1/16 white
f. 9/16 black : 3/16 gray : 4/16 albino
g. 13/16 white : 3/16 green


9. The ABO blood group assorts independently of the Rhesus
(Rh) blood group and the MN blood group. Three alleles,
I^A, I^B, and i, occur at the ABO locus. Two alleles, R, a
dominant allele producing Rh+, and r, a recessive allele for Rh-,
are found at the Rh locus, and codominant alleles M and N
occur at the MN locus. Each gene is autosomal.

a. A child with blood types A, Rh-, and M is born to a
woman who has blood types O, Rh-, and MN and a
man who has blood types A, Rh+, and M. Determine
the genotypes of each parent.

b. What proportion of children born to a man with
genotype I^A I^B Rr MN and a woman who is I^A i Rr NN will
have blood types B, Rh-, and MN? Show your work.

c. A man with blood types B, Rh+, and N says he could
not be the father of a child with blood types O, Rh-,
and MN. The mother of the child has blood types A,
Rh+, and MN. Is the man correct? Explain.


10. In rats, gene B produces black coat color if the genotype is
B-, but black pigment is not produced if the genotype is
bb. At an independent locus, gene D produces yellow
pigment if the genotype is D-, but no pigment is produced
when the genotype is dd. Production of both pigments
results in brown coat color. If neither pigment is produced,
coat color is cream. Determine the genotypes of parents of
litters with the following phenotype distributions.

a. 4 brown, 4 black, 4 yellow, 4 cream 

b. 3 brown, 3 yellow, 1 black, 1 cream 

c. 9 black, 7 brown 


11. In the rats identified in Problem 10, a third independently
assorting gene involved in determination of coat color
in rats is the C gene. At this locus, the genotype C-
permits expression of pigment from genes B and D. The cc
genotype, however, prevents expression of coat color and
results in albino rats. For each of the following crosses,
determine the expected phenotype ratio of progeny.
a. BbDDCc × BbDdCc
b. BBDdcc × BbddCc
c. bbDDCc × BBddCc
d. BbDdCC × BbDdCC


12. Using the information provided in Problems 10 and 11,
determine the genotype and phenotype of parents that
produce the following progeny:

a. 9/16 brown : 3/16 black : 4/16 albino

b. 3/8 black : 3/8 cream : 2/8 albino

c. 27/64 brown : 16/64 albino : 9/64 yellow : 9/64 black : 3/64 cream

d. 3/4 brown : 1/4 yellow


13. Total cholesterol in blood is reported as the number of
milligrams (mg) of cholesterol per 100 milliliters (mL) of blood.
The normal range is 180-220 mg/100 mL. A gene mutation
altering the function of cell-surface cholesterol receptors
restricts the ability of cells to collect cholesterol from blood
and draw it into cells. This defect results in elevated blood
cholesterol levels. Individuals who are heterozygous for a
mutant allele and a wild-type allele have levels of 300-600
mg/100 mL, and those who are homozygous for the
mutation have levels of 800-1000 mg/100 mL. Identify the
genetic term that best describes the inheritance of this form
of elevated cholesterol level, and justify your choice.


14. Flower color in snapdragons results from the amount of
the pigment anthocyanin in the petals. Red flowers are
produced by plants that have full anthocyanin production,
and ivory-colored flowers are produced by plants that lack
the ability to produce anthocyanin. The allele An1 has full
activity in anthocyanin production, and the allele An2 is a
null allele. Dr. Ara B. Dopsis, a famous genetic researcher,
crosses pure-breeding red snapdragons to pure-breeding
ivory snapdragons and produces F1 progeny plants that
have pink flowers. He proposes that this outcome is the
result of incomplete dominance, and he crosses the F1 to test
his hypothesis. What phenotypes does Dr. Dopsis predict
will be found in the F2, and in what proportions?


15. A plant line with reduced fertility comes to the attention of
a plant breeder who observes that seed pods often contain a
mixture of viable seeds that can be planted to produce new
plants, and withered seeds that cannot be sprouted. The
breeder examines numerous seed pods in the
reduced-fertility line and counts 622 viable seeds and 204 nonviable seeds.

a. What single-gene mechanism best explains the breeder’s
observation?

b. Propose an additional experiment to test the genetic
mechanism you propose. If your hypothesis is correct,
what experimental outcome do you predict?


16. In cattle, an autosomal mutation called Dexter produces
calves with short stature and short limbs. Embryos that are
homozygous for the Dexter mutation have severely stunted
development and either spontaneously abort or are
stillborn. What progeny phenotypes do you expect from the
cross of two Dexter cows? What are the expected
proportions of the expected phenotypes?


Application and Integration

For answers to selected even-numbered problems, see Appendix: Answers.


17. The coat color in mink is controlled by two codominant
alleles at a single locus. Red coat color is produced by the
genotype R1R1, silver coat by the genotype R1R2, and
platinum color by R2R2. White spotting of the coat is a recessive
trait found with the genotype ss. Solid coat color is found
with the S- genotype.

a. What are the expected progeny phenotypes and
proportions for the cross SsR1R2 × ssR2R2?

b. If the cross SsRıR} X SsR Rj is made, what are the
progeny phenotypes, and in what proportions are they
expected to occur?

c. Two crosses are made between mink. Cross 1 is the
cross of a solid, silver mink to one that is solid,
platinum. Cross 2 is between a spotted, silver mink and one
that is solid, silver. The progeny are described in the
table below. Use these data to determine the genotypes
of the parents in each cross.


Cross                          Offspring
        Spotted,   Spotted,  Spotted,  Solid,     Solid,   Solid,
        platinum   silver    red       platinum   silver   red
1       2          3         0         6          5        0
2       3          i;        2         4          5        3


18. Strains of petunias come in four pure-breeding colors: 
white, blue, red, and purple. White petunias are produced 
when plants synthesize no flower pigment. Blue petunias 
and red petunias are produced when plants synthesize 
blue or red pigment only. Purple petunias are produced 
in plants that synthesize both red and blue pigment. The 
mixture of red and blue makes purple. Flower-color pig- 
ments are synthesized by gene action in two separate 
pigment-producing biochemical pathways. Pathway I 
contains gene A that produces an enzyme to catalyze
conversion of a colorless pigment designated white1 to blue
pigment. In Pathway II, the enzymatic product of gene B
converts the colorless pigment designated white2 to red
pigment. The two genes assort independently.


                gene A
Pathway I:   White1 ---> Blue
                                + = Purple
Pathway II:  White2 ---> Red
                gene B


a. What are the possible genotype(s) for pure-breeding
red petunias?

b. What are the possible genotype(s) for true-breeding
blue petunias?

c. True-breeding red petunias are crossed to
pure-breeding blue petunias, and all the F1 progeny have
purple flowers. If the F1 are allowed to self-fertilize
and produce the F2, what is the expected
phenotypic distribution of the F2 progeny? Show your work.


19. Feather color in parakeets is produced by the blending of
pigments produced from two biosynthetic pathways shown 
below. Four independently assorting genes (A, B, C, and D) 
produce enzymes that catalyze separate steps of the path- 
ways. For the questions below, use an uppercase letter to 
indicate a dominant allele producing full enzymatic activity 
and a lowercase letter to indicate a recessive allele produc- 
ing no functional enzyme. Feather colors produced by mix- 
ing pigments are green (yellow + blue) and purple (red + 
blue). Red, yellow, and blue feathers result from production 
of one colored pigment, and white results from absence of 
pigment production. 


Enzyme A Enzyme B 
Pathway I: Compound I —> Compound II —> Compound III 
(colorless) (red) (yellow) 
Enzyme C Enzyme D 
Pathway II: Compound X ——> Compound Y ——> Compound Z 
(colorless) (colorless) (blue) 


a. What is the genotype of a pure-breeding purple para- 
keet strain? 

b. What is the genotype of a pure-breeding yellow strain 
of parakeet? 

c. If a pure-breeding blue strain of parakeet (aa BB CC
DD) is crossed to one that is pure-breeding purple,
predict the genotype(s) and phenotype(s) of the F1. Show
your work.

d. If F1 birds identified in part (c) are mated at random,
what phenotypes do you expect in the F2 generation?
What are the ratios among phenotypes? Show
your work.


20. Brachydactyly type D is a human autosomal dominant
condition in which the thumbs are abnormally short and
broad. In most cases, both thumbs are affected, but
occasionally just one thumb is involved. The accompanying
pedigree shows a family in which brachydactyly type D is
segregating. Filled circles and squares represent females
and males who have involvement of both thumbs.
Half-filled symbols represent family members with just one
thumb affected.


[Pedigree: family in which brachydactyly type D is segregating.]


a. Is there any evidence of variable expressivity in this 
family? Explain. 

b. Is there evidence of incomplete penetrance in this fam- 
ily? Explain. 


21. A male and a female mouse are each from pure-breeding
albino strains. They have a litter of 10 pups, all of which
have normal pigmentation. The F1 pups are crossed to one
another to produce 56 F2 mice, of which 31 are normally
pigmented and 25 are albino.

a. Using clearly defined allele symbols of your own choosing,
give the genotypes of parental and F1 mice. What genetic
phenomenon explains these parental and F1 phenotypes?

b. What genetic phenomenon explains the F2 results? Use
your allelic symbols to explain the F2 results.


22. Xeroderma pigmentosum (XP) is an autosomal recessive
condition characterized by moderate to severe sensitivity to 
ultraviolet (UV) light. Patients develop multiple skin lesions 
on UV-exposed skin, and skin cancers often develop as a re- 
sult. XP is caused by deficient repair of DNA damage from 
UV exposure. 


a. Many genes are known to be involved in repair of UV- 
induced DNA damage, and several of these genes are 
implicated in XP. What genetic phenomenon is illus- 
trated by XP? 

b. A series of 10 skin-cell lines was grown from different 
XP patients. Cells from these lines were fused, and the 
heterokaryons were tested for genetic complementation 
by assaying their ability to repair DNA damage caused by 
a moderate amount of UV exposure. In the table below, +
indicates that the fused cell line performs normal DNA
damage repair, and − indicates defective DNA
repair. Use this information to determine how many
DNA-repair genes are mutated in the 10 cell lines, and 
identify which cell lines share the same mutated genes. 


[Table: pairwise complementation results (+ or −) for fusions among mutant cell lines 1 through 10.]


23. Three strains of green-seeded lentil plants appear to have
the same phenotype. The strains are designated G1, G2, and
G3. Each green-seeded strain is crossed to a pure-breeding
yellow-seeded strain designated Y. The F1 of each cross
are yellow; however, self-fertilization of F1 plants produces
F2 with different proportions of yellow- and green-seeded
plants as shown below.


Parental Strain      F1 Phenotype     F2 Phenotype
Green    Yellow                       Green    Yellow
G1       Y           All yellow       1/4      3/4
G2       Y           All yellow       i        Z
G3       Y           All yellow       z        az


a. For what number of genes are variable alleles
segregating in the G1 × Y cross? The G2 × Y cross? In the G3 × Y
cross? Explain your rationale for each answer.

b. Using the allele symbols A and a, B and b, and D and d
to represent alleles at segregating genes, give the
genotypes of parental and F1 plants in each cross.

c. For each set of F2 progeny, provide a genetic
explanation for the yellow : green ratio. What are the genotypes
of yellow and green F2 lentil plants in the G, × Y cross?

d. If green-seeded strains G1 and G2 are crossed, what are
the phenotype and the genotype of F1 progeny?

e. What proportion of the F2 are expected to be green?
Show your work.

f. If strains G, and G3 are crossed, what will be the
phenotype of the F1?

g. What proportion of the F2 will have yellow seeds? Show
your work.


24. Blue flower color is produced in a species of morning
glories when dominant alleles are present at two gene loci, A
and B. (Plants with the genotype A-B- have blue flowers.)
Purple flowers result when a dominant allele is present at
only one of the two gene loci, A or B. (Plants with the
genotypes A-bb and aaB- are purple.) Flowers are red when
the plant is homozygous recessive for each gene (i.e., aabb).

a. Two pure-breeding purple strains are crossed, and all
the F1 plants have blue flowers. What are the genotypes
of the parental plants?

b. If two F1 plants are crossed, what are the expected
phenotypes and frequencies in the F2?

c. If an F1 plant is backcrossed to one of the pure-breeding
parental plants, what is the expected ratio of phenotypes
among progeny? Why is the phenotype ratio the same
regardless of which parental strain is selected for the
backcross?


25. The following crosses are performed between morning
glories whose flower color is determined as described in
Problem 24. Use the segregation data to determine the
genotype of each parental plant.


     Parental Phenotypes    Offspring Phenotypes
a.   blue × blue            3/4 blue : 1/4 purple
b.   purple × purple        1/4 blue : 1/2 purple : 1/4 red
c.   blue × red             1/4 blue : 1/2 purple : 1/4 red
d.   purple × red           1/2 purple : 1/2 red
e.   blue × purple          3/8 blue : 1/2 purple : 1/8 red


26. Two pure-breeding strains of summer squash producing
yellow fruit, Y1 and Y2, are each crossed to a pure-breeding
strain of summer squash producing green fruit, G1, and to
one another. The following results are obtained:

Cross   P                            F1            F2
I       Y1 (yellow) × G1 (green)     All yellow    3/4 yellow : 1/4 green
II      Y2 (yellow) × G1 (green)     All green     3/4 green : 1/4 yellow
III     Y1 (yellow) × Y2 (yellow)    All yellow    13/16 yellow : 3/16 green


a. Examine the results of each cross and predict how many
genes are responsible for fruit-color determination in
summer squash. Justify your answer.

b. Using clearly defined symbols of your choice, give the
genotypes of parental, F1, and F2 plants in each cross.

c. If the F1 of Crosses I and II are mated, predict the
phenotype ratio of the progeny.


27. Marfan syndrome is an autosomal dominant disorder in
humans. It results from mutation of the gene on chromosome 15
that produces the connective tissue protein fibrillin. In its wild- 
type form, fibrillin gives connective tissues, such as cartilage, 
elasticity. When mutated, however, fibrillin is rigid and pro- 
duces a range of phenotypic complications, including excessive 
growth of the long bones of the leg and arm, sunken chest, 
dislocation of the lens of the eye, and susceptibility to aortic 
aneurysm, which can lead to sudden death in some cases. 

Different sets of symptoms are seen among various 
family members, as shown in the pedigree below. Each 
quadrant of the circles and squares represents a different 
symptom, as the key indicates.


[Pedigree: family with Marfan syndrome. Key to symbol
quadrants: long bones, lens dislocation, sunken chest,
aortic aneurysm.]


Since all cases of Marfan syndrome are caused by mutation 
of the fibrillin gene, and all family members with Marfan 
syndrome carry the same mutant allele, how do you explain 
the differences shown in the pedigree? 


28. Yeast are single-celled eukaryotic organisms that grow in
culture as either haploids or diploids. Diploid yeast are generated
when two haploid strains fuse together. Seven haploid strains 
of yeast exhibit similar growth habit: At 25°C, each strain 
grows normally, but at 37°C, they show different growth ca- 
pabilities. The table below displays the growth pattern. 

[Table: growth of strains A through G at 25°C and at 37°C.
Key to symbols: normal growth, slow growth, no growth.]


a. Describe the nature of the mutation affecting each of these 
mutant yeast strains. Explain why strains B and G display 
different growth habit at 37°C than the other strains. 

b. Each of the mutant pairs of haploid yeast is fused, and the 
resulting diploids are tested for their ability to grow at 37°C. 
The results of the growth experiment are shown below. 


[Figure: 37°C growth data for the diploids formed from each pair of mutant strains.]


How many different genes are mutated among these seven 
yeast strains? Identify the strains that represent each gene 
mutation. 

29. During your work as a laboratory assistant in the research
facilities of Dr. O. Sophila, a world-famous geneticist, you come
across an unusual bottle of fruit flies. All the flies in the bottle 
appear normal when they are in an incubator set at 22°C. 
When they are moved to a 30°C incubator, however, a few 
of the flies slowly become paralyzed; and after about 20 to 30 
minutes, they are unable to move. Returning the flies to 22°C 
restores their ability to move after about 30 to 45 minutes. 
With Dr. Sophila’s encouragement, you set up 10 in- 
dividual crosses between single male and female flies that 
exhibit the unusual behavior. Among 812 progeny, 598 
exhibit the unusual behavior and 214 do not. When you 
leave one of the test bottles in the 30°C incubator too long, 
you discover that more than 2 hours at high temperature 
kills the paralyzed flies. When you tell this to Dr. Sophila, 
he says, “Ah ha! I know the genetic explanation for this 
condition.” What is his explanation? 


30. Dr. Ara B. Dopsis and Dr. C. Ellie Gans are performing
genetic crosses on daisy plants. They self-fertilize a blue- 
flowered daisy and grow 100 progeny plants that consist of 
55 blue-flowered plants, 22 purple-flowered plants, and 23 
white-flowered plants. Dr. Dopsis believes this is the result of 
segregation of two alleles at one locus and that the progeny 
ratio is 1:2:1. Dr. Gans thinks the progeny phenotypes are the 
result of two epistatic genes and that the ratio is 9:3:4. 

The two scientists ask you to resolve their conflict by 
performing chi-square analysis on the data for both pro- 
posed genetic mechanisms. For each proposed mechanism, 
fill in the values requested on the form the researchers 
have provided for your analysis. 


a. Use the form below to calculate chi square for the 1:2:1
hypothesis of Dr. Dopsis.


Phenotype    Observed    Expected
Blue         55
Purple       22
White        23
Chi-square value:      df:      p value >


b. Use the form below to calculate chi square for the 9:3:4
hypothesis of Dr. Gans.


Phenotype    Observed    Expected
Blue         55
Purple       22
White        23
Chi-square value:      df:      p value >

c. What is your conclusion regarding these two genetic
hypotheses?

d. Using any of the 100 progeny plants, propose a cross 
that will verify the conclusion you proposed in part (c). 
Plants may be self-fertilized, or one plant can be crossed 
to another. What result will be consistent with the 1:2:1 
hypothesis? What result will be consistent with the 9:3:4 
hypothesis? 
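
The chi-square calculation that this kind of comparison relies on can be sketched in Python as follows. The counts used here (84, 35, 41) are hypothetical and are deliberately not the data from the problem; the function name `chi_square` is invented for this sketch, and the critical value 5.991 is the standard tabulated value for 2 degrees of freedom at p = 0.05.

```python
def chi_square(observed, ratio):
    """Return (chi-square statistic, degrees of freedom) for observed counts vs. a ratio."""
    total = sum(observed)
    parts = sum(ratio)
    # Expected count for each class is its share of the ratio times the total.
    expected = [total * r / parts for r in ratio]
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return statistic, len(observed) - 1


CRITICAL_DF2_P05 = 5.991  # tabulated chi-square critical value, df = 2, p = 0.05

# Hypothetical counts for three phenotype classes (160 progeny in total).
observed = [84, 35, 41]

stat_934, df = chi_square(observed, [9, 3, 4])
stat_121, _ = chi_square(observed, [1, 2, 1])

for label, stat in (("9:3:4", stat_934), ("1:2:1", stat_121)):
    verdict = "rejected" if stat > CRITICAL_DF2_P05 else "not rejected"
    print(f"{label}: chi-square = {stat:.3f}, df = {df}, hypothesis {verdict} at p = 0.05")
```

The order of the values in `ratio` must match the order of the phenotype classes in `observed`; assigning the classes to the ratio terms is part of stating the hypothesis.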


31. Human ABO blood type is determined by three alleles, two
of which (I^A and I^B) produce gene products that modify the
H antigen produced by protein activity of an independently
assorting H gene. A rare abnormality known as the
“Bombay phenotype” is the result of epistatic interaction
between the gene for the ABO blood group and the H gene.
Individuals with the Bombay phenotype appear to have
blood type O based on the inability of both anti-A antibody
and anti-B antibody to detect an antigen. The apparent
blood type O in Bombay phenotype is due to the absence of
H antigen as a result of homozygous recessive mutations of
the H gene. Individuals with the Bombay phenotype have
the hh genotype. Use the information above to make
predictions about the outcome of the cross shown below.

I^A I^B Hh × I^A I^B Hh

32. In rabbits, albinism is an autosomal recessive condition
caused by the absence of the pigment melanin from skin and
fur. Pigmentation is a dominant wild-type trait. Three
pure-breeding strains of albino rabbits, identified as strains 1, 2,
and 3, are crossed to one another. In the table below, F1 and
F2 progeny are shown for each cross. Based on the available
data, propose a genetic explanation for the results. As part
of your answer, create genotypes for each albino strain using
clearly defined symbols of your own choosing. Use your
symbols to diagram each cross, giving the F1 and F2 genotypes.


Cross                            F1 Progeny      F2 Progeny
Cross A  strain 1 × strain 2     56 albino       192 albino
Cross B  strain 1 × strain 3     72 pigmented    181 pigmented, 139 albino
Cross C  strain 2 × strain 3     34 pigmented    89 pigmented, 72 albino

33. Dr. O. Sophila, a close friend of Dr. Ara B. Dopsis, reviews
the F2 results Dr. Dopsis obtained in his experiment with
iris plants described in Genetic Analysis 4.3. Dr. Sophila
thinks the F2 progeny demonstrate that a single gene with
incomplete dominance has produced a 1:2:1 ratio. Dr. Dopsis
insists his proposal of recessive epistasis producing a 9:4:3
ratio in the F2 is correct. To test his proposal, Dr. Dopsis
examines the F2 data under the assumptions of the single-gene
incomplete dominance model using chi-square analysis.
Calculate and interpret this chi-square value. Can Dr. Dopsis
reject the single-gene incomplete dominance model on the
basis of this analysis? Explain why or why not.


34. In a breed of domestic cattle, horns can appear on males
and on females. Males and females can also be hornless.
The following crosses are performed with parents from
pure-breeding lines.

Cross I
Parents: horned male × hornless female
F1: males horned, females hornless
F2: males are 3/4 horned, 1/4 hornless
    females are 1/4 horned, 3/4 hornless

Cross II
Parents: hornless male × horned female
F1: males horned, females hornless
F2: males are 3/4 horned, 1/4 hornless
    females are 1/4 horned, 3/4 hornless


Explain the inheritance of this phenotype in cattle, and 
assign genotypes to all cattle in Cross I.

Thomas Hunt Morgan, Nobel laureate (1933), discovered sex-linked
inheritance, identified genetic linkage, proposed crossing over between 
homologous chromosomes, and developed the concept of gene mapping 
by recombination analysis. 


In 1933, Thomas Hunt Morgan won the Nobel Prize for
Physiology or Medicine—partly for his work establishing 
sex-linked inheritance and the chromosome theory of hered- 
ity (see Section 3.3) and partly for his role in identifying and 
explaining genetic linkage and recombination and their ap- 
plication to genetic linkage mapping, which we discuss in this 
chapter. Morgan, like all successful scientists, was assisted by 
dedicated colleagues who included many exceptional stu- 
dents and other scientists. Among them were Calvin Bridges, 
whose work we discussed in connection with the chromo- 
some theory of heredity, and Alfred Sturtevant, who as an 
undergraduate researcher in Morgan's laboratory became the 
first person to use genetic linkage data to assemble a genetic
map. A number of less well-remembered researchers,
including Morgan’s wife Lilian, were also important 
members of the research enterprise. 

The work of Morgan, his colleagues, and 
numerous others led to the validation of three 
foundational theories in genetics. First, the work 
validated the chromosome theory of heredity, and 
it expanded the theory by showing that each chro- 
mosome carries multiple genes in a specific order. 
Second, the research validated the concept of the 
gene as a physical entity that is an integral part of 
a chromosome, and it led to work that expanded 
understanding of gene structure and demonstrated 
that genes are composed of nucleotides between 
which recombination may occur. Third, the work 
validated evolutionary theory by confirming that 
closely related species have a similar number of 
chromosomes and a similar arrangement of genes 
on chromosomes. The work led to an expansion 
of evolutionary theory that showed that recombi- 
nation provides a mechanism by which variation 
in chromosome number and the arrangement 
of genes on chromosomes can accrue as species 
diverge from a common ancestor. 

The observations and analysis of genetic linkage, 
recombination, and genetic linkage mapping are the 
focus of this chapter, which also touches on the con- 
nection between gene mapping and the investiga- 
tion of chromosome evolution. 


5.1 Linked Genes Do Not Assort 
Independently 


Genes that are located on the same chromosome are called 
syntenic genes. When two syntenic genes are so close to 
one another that their alleles are unable to assort indepen- 
dently, the genes are said to be linked to one another. This 
genetic linkage produces a distinctive pattern of gamete 
genotypes that can be quantified and analyzed to map 
the locations of genes on chromosomes. The alleles of 
syntenic genes can be reshuffled by crossing over between 
homologous chromosomes to produce recombinant 
chromosomes. In studies of linked genes, chromosomes 
that do not undergo crossing over to reshuffle the alleles 
under study are identified as parental chromosomes, or
nonrecombinant chromosomes. The discovery of genetic
linkage, made more than a century ago, opened the
door to the development of genetic linkage mapping, 
which plots the positions of genes on chromosomes. Over 
the last century, new methods for identifying and mapping 
genes have been added to the analytical arsenal of genet- 
ics, but the importance of genetic linkage and its mapping 
applications remains undiminished. 

Mendelian genetic ratios such as 3:1 and 9:3:3:1 are 
the products of segregation and independent assortment of 
alleles of genes for which chance determines the probabili- 
ties of gamete genotypes and the results of gamete union. 
Even when these independently assorting genes are subject 
to epistatic interactions, the rules of probability describe 
the distribution of the contributing alleles and can be used 
to interpret the resulting ratios (see Section 4.3). 

Often, two genes assort independently because they 
are located on separate chromosomes, but syntenic genes 
can also assort independently, if they are far apart on a 
chromosome. In this situation, crossing over occurs fre- 
quently enough between the genes to randomize the com- 
binations of alleles produced during meiosis. Syntenic 
genes that are in close proximity to one another do not 
cross over frequently enough to randomize the combina- 
tions of alleles in gametes. As a result, the genes do not 
assort independently. Instead, the alleles on each of the 
original chromosomes (the parental chromosomes) con- 
tinue to reside on the same chromosome as it segregates 
from its homolog during cell division. 

To repeat, the connection that causes alleles of linked 
genes to segregate together during meiosis can be broken by 
crossing over. Recall that homologous chromosomes syn- 
apse and form the synaptonemal complex in prophase I (see 
Figure 3.11). The recombination nodules, consisting of pro- 
teins and enzymes, that form part of this complex can gener- 
ate crossing over by facilitating the breakage, exchange, and 
reunion of segments of homologous chromosomes. This re- 
combination of chromosome segments reshuffles the alleles 
carried at linked genes, resulting in haploid gametes that 
contain different combinations of alleles of syntenic genes 
than were present in the diploid cell that began meiosis. 

The following observations and conclusions about 
genetic linkage are essential to understanding the phe- 
nomenon. We discuss them in the following paragraphs 
and then expand on the same fundamental ideas through- 
out the remainder of the chapter. 


1. Linked genes are always syntenic, and they are always 
located near one another on a chromosome. When 
syntenic genes are so far apart on the chromosome 
that crossing over between them generates indepen- 
dent assortment of the alleles, the genes are not linked. 


2. Genetic linkage leads to the production of a sig- 
nificantly greater number of gametes containing 
chromosomes with parental combinations of alleles
than would be expected under assumptions
of independent assortment and to a significantly
smaller number of gametes containing chromosomes 
with alleles that are different from the parental 
combinations. 


3. Crossing over is less likely to occur between linked 
genes that are close to one another than between 
genes that are farther apart on a chromosome. The 
frequency of crossing over is roughly proportionate 
to the distance between genes, a relationship that al- 
lows genes to be mapped. 


Indications of Genetic Linkage 


Genetic linkage can be recognized by comparing the ob- 
served frequencies of gamete genotypes, or progeny pheno- 
types, with the frequencies expected under the assumptions 
of independent assortment. If genes are linked, parental 
gametes—also known as nonrecombinant gametes—that 
contain parental combinations of the alleles will be pro- 
duced significantly more often than predicted by chance. 
The excess parental gametes will also result in progeny in 
which parental phenotypes for the genes occur significantly
more often than predicted by chance. Here “significantly” is
used in the sense of statistical significance as determined by 
chi-square analysis (see Section 2.5). 

Figure 5.1 demonstrates the identification of genetic 
linkage by comparing the frequencies of gamete genotypes 
for two crosses, one illustrating independent assortment and 
the other genetic linkage. In Figure 5.1a, gene A and gene B 
are on different chromosomes, and alleles of the genes assort 
independently. The parental organisms are AABB and aabb, 
and their gametes AB and ab are the parental gametes. The 
F1 progeny are dihybrid (AaBb), and independent assortment
predicts these dihybrids will produce four genetically
different gametes in a ratio of 1:1:1:1. Notice that the fre- 
quency of parental gametes (AB and ab) is 50%, and that the 
frequency of nonparental gametes (Ab and aB) is also 50%. 

Figure 5.1b illustrates gamete-genotype production for
syntenic genes D and E that are linked. The DDee parent
produces parental gametes that are De, and the ddEE
parent produces dE gametes. The dihybrid F1 progeny
are DdEe, carrying alleles D and e on one chromosome
and d and E on the homolog. This arrangement of alleles
can be written De/dE, with the slash (“/”) separating
the alleles carried on each member of the homologous
chromosome pair. With genetic linkage, the rate of
recombination among the alleles is low, and parental allele
combinations usually stay together during meiosis, leading
to the production of parental gametes (De and dE) at a
combined frequency that is significantly greater than 50%.
The low frequency of crossing over between closely linked
genes results in the production of recombinant, or
nonparental, gametes (DE and de) at a combined frequency that is
significantly less than 50%.

Figure 5.1 Independent assortment versus genetic linkage.
(a) For this dihybrid, four genetically different gametes are
expected at 25% each when the genes assort independently.
(b) When genes are linked, parental gametes are much more
frequent than expected by chance and are more frequent
than nonparental gametes.

Figure 5.2 Complete versus incomplete genetic linkage.
(a) Genes exhibiting complete genetic linkage do not
recombine and all gametes are parental. (b) Linked genes with a
recombination frequency of 20% produce 20% nonparental
gametes and 80% parental gametes. (c) Linked genes with
a recombination frequency of 40% produce 60% parental
gametes and 40% nonparental gametes.

Complete genetic linkage is observed when no recom- 
bination at all occurs between linked genes. Complete ge- 
netic linkage can be identified, for example, in cases where 
a dihybrid produces two equally frequent gametes contain- 
ing only parental allele combinations and no recombinant 
gametes (Figure 5.2a). The absence of recombination be- 
tween homologs usually has a specific biological basis. 
Certain organisms, including Drosophila males and other 
males in the insect order Diptera (of which Drosophila 
is a member), exhibit complete genetic linkage. There is
no recombination between homologous chromosomes in
these male flies. The biological basis of the absence of re- 
combination in these organisms remains unknown. 

Incomplete genetic linkage is far more common for 
linked genes. The resulting recombination between the 
homologs produces a mixture of parental and nonparental
gametes. In the F1 dihybrid shown in Figure 5.2b,
recombination produces four genetically different gam- 
etes, of which two are parental and two are nonparental 
(recombinant). The two parental gametes each have ap- 
proximately the same frequency, and their total is signifi- 
cantly greater than 50% of all gametes. In this example, 
the frequency of each parental gamete (RT and rt) is 
40%, and the total frequency of parental gametes is 80%. 
Recombinant gametes, which have nonparental combi- 
nations of alleles, are approximately equal in frequency 
to one another and constitute significantly less than 50% 
of all gametes. In this case, a total of 20% of gametes are 
recombinant: 10% of the gametes are Rt and 10% are rT. 
Since the relative proportions of parental and recombi- 
nant gametes depend on the frequency of crossing over 
between linked genes, the proportions differ among pairs 
of linked genes. Note that the percentages of different 
gametes obtained for the cross in Figure 5.2c are different 
from those in Figure 5.2b, and also notice that the paren- 
tal alleles on chromosomes in Figure 5.2c are a dominant 
and a recessive allele—Mn/mN. Parental chromosomes 
do not necessarily always contain all dominant and all re- 
cessive alleles. Rather, parental chromosomes are defined 
by whatever combination of alleles are originally present 
on the homologs. 
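The gamete percentages quoted for Figure 5.2 follow from a simple rule: the two parental gametes share 1 - r equally, and the two recombinant gametes share r equally. The short Python sketch below illustrates that arithmetic. The function name and the two-letter encoding of each chromosome are assumptions made for this illustration only.

```python
def gamete_frequencies(parental_1, parental_2, r):
    """Expected gamete frequencies for a dihybrid with linked genes.

    parental_1, parental_2: two-character strings giving the alleles on
    each parental chromosome, e.g. "RT" and "rt" (hypothetical encoding).
    r: recombination frequency between the two genes (0 to 0.5).
    """
    if not 0 <= r <= 0.5:
        raise ValueError("r must be between 0 and 0.5")
    # An exchange between the genes pairs the first-gene allele of one
    # homolog with the second-gene allele of the other homolog.
    recombinant_1 = parental_1[0] + parental_2[1]
    recombinant_2 = parental_2[0] + parental_1[1]
    return {
        parental_1: (1 - r) / 2,
        parental_2: (1 - r) / 2,
        recombinant_1: r / 2,
        recombinant_2: r / 2,
    }


# Figure 5.2b: parental chromosomes RT and rt, r = 0.20
figure_5_2b = gamete_frequencies("RT", "rt", 0.20)

# Figure 5.2c: parental chromosomes Mn and mN, r = 0.40
figure_5_2c = gamete_frequencies("Mn", "mN", 0.40)
```

Setting r to 0.5 returns 25% for each gamete, which is the 1:1:1:1 expectation for independent assortment, and setting r to 0 returns only parental gametes, as in complete linkage.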

The recombination frequency, expressed as the 
variable r, identifies the rate of recombination for a given 
pair of linked genes. The value of r is expressed as 


r = (number of recombinants) / (total number of progeny)


Recombination frequency varies between different pairs 
of syntenic genes, depending roughly on the distance 
separating the genes on the chromosome. Comparing 
Figure 5.2b and Figure 5.2c, for example, we see that 
recombination frequency is 20% (r = 0.20) in Figure 
5.2b and 40% (r = 0.40) in Figure 5.2c. The greater 
recombination frequency in Figure 5.2c compared to 
Figure 5.2b is most likely the consequence of a greater 
distance between genes N and M than between genes 
T and R. The correlation between recombination fre- 
quency and gene distance can be expressed in two 
equivalent ways: (1) crossing over occurs at a higher rate 
between genes that are separated by a greater distance, 
and at a lower rate for genes that are closer together; and 
(2) linked genes with higher recombination frequencies 
are more distant from one another than linked genes 
with lower recombination frequencies. There are some 
caveats to this generalization, however, as we discuss in 
later sections. 


The Discovery of Genetic Linkage 


William Bateson, an early champion of Mendelian genet- 
ics, and Reginald Punnett, after whom the Punnett square 
is named, reported a series of experiments on sweet peas 
in 1905, 1906, and 1908. Those experiments opened a 
new chapter in genetics by drawing attention to genetic 
linkage. Bateson and Punnett studied the traits of flower 
color and the shape of pollen grains in sweet peas, first as 
independent traits and then together in the same plants. 

When the traits were studied separately, the genes
for flower color and pollen shape obeyed the rules of
segregation, generating 3:1 phenotypic ratios among the
F2, for example. But Bateson and Punnett went on to
study both traits in the same plants, intending to test
the law of independent assortment. They crossed
pure-breeding purple-flowered, long-pollen plants (PPLL) to
pure-breeding red-flowered, round-pollen plants (ppll).
As expected, the F1 consisted exclusively of
purple-flowered, long-pollen plants, and these plants were
crossed to obtain the F2. But then, instead of the 9:3:3:1
ratio predicted by the independent assortment hypothesis,
a far larger than expected portion of F2 progeny
showed parental combinations of phenotypes, and many
fewer showed nonparental combinations (Table 5.1).

In the F2, Bateson and Punnett observed that the
two parental phenotypes (purple, long and red, round)
were substantially in excess of expected frequencies, and
that the two nonparental phenotypes (purple, round and
red, long) were substantially less frequent than expected.
This observation led Bateson and Punnett to suggest that
the two combinations of alleles carried in the parents
(PL and pl) remained together very frequently when they
were passed through gametes to subsequent generations
by an unknown mechanism. Bateson and Punnett
described these alleles as exhibiting “coupling.” They
described the appearance of new, nonparental phenotypes
in the F2 as indicating “repulsion” of the parental alleles.
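The size of the departure from the 9:3:3:1 expectation can be checked with the chi-square statistic, the sum of (observed - expected)^2 / expected over the phenotype classes. The Python sketch below applies it to the counts reported in Table 5.1. The function name is an assumption for this illustration, and 7.815 is the standard critical value for 3 degrees of freedom at P = 0.05.

```python
def chi_square(observed, ratio):
    """Chi-square statistic for observed counts against an expected ratio."""
    total = sum(observed)
    ratio_total = sum(ratio)
    # Expected count for each class is its share of the ratio times the total.
    expected = [total * part / ratio_total for part in ratio]
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return statistic, expected


# Table 5.1 counts: purple long, purple round, red long, red round
observed = [4831, 390, 393, 1338]
chi2, expected = chi_square(observed, [9, 3, 3, 1])

# Critical value for df = 3 (four classes minus one) at P = 0.05
CRITICAL_3DF_P05 = 7.815
rejects_independent_assortment = chi2 > CRITICAL_3DF_P05
```

The statistic comes out in the thousands, far above the critical value, so the hypothesis of independent assortment is rejected for these data.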

In 1911, Morgan performed the first of many
crosses that confirmed and explained the observation
of coupling and repulsion identified by Bateson and
Punnett. Morgan had by this time identified several
genes on the X chromosome of the fruit fly, including
w (white eye) and m (miniature wing). Figure 5.3
illustrates that Morgan crossed a female pure-breeding
for white eyes and miniature wings (wm/wm) with
hemizygous wild-type males displaying red eye and full
wing (w+m+/Y). The F1 progeny were dihybrid wild-type
females (w+m+/wm) and white, miniature (wm/Y)
hemizygous males.

Table 5.1 Bateson and Punnett’s Observed and Expected Phenotypes in F2 Sweet Peas

Phenotype       Genotype   Number Observed   Number Expected (9:3:3:1 ratio)
Purple, long    P-L-       4831              (6952)(9/16) = 3910.5
Purple, round   P-ll        390              (6952)(3/16) = 1303.5
Red, long       ppL-        393              (6952)(3/16) = 1303.5
Red, round      ppll       1338              (6952)(1/16) =  434.5
Total                      6952                             6952.0

Figure 5.3 Morgan’s analysis of genetic linkage of X-linked genes
for eye color (w) and wing form (m).

Morgan then produced an F2 generation, predicting
a 1:1:1:1 ratio based on the assumption of independent
assortment of the genes. Instead, Morgan found
substantial deviation from expectations. As in the Bateson
and Punnett experiment, Morgan observed that parental
phenotypes predominated (791 + 750 = 1541, or 63.1%)
and that fewer than the expected number of nonparental
phenotypes were produced. The recombination
frequency for this experiment is r = (445 + 455)/2441 =
0.369, or 36.9%. Notice that the two parental phenotypes
are observed in an approximate 1:1 ratio (791:750), as are
the nonparental phenotypes (455:445), as expected from
segregation.

Figure 5.4 Morgan’s crossing-over hypothesis. Each homolog initially contains identical
sister chromatids. A single crossover produces two recombinant chromatids. Completion of
meiosis produces two parental gametes and two recombinant gametes.

Based on this result, Morgan proposed that parental
phenotypes are produced when the gametes of the F1
female carry chromosomes with the same sets of alleles as
in the parents, in this case w+m+ and wm. Eggs containing
parental alleles unite with sperm carrying w and m on the
X chromosome or carrying the Y chromosome, and parental
phenotypes (the same phenotypes as in the P generation
flies) are produced. Conversely, nonparental phenotypes
are the result of recombination between homologous X
chromosomes during F1 female meiosis (Figure 5.4). The
production of recombinant chromosomes carrying either
w+m or wm+ required the physical rearrangement
(recombination) of homologous X chromosomes. The union of
eggs containing recombinant X chromosomes with sperm
produced F2 with nonparental phenotypes. Morgan
confirmed this explanation through the examination of many
other pairs of linked genes on the fruit fly X chromosome.
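The recombination frequency reported for Morgan's w and m cross can be reproduced directly from the four progeny counts given in the text. The Python sketch below is only an illustration of that arithmetic, and the function name is an assumption.

```python
def recombination_frequency(parental_counts, recombinant_counts):
    """Return r = recombinant progeny / total progeny."""
    recombinants = sum(recombinant_counts)
    total = recombinants + sum(parental_counts)
    return recombinants / total


# Morgan's progeny counts: parental classes 791 and 750,
# nonparental (recombinant) classes 445 and 455.
r = recombination_frequency([791, 750], [445, 455])
parental_fraction = 1 - r
```

Rounded to three places, r is 0.369 and the parental fraction is 0.631, matching the 36.9% and 63.1% stated above.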


Detecting Autosomal Genetic Linkage 
through Test-Cross Analysis 


Turning his attention to autosomal genes and employing
20/20 hindsight, Morgan realized that Bateson and
Punnett had detected genetic linkage but were unable to
explain it because, with respect to experimental design,
they had performed the wrong cross! The F2 progeny in the
Bateson and Punnett experiment fell into four phenotypic
classes, but three of those classes contained multiple
genotypes, owing to the dominance relationships among the
alleles (see Figure 2.11). Bateson and Punnett were unable
to determine which alleles in the progeny derived from
each F1 parent because they had no way of ascertaining
the high frequency of parental combinations of alleles and
the low frequency of recombinants in F1 gametes.

Morgan realized that the linkage of autosomal genes
in Drosophila could be fully interpreted through the use
of two-point test-cross analysis in which a dihybrid F1
fly is crossed to a pure-breeding mate with the recessive
phenotypes. The “two points” in these analyses are the two
genes being tested. In two-point test-cross analysis, the
homozygous recessive fly contributes only recessive
alleles to test-cross progeny. In contrast, the dihybrid fly can
contribute either a dominant allele of a gene, in which case
the progeny display the dominant phenotype, or the
recessive allele, thus producing the recessive form of the trait.

In one experiment, Morgan used test-cross analysis to
examine genetic linkage of autosomal genes affecting eye
color and wing shape. Drosophila eye color is red if an
autosomal dominant allele pr+ is present, whereas the recessive
purple eye color is produced when the only allele present is
pr. Full-sized wing is the product of an autosomal dominant
allele vg+, and its recessive counterpart, vestigial wing, is
determined by the allele vg. Morgan crossed fruit flies that
are pure-breeding for red eyes and full wing with
pure-breeding purple-eyed, vestigial-winged flies (Figure 5.5a).
The F1 were uniformly red eyed and full winged (pr+ vg+/pr
vg). Morgan then test-crossed dihybrid F1 females to
purple-eyed, vestigial-winged males (pr vg/pr vg). In this cross,
males contributed only recessive alleles (pr and vg), but
females could produce any one of four gamete genotypes. The
alleles of the female gamete thus controlled the phenotype
of test-cross progeny. If the female contributed a dominant
allele to progeny, the phenotype for that trait was dominant;
and conversely, if the donated female allele was recessive,
the phenotype was recessive. Test-cross progeny
phenotypes corresponded directly to the alleles contributed by F1
females, thus making it possible to unambiguously identify
the allelic content of chromosomes in female gametes.

Under the assumption of independent assortment, 
dihybrid females should produce four equally frequent 
gametes, and test-cross progeny are expected to have four 
phenotypes distributed in a 1:1:1:1 ratio (see Figure 2.13). 
With genetic linkage, however, parental combinations of
alleles occur preferentially in gametes, producing test-cross
progeny with a significant excess of parental phenotypes
and a significant deficit of nonparental phenotypes.

Figure 5.5 Morgan’s test-cross analysis of genetic linkage between autosomal genes. (a) Dihybrid
F1 females (pr+ vg+/pr vg) are test-crossed to males homozygous for recessive mutant purple eye color
and vestigial wing (pr vg/pr vg), permitting identification of progeny as carrying either a parental or a
recombinant chromosome. (b) Single crossover during female meiosis leads to parental and
recombinant gametes at frequencies specified by recombination or by chance, and gamete union produces
test-cross progeny.


Morgan’s test-cross progeny displayed the four 
expected phenotypes, but in numbers that deviated dra- 
matically from expected Mendelian proportions. Among 
test-cross progeny, 89.3% were parental, and just 10.7% 
were recombinant. The nonrecombinant progeny classes 
were found in approximately a 1:1 ratio (1339:1195), as 
were the recombinant classes (154:151); thus, the two pa- 
rental chromosomes were transmitted equally frequently, 
as were the two recombinant chromosomes. Figure 5.5b 
shows that among the 89.3% of parental female gametes, 
one-half, or 44.65%, are predicted to be of each parental 
type. Similarly, among the 10.7% of gametes that are 
recombinant, each recombinant type is predicted with a 
frequency of 5.35%. 
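The percentages in this paragraph can be recovered from the four class counts, and the same counts can be tested against the 1:1:1:1 ratio expected under independent assortment. The Python sketch below does both. It assumes that the two largest classes are the parental classes, which holds for linked genes but is not a general rule, and the function name is hypothetical.

```python
def summarize_test_cross(counts):
    """Summarize a two-point test cross from its four progeny class counts.

    Assumes the two most frequent classes are parental (true only when
    the genes are linked); the remaining two are treated as recombinant.
    """
    ordered = sorted(counts, reverse=True)
    parental, recombinant = ordered[:2], ordered[2:]
    total = sum(counts)
    r = sum(recombinant) / total
    # Under independent assortment each class is expected at total / 4.
    expected = total / 4
    chi2 = sum((c - expected) ** 2 / expected for c in counts)
    return {
        "total": total,
        "parental": parental,
        "recombinant": recombinant,
        "r": r,
        "chi_square_vs_1_1_1_1": chi2,
    }


# Morgan's pr and vg test-cross counts as given in the text
summary = summarize_test_cross([1339, 1195, 154, 151])
```

Here r rounds to 0.107, in agreement with the 10.7% recombinant progeny stated above, and the chi-square value far exceeds 7.815, the critical value for 3 degrees of freedom at P = 0.05.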

In the years immediately following Morgan’s explana- 
tion of genetic linkage, other biologists, working on plant 
species and animal species, used test-cross analysis to
verify Morgan’s hypothesis. The collective results of these
experimental observations can be summarized as follows: 


1. Genetic linkage is a physical relationship between 
genes that are located near one another on a 
chromosome. 


2. Recombination occurs between linked genes on ho- 
mologous chromosomes in significantly less than 50% 
of meiotic divisions. Significantly more than 50% of 
gametes contain parental combinations of alleles. 


3. The recombination frequency varies among linked 
genes and is roughly proportionate to the distance 
between genes on a chromosome. 


Genetic Analysis 5.1 takes you through the identification 
of parental and recombinant progeny and the determina- 
tion of recombination frequency. 


GENETIC ANALYSIS 5.1

PROBLEM In tomato plants (Lycopersicon esculentum), red fruit color (T-) is dominant to tangerine color (tt),
and smooth leaf (H-) is dominant to hairy leaf (hh). Both genes are located on chromosome 7, and they have a
recombination frequency of 20%. A pure-breeding plant producing tangerine-colored
fruit and smooth leaves is crossed to a pure-breeding red-fruited, hairy-leaved plant.
The F1 are test-crossed to a pure-breeding tangerine-fruited, hairy plant. What are the
expected genotypes, phenotypes, and phenotype proportions among test-cross progeny?

BREAK IT DOWN: A recombination frequency of 20% means that 80%
of gametes are parental and 20% are recombinant.

BREAK IT DOWN: Pure-breeding tangerine, smooth is ttHH and
pure-breeding red, hairy is TThh.

BREAK IT DOWN: The F1 are TtHh, and they are test-crossed to tthh.


Evaluate

1. Solution Strategy: Identify the topic of this problem and
the nature of the required answer.

Solution Step: This problem concerns the prediction of inheritance in progeny of a test cross
for linked genes. The answer requires that the expected frequency of each
possible category of test-cross progeny be predicted from the information given
about recombination frequency between the genes.

2. Solution Strategy: Identify the critical information given
in the problem.

Solution Step: Dominant and recessive phenotypes, the phenotypes of two pure-breeding
parental plants, and the recombination frequency between genes controlling two
traits are given in the problem.


Deduce

3. Solution Strategy: Identify the alleles in the gametes of
the parental plants.

Solution Step: Each parent is pure-breeding for a dominant and a recessive trait:

Tangerine, smooth = ttHH
Red, hairy = TThh
Parental gametes = all tH from one parent and all Th from the other

4. Solution Strategy: Identify the genotype and phenotype
of F1 plants, and determine the parental arrangements of alleles.

Solution Step: F1 are dihybrid (tH/Th) and have the two dominant phenotypes (red and
smooth). The pure-breeding parents have contributed chromosomes carrying tH
and Th.


Solve

5. Solution Strategy: Determine the number and
frequency of F1 gametes, given the
recombination frequency of 20%.

TIP: With genetic linkage, parental combinations
of alleles are significantly greater than 50%
of the gametes.

Solution Step: Four genetically different gametes are possible: tH, Th, TH, and th. Among these
gametes, 20% will be recombinants and 80% parentals (100% - 20% = 80%).
Chance predicts that the two parental gametes (tH and Th) are produced at
equal frequency. Likewise, the two recombinant gametes (TH and th) are
produced at equal frequency. The expected gamete frequencies are

Parentals:     tH = (0.80)(1/2) = 0.40
               Th = (0.80)(1/2) = 0.40
Recombinants:  TH = (0.20)(1/2) = 0.10
               th = (0.20)(1/2) = 0.10

6. Solution Strategy: Determine the expected outcome of
the test cross.

TIP: There are two equally likely
parental gametes and two equally
likely recombinant gametes.

Solution Step: Test-cross progeny are expected to be 40% each tangerine, smooth and red,
hairy; and 10% each red, smooth and tangerine, hairy.

Test-cross parent gamete: th (1.0)

F1 gamete class   F1 gamete (frequency)   Test-cross progeny (frequency)   Phenotype            Percentage
Parental          tH (0.40)               tH/th (0.40)                     Tangerine, smooth    40%
Parental          Th (0.40)               Th/th (0.40)                     Red, hairy           40%
Recombinant       TH (0.10)               TH/th (0.10)                     Red, smooth          10%
Recombinant       th (0.10)               th/th (0.10)                     Tangerine, hairy     10%

For more practice, see Problems 5, 6, and 12.


5.2 Genetic Linkage Mapping Is
Based on Recombination Frequency
between Genes


An important outcome of Morgan’s studies of linked 
genes in Drosophila was his recognition that signifi- 
cantly more parental than recombinant progeny oc- 
curred and that the proportion of recombinants varied 
considerably from one pair of linked genes to another. 
Morgan summarized this idea in 1911, stating, “The 
proportions that result are not so much the expression 
of a numerical system as of the relative location of the 
factors (genes) in the chromosome.” Morgan was say- 
ing that independent assortment was not determining 
the relative proportions of gametes produced by an 
organism. Instead, the close proximity of linked genes 
on a chromosome overrode the expected influence of 
independent assortment. The linkage of genes prefer- 
entially retained parental combinations of alleles and 
led to a much higher proportion of parental gametes 
and a much lower proportion of nonparental gametes 
than were expected by chance. Morgan’s intuition was 
correct, and his insight profoundly changed views of 
hereditary transmission and of the location and orga- 
nization of genes on chromosomes. In this section, we 
examine methods for constructing genetic maps from 
recombination data for two linked genes, and in the next 
section, we'll move on to consider the mapping of three 
linked genes. 


The First Genetic Linkage Map 


In the context of early 20th-century biology, Morgan’s 
idea that genes were on chromosomes was not novel. 
For example, Sutton, Boveri, and others had noted the 
parallel between hereditary transmission and chromo- 
some division. But biologists at the time did not know 
either the structure of genes or how they were encoded 
on chromosomes (see Section 3.4). Morgan was the 
first to demonstrate that genes are on chromosomes, 
however, and his proposal that the recombination fre- 
quency for a linked pair of genes might correspond to 
the distance between those genes on a chromosome was 
a novel idea. 

Morgan viewed genes as inhabiting fixed locations on
chromosomes. Like cities along a road, the order of genes
could be determined, the locations of genes on a chromosome
could be specified, and the distances between genes
could be quantified. If his hypothesis were correct, he
reasoned, then recombination frequencies could be used
to produce a genetic linkage map depicting gene order
along a chromosome and to calculate a quantitative index
of linear distances between genes. As Morgan discussed
his ideas about recombination frequency and gene
distances, Alfred Sturtevant, then an undergraduate student
working in Morgan’s laboratory, had an epiphany. In a
1965 book, Sturtevant recalled the moment:

In the latter part of 1911, in a conversation with Morgan,
I suddenly realized that the variations in strength
of linkage, already attributed by Morgan to differences
in the spatial separation of genes, offered the possibility
of determining sequences in the linear dimension
of a chromosome. I went home and spent most
of the night (to the neglect of my other undergraduate
homework) in producing the first chromosome map.

Table 5.2 Sturtevant’s Recombination Data for Five X-Linked Genes in Drosophila

Gene Pairs                           Recombination Frequency
Yellow (y) and white (w)             214/21,736 = 0.010
Yellow (y) and vermilion (v)         1464/4551 = 0.322
Vermilion (v) and white (w)          471/1584 = 0.297
Vermilion (v) and miniature (m)      17/573 = 0.030
Miniature (m) and white (w)          2062/6116 = 0.337
White (w) and rudimentary (r)        406/898 = 0.452
Rudimentary (r) and vermilion (v)    109/405 = 0.269


Sturtevant used the results of numerous two-point 
test-cross experiments on five X-linked genes in Drosophila 
to create the first genetic linkage map. He based his map- 
building approach on the idea that smaller recombination 
frequencies indicated genes residing closer to each other 
on the chromosome, and larger recombination frequencies 
indicated greater distances between genes on the chromo- 
some. To construct his genetic map, Sturtevant used the 
data in Table 5.2. His finished recombination map is il- 
lustrated in Figure 5.6. In the century since Sturtevant first 
compiled his map, millions of progeny fruit flies have been 
analyzed for X-chromosome recombination. The accumu- 
lated data have led to slight modifications in Sturtevant’s 


[Figure 5.6 map positions. Sturtevant’s map: y 0.0, w 1.0, v 30.7, m 33.7, r 57.6. Contemporary map: y 0.0, w 1.5, v 33.0, m 36.1, r 54.5, centromere 67.7.]


Figure 5.6 The first linkage map. The original Drosophila X 
chromosome map of five genes assembled by Alfred Sturtevant 
(top) and the contemporary X-chromosome map for Drosophila 
based on current data (bottom). Sturtevant’s map is based in 
part on the recombination frequencies given in Table 5.2. 
estimated recombination frequencies but have not neces-
sitated any changes in gene order. Sturtevant assembled his
map using logic of the kind demonstrated in the following 
four steps: 


1. Of the genes tested, the pair with the smallest recombination
frequency, and therefore in closest proximity, are the gene
producing white eye (w) and the gene carrying yellow (y) body.
With their recombination frequency of just 1%, they must be at
almost the same spot on the chromosome.

2. Vermilion (v) is more distant from yellow (32.2% recombination)
than it is from white (29.7% recombination), suggesting the
order y-w-v.

3. Miniature (m) is close to vermilion (3% recombination) but is
more distant from white (33.7% recombination) than is vermilion.
Adding miniature to the gene map produces the order y-w-v-m.

4. Rudimentary (r) is very distant from white (45.2% recombination)
and also fairly distant from vermilion (26.9% recombination).
This information places rudimentary on the opposite side of the
map from white, yielding the final map y-w-v-m-r.
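
The four-step logic above can be sketched in code. The following Python sketch is illustrative only and is not Sturtevant's own procedure: it assumes that, in a valid gene order, a gene pair lying inside the span of another pair should never show the larger recombination frequency, and it checks every possible order of the five genes against the counts in Table 5.2. The names `counts`, `rf`, and `violations` are hypothetical.

```python
from itertools import permutations

# (recombinants, total progeny) for each gene pair, copied from Table 5.2
counts = {
    ("y", "w"): (214, 21736),
    ("y", "v"): (1464, 4551),
    ("v", "w"): (471, 1584),
    ("v", "m"): (17, 573),
    ("m", "w"): (2062, 6116),
    ("w", "r"): (406, 898),
    ("r", "v"): (109, 405),
}
rf = {frozenset(pair): rec / total for pair, (rec, total) in counts.items()}


def violations(order):
    """Count pairs that lie inside another pair's span yet recombine more often."""
    position = {gene: i for i, gene in enumerate(order)}
    span = {pair: sorted(position[g] for g in pair) for pair in rf}
    count = 0
    for inner, outer in permutations(rf, 2):
        nested = span[outer][0] <= span[inner][0] and span[inner][1] <= span[outer][1]
        if nested and rf[inner] > rf[outer]:
            count += 1
    return count


# Assumed rule: keep only the orders that produce no violations
consistent_orders = ["-".join(order) for order in permutations("ywvmr")
                     if violations(order) == 0]
print(consistent_orders)
```

Under this assumed rule, the only orders left without violations are y-w-v-m-r and its mirror image, which agrees with the map derived in the four steps.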


Map Units 


As we examine our map of the Drosophila X chromo- 
some (Figure 5.6), the correlation between recombina- 
tion frequency and physical distance on chromosomes 
becomes easier to understand. The recombination 
frequencies between genes on a chromosome can even 
be converted into units of physical distance, using the 
concept of a map unit (m.u.). A map unit is also known 
as a centiMorgan (cM) in honor of Thomas Hunt 
Morgan’s contribution to recombination mapping. It is 
common (at least in introductory genetics courses) to 
use the equivalency: 


1% recombination = 1 m.u. or 1 cM of distance 
between linked genes 


This is an approximation, and not a very good one for 
certain regions of particular genomes, as we discuss in 
a later section. Despite its shortcomings, however, it is 
accurate enough for our instructional purposes in this 
textbook. 


Chi-Square Analysis of Genetic Linkage Data 


In our discussion of genetic linkage data, we have noted 
that when genes are linked, significantly more paren- 
tal phenotypes than recombinant phenotypes are found 
among progeny. But how can we tell whether the ob- 
served data constitute evidence of genetic linkage rather 
than a simple case of chance variation from expected 
values? The question is settled by the use of chi-square 
analysis of observed and expected values to identify statistically
significant differences. (Section 2.5 describes the chi-square test
and demonstrates the calculation and interpretation of chi-square p,
or probability, values.)

As an example, let’s revisit the data obtained by Morgan
on the w gene affecting eye color and the m gene controlling
wing form in Drosophila, presented in Figure 5.3. The cross
of F₁ dihybrid females (w m/w⁺ m⁺) to white-eyed,
miniature-winged males (w m/Y) produces an F₂ generation that
would have been expected to display a 1:1:1:1 phenotypic
ratio. This ratio is based on the assumption that independent
assortment determines the alleles contained in female
gametes. Using the observed and expected values, we calculate
the chi-square value as follows:


$$\chi^2 = \frac{(791 - 610.25)^2}{610.25} + \frac{(750 - 610.25)^2}{610.25} + \frac{(445 - 610.25)^2}{610.25} + \frac{(455 - 610.25)^2}{610.25} = 169.79$$


There are 3 degrees of freedom (df = 3) in this problem, 
and the corresponding p value is p < 0.005 (see Table 2.4). 
This observed result indicates a significant deviation from 
expected results, suggesting that chance is not responsible 
for the observed distribution. Combined with the obser- 
vation that the two phenotypes that exceed the expected 
number are parental, these data are consistent with the 
presence of genetic linkage between the genes. 
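
The chi-square calculation above can be reproduced with a few lines of Python. This is a minimal sketch: the variable names are hypothetical, and the critical value 12.838 (df = 3, p = 0.005) is an assumption taken from a standard chi-square table rather than from this text.

```python
# Observed progeny counts from the chi-square example:
# the two parental classes first, then the two recombinant classes
observed = [791, 750, 445, 455]

total = sum(observed)
# Independent assortment predicts a 1:1:1:1 ratio, so each class expects total / 4
expected = [total / len(observed)] * len(observed)

chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1

# Assumed critical value for df = 3 at p = 0.005, from a standard table
CRITICAL_DF3_P005 = 12.838
reject_independent_assortment = chi_square > CRITICAL_DF3_P005

print(f"chi-square = {chi_square:.2f}, df = {df}, reject = {reject_independent_assortment}")
```

The statistic evaluates to roughly 169.8, far above the assumed critical value, so the sketch reaches the same conclusion as the text: chance alone is a poor explanation for the observed distribution.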


5.3 Three-Point Test-Cross Analysis 
Maps Genes 


Two-point test-cross analysis is an effective way to cal- 
culate the recombination frequency between two linked 
genes and to infer the distance between the genes, but it is 
not the most effective way to build genetic maps containing 
multiple genes. By expanding the idea of test-cross analysis 
to three-point test-cross analysis, however, geneticists 
can efficiently map three linked genes simultaneously. 


Finding the Relative Order of Genes 
by Three-Point Mapping 


Let’s consider a three-point test cross between a trihybrid
organism (a⁺a b⁺b c⁺c) and an organism that is homozygous
recessive for the three traits (aabbcc). The configuration
of alleles in the trihybrid does not have to be known
at the start, since the three-point analysis will deduce the 
configuration of alleles on parental chromosomes as part 
of the process. 

Incomplete genetic linkage of three genes in a trihy- 
brid produces eight genetically different gamete geno- 
types. This is the same number of genetically different 
gametes expected if we assume independent assortment; 
but, unlike the expectations for independent assort- 
ment, the gamete frequencies are unequal if the genes 
are linked. Among the eight gamete genotypes are two
parental genotypes that are significantly more frequent
than expected by chance as well as six recombinant genotypes,
each detected less often than expected. Assuming,
for the purposes of this example, that the three linked
genes are in the order a-b-c, we can identify parental and
recombinant gametes by the relative frequencies of the
corresponding test-cross progeny classes.


Figure 5.7 Three-point test crosses for different allele configurations in a trihybrid parent crossed
to a triple recessive parent. (a) In Test cross 1, parental chromosomes carry the three wild-type and
the three recessive alleles. Gametes with these alleles are parental and produce progeny with parental
phenotypes. Single- and double-recombinant gametes lead to test-cross progeny displaying recombination.
Test-cross progeny with eight genotypes (1 to 8) are produced. (b) In Test cross 2, a different
configuration of alleles on parental chromosomes produces parental and recombinant progeny that are
different from those in Test cross 1. Eight test-cross progeny genotypes (1 to 8) are produced.

Imagine that Test cross 1 mates a trihybrid organism with
the genotype a⁺b⁺c⁺/abc to one that is abc/abc (Figure 5.7a).
Test cross 2 shows an alternative arrangement of alleles on
parental chromosomes, mating the trihybrid a⁺bc⁺/ab⁺c to an
organism with genotype abc/abc (Figure 5.7b). In Test cross
1, parental gametes (a⁺b⁺c⁺ and abc) are produced when
no crossovers occur between the genes, and the resulting
progeny have either the three wild-type or three recessive
phenotypes. A single crossover occurring between genes a
and b produces two recombinant gametes, a⁺bc and ab⁺c⁺,
and progeny with the corresponding patterns of phenotypes.
Likewise, single crossover between genes b and c also produces
two recombinant gametes, a⁺b⁺c and abc⁺, and corresponding
progeny. A double-crossover event that causes
crossing over both between a and b and between b and c will
produce a pair of double-crossover gametes, a⁺bc⁺ and ab⁺c,
and progeny with the corresponding mixtures of wild-type
and recessive traits.

Test cross 2 produces the same eight gamete genotypes
obtained from Test cross 1, but the alleles start out
arranged differently on the parental chromosomes. Thus,
the parental and recombinant gamete genotypes in this test
cross are different from those in the first test cross. In this
test cross, the parental gametes are a⁺bc⁺ and ab⁺c. The
single-crossover gametes are a⁺b⁺c and abc⁺ for crossover
between genes a and b. Single crossover between genes b
and c produces gametes a⁺bc and ab⁺c⁺. A double crossover
causing recombination between each pair of genes
produces double-crossover gametes a⁺b⁺c⁺ and abc.

As expected when genes are linked, each of the six 
recombinant gametes is observed at a frequency that 
is significantly less than predicted by chance. Single- 
crossover gametes form at frequencies determined by the 
relative distances between gene pairs. Within each single- 
crossover class, the two gametes will be equally frequent. 
Double-crossover gametes will be the least frequent class 
because both crossover events must occur. As within each 
single-crossover class, the two kinds of double-crossover 
gametes are produced at equal frequency. 
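
The way the same eight gametes fall into different classes in the two test crosses can be illustrated with a short Python sketch. It assumes the gene order a-b-c, writes wild-type alleles as `a+`, `b+`, `c+`, and models a crossover simply as a switch from one homolog to the other. The function names `recombine` and `gamete_classes` are hypothetical.

```python
def recombine(hom1, hom2, crossover_intervals):
    """Return the two gametes formed when crossovers occur in the given intervals.

    Interval k lies between gene k and gene k + 1 (0-based, in map order).
    """
    templates = (hom1, hom2)
    current = 0
    first, second = [], []
    for i in range(len(hom1)):
        if i > 0 and (i - 1) in crossover_intervals:
            current = 1 - current  # switch homologs at each crossover
        first.append(templates[current][i])
        second.append(templates[1 - current][i])
    return " ".join(first), " ".join(second)


def gamete_classes(hom1, hom2):
    """Classify the eight gametes of a trihybrid whose genes are listed in map order."""
    return {
        "parental": recombine(hom1, hom2, set()),
        "single crossover a-b": recombine(hom1, hom2, {0}),
        "single crossover b-c": recombine(hom1, hom2, {1}),
        "double crossover": recombine(hom1, hom2, {0, 1}),
    }


cross1 = gamete_classes(("a+", "b+", "c+"), ("a", "b", "c"))
cross2 = gamete_classes(("a+", "b", "c+"), ("a", "b+", "c"))

for label in cross1:
    print(f"{label}: Test cross 1 {cross1[label]}, Test cross 2 {cross2[label]}")
```

Comparing the two results shows that the gametes that are parental in one test cross are the double crossovers in the other, because only the middle alleles differ between the two starting configurations.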


Constructing a Three-Point Recombination Map 


To illustrate the use of three-point test-cross data for 
constructing a genetic map, we will now analyze the data 
from a 1935 study by Rollins Emerson of genetic linkage 
in maize (Zea mays). Emerson tested three genes: the 
gene producing the phenotypes green seedling (V–) and
yellow seedling (vv), the gene producing rough leaf (Gl–)
and glossy leaf (gl gl), and the gene for normal fertility
(Va–) and variable fertility (va va).

Maize was an important genetic experimental organism 
in the first half of the 20th century because of the large num- 
ber of variable genetic traits it possesses, the ease with which 
large numbers of plants can be grown in a single season, the 
ability of researchers to control matings in a manner similar 
to Mendel’s, and the production of large numbers of seeds 
from each cross. On an ear of corn, each kernel is a seed pro- 
duced by the union of gametes; thus, a single ear can carry 
hundreds of progeny seeds, each the product of independent 
fertilization, and a small number of plants can yield tens of 
thousands of progeny seeds for analysis. 

Emerson crossed pure-breeding wild-type plants having
the dominant phenotypes green seedling, rough leaves, and
normal fertility (V Gl Va/V Gl Va) to pure-breeding plants
having the recessive phenotypes yellow seedling, glossy
leaves, and variable fertility (v gl va/v gl va). The cross
produced F₁ trihybrid plants with the dominant phenotypes and
the genotype V Gl Va/v gl va that carries three dominant
alleles on one chromosome and three recessive alleles on the
homolog. The F₁ were then test-crossed to pure-breeding
yellow, glossy, variable plants (v gl va/v gl va). The test-cross
progeny are shown in Table 5.3. To create a genetic map that


Table 5.3 Emerson’s Three-Point Test-Cross Analysis

Parental cross: V Gl Va/V Gl Va (green, rough, normal) × v gl va/v gl va (yellow, glossy, variable)
Test cross:     V Gl Va/v gl va (green, rough, normal) × v gl va/v gl va (yellow, glossy, variable)

Test-cross progeny:

Phenotype                      Number Observed   Number Expected   Genotype (♀ gamete/♂ gamete)
1. Yellow, rough, normal       60                90.75             v Gl Va/v gl va
2. Yellow, glossy, normal      48                90.75             v gl Va/v gl va
3. Yellow, rough, variable     4                 90.75             v Gl va/v gl va
4. Yellow, glossy, variable    270               90.75             v gl va/v gl va
5. Green, rough, normal        235               90.75             V Gl Va/v gl va
6. Green, glossy, normal       7                 90.75             V gl Va/v gl va
7. Green, rough, variable      40                90.75             V Gl va/v gl va
8. Green, glossy, variable     62                90.75             V gl va/v gl va
Total                          726               726


places the three genes in correct relative order and to calcu- 
late recombination frequencies between gene pairs, we ask 
and answer five questions about these data: 


1. Are the data consistent with the proposal of genetic
linkage?

2. What alleles are on each parental chromosome?

3. What is the gene order on the chromosome?

4. What are the recombination frequencies of the gene
pairs?

5. Is the frequency of double crossovers consistent with
independence of the single crossovers?


Question 1: Are the Data Consistent with the Proposal 
of Genetic Linkage? Under the assumptions of inde- 
pendent assortment, trihybrid plants produce eight 
genetically different gametes at a frequency of 0.125, or 1/8, 
each, and test-cross progeny are expected in eight equally 
frequent phenotypic classes. In this experiment, with 726 
test-cross progeny, the expected number of progeny in each 
class would be (726)(0.125) = 90.75. Chi-square analysis 
comparing observed and expected numbers of progeny in 
each class yields a chi-square value in excess of 800. There 
are (8 — 1) = 7 degrees of freedom, and the corresponding 
p value is p < 0.005. From this result, we conclude that 
the observed distribution of test-cross progeny deviates 
significantly from expectation, and we reject the independent 
assortment hypothesis as the explanation of these data. 

If the deviation in this experiment is due to genetic 
linkage, then we would expect the numbers of prog- 
eny having parental phenotypes to be excessively high. 
Comparing the observed and expected values in each test-
cross class shows that only two phenotype classes exceed
expected numbers: the green, rough, normal class and the 
yellow, glossy, variable class. These are the two parental 
phenotypes. From this analysis, we conclude that the data 
are consistent with genetic linkage: the distribution of test- 
cross progeny deviates significantly from what would be 
expected from independent assortment, and only parental 
phenotypes are seen more often than expected by chance. 


Question 2: What Alleles Are on Each Parental Chro- 
mosome? We can answer this question in two ways. The 
simpler approach is to use the phenotype information 
available about pure-breeding parental plants in the cross. 
The parent plants were pure-breeding dominant and
pure-breeding recessive. From this information, we know
that trihybrid F₁ plants have the dominant alleles on one
chromosome and the recessive alleles on the homologous
chromosome. The genetic structure of the test cross is
V Gl Va/v gl va × v gl va/v gl va, and so the alleles on
parental chromosomes must be V Gl Va and v gl va. Test-cross
progeny Classes 4 and 5 in Table 5.3 are parentals.

The second approach is necessary when we do not
know the phenotypes of parents or when the alleles on each
chromosome are not known. In this approach, test-cross
data are used to determine parental chromosomes. The
data in Table 5.3 indicate that the test-cross progeny in
Class 5 (green, rough, normal; V Gl Va/v gl va) and in
Class 4 (yellow, glossy, variable; v gl va/v gl va) exceed
expected frequency and are therefore the parental classes.
Both approaches tell us the same story: The parental
chromosomes carry alleles V Gl Va and v gl va.
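
The answers to Questions 1 and 2 can be checked numerically. The sketch below, with hypothetical variable names, recomputes the expected class size and the chi-square value from the counts in Table 5.3 and then lists the classes that exceed expectation, which under the reasoning in the text are the parental classes.

```python
# Observed test-cross progeny from Table 5.3, in class order 1 to 8
progeny = {
    "yellow, rough, normal": 60,
    "yellow, glossy, normal": 48,
    "yellow, rough, variable": 4,
    "yellow, glossy, variable": 270,
    "green, rough, normal": 235,
    "green, glossy, normal": 7,
    "green, rough, variable": 40,
    "green, glossy, variable": 62,
}

total = sum(progeny.values())
# Independent assortment predicts eight equally frequent classes
expected = total / len(progeny)

chi_square = sum((n - expected) ** 2 / expected for n in progeny.values())
df = len(progeny) - 1

# Classes more frequent than expected are candidates for the parental classes
parental_classes = [phenotype for phenotype, n in progeny.items() if n > expected]

print(f"expected per class = {expected}, chi-square = {chi_square:.1f}, df = {df}")
print("classes exceeding expectation:", parental_classes)
```

The chi-square value comes out above 800, as stated in the text, and only the yellow, glossy, variable and green, rough, normal classes exceed the expected 90.75.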


Question 3: What Is the Gene Order on the Chromosome? 
With parental chromosomes identified, the six remaining 
classes must be recombinants: four are single-crossover 
classes, and two are double crossovers. Double-crossover 
progeny will be the least frequent of all classes, because 
both crossover events must occur simultaneously to 
produce double recombinants, or double crossovers. 
From progeny numbers, we may presume that the 
smallest classes, Class 3—yellow, rough, variable—and 
Class 6—green, glossy, normal—are the probable double 
recombinants. We can use these predictions to test 
possible gene orders on parental chromosomes. 

For these three genes there are only three possible
gene orders: (1) va-v-gl, (2) v-va-gl, or (3) va-gl-v.
There are no data to assist us in determining the left-to-right
orientation of the chromosome, so the difference
between these gene orders is defined entirely by which
gene is in the middle (v, va, or gl) and which two genes
flank the middle gene. Each gene order could be written in
the opposite direction, since each is a relative order of the
three genes. For example, va-v-gl and gl-v-va are equivalent
gene orders because each has v as the middle gene.

There are two ways to determine the gene order. 
One procedure is to list each gene order possible for the 
parental chromosomes, draw the corresponding double 
crossover chromosomes, and then determine whether the 
double crossover gametes produced by this activity match 
the predicted double crossover progeny. If a match is not 
seen, the gene order is incorrect, but if a match is found, 
the correct gene order has been identified. 


1. Possible gene order va-v-gl

Parental chromosomes        Predicted double-crossover gametes
Va  V  Gl                   Va  v  Gl
va  v  gl                   va  V  gl

Result: Double-crossover gametes obtained from this
gene order are not those predicted from the data.
Conclusion: The proposed gene order is incorrect; v is
not the middle gene.


2. Possible gene order v-va-gl

Parental chromosomes        Predicted double-crossover gametes
V  Va  Gl                   V  va  Gl
v  va  gl                   v  Va  gl

Result: Double-crossover gametes obtained from this
gene order are not those predicted from the data.
Conclusion: The proposed gene order is incorrect; va is
not the middle gene.

3. Possible gene order v-gl-va

Parental chromosomes        Predicted double-crossover gametes
V  Gl  Va                   V  gl  Va
v  gl  va                   v  Gl  va

Result: Double-crossover gametes obtained from this
gene order match those predicted from the data.
Conclusion: This proposed gene order is correct: gl is the
middle gene, and the gene order may be written as either
v-gl-va or va-gl-v. This analysis confirms that test-cross
progeny Classes 3 and 6 are double-crossover progeny.


The second method for determining gene order is a 
shortcut approach that requires some familiarity with re- 
combination. Looking back at Figure 5.7, note that if we 
compare parental and double-crossover chromosomes, the 
alleles of the outside genes appear to remain the same while 
the middle allele appears to switch. In other words, when 
we compare one parental chromosome with one double- 
recombinant chromosome, two alleles match and one does 
not. The odd one out is the allele in the middle. If a trihybrid
parent has alleles arranged as a⁺b⁺c⁺/abc, then double
crossover produces gametes that are a⁺bc⁺ and ab⁺c. Parental
alleles a⁺ and c⁺ match one double recombinant, and alleles
b and b⁺ are switched. Similarly, the second parental gamete
has alleles a and c that match the other double recombinant.
Alleles of the middle gene, b and b⁺, have switched in the
double recombinant compared to the parental chromosome.

Remember, we have already identified the parental 
and double-crossover phenotypic groups by their num- 
bers. We now look at the double crossovers to see which 
two alleles match parental phenotypes and to see which 
allele changes and is therefore the middle gene. In our 
data set, double-recombinant chromosomes are V gl Va 
and v Gl va. In this case, alleles of the gl gene have 
switched, indicating that gl is the middle gene. Based on 
this approach, the gene order and alleles on parental
chromosomes are V Gl Va and v gl va.
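
The shortcut approach lends itself to a small illustration. The Python sketch below uses a hypothetical function, `middle_gene`, that compares one parental chromosome with one double-crossover chromosome and reports which gene is the odd one out. It assumes the parental and double-crossover classes have already been identified from progeny numbers, as described above.

```python
def middle_gene(parental, double_crossover):
    """Return the gene whose allele is the odd one out between the two chromosomes.

    Both arguments map gene names to the allele carried on that chromosome.
    """
    differing = [gene for gene in parental if parental[gene] != double_crossover[gene]]
    if len(differing) == 1:
        # Two alleles match and one does not: the mismatch is the middle gene
        return differing[0]
    if len(differing) == 2:
        # Compared with the other parental chromosome, only the middle allele matches
        (matching,) = [gene for gene in parental if gene not in differing]
        return matching
    raise ValueError("chromosomes are not a parental/double-crossover pair")


# Parental chromosome and the two double-crossover chromosomes from Emerson's data
parental = {"v": "V", "gl": "Gl", "va": "Va"}
dco_class_6 = {"v": "V", "gl": "gl", "va": "Va"}   # green, glossy, normal
dco_class_3 = {"v": "v", "gl": "Gl", "va": "va"}   # yellow, rough, variable

print(middle_gene(parental, dco_class_6))
print(middle_gene(parental, dco_class_3))
```

Either comparison points to gl, consistent with the gene order v-gl-va reached by drawing out the three candidate orders.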


Question 4: What Are the Recombination Frequencies 
of the Gene Pairs? Taking the gene pairs one at a time, we 
calculate the recombination frequencies by counting the 
total number of crossovers that occur between the genes 
of that pair. Every crossover event between the two genes 
is counted, whether the event occurs by itself (a single 
crossover) or simultaneously with another event (a double 
crossover). In this case, there are 11 double recombinants, 
each with one crossover between v and gl and one 
crossover between gl and va, for a total of 22 crossover 
events between v and va. Single-crossover progeny are
predicted on the basis of parental chromosomes having the
gene order v-gl-va. Between v and gl, a single crossover
produces the following.

Parental chromosomes        Predicted single-crossover gametes
V  Gl  Va                   V  gl  va
v  gl  va                   v  Gl  Va

Test-cross progeny carrying these recombinant chromosomes
have the phenotypes yellow, rough, normal (Class
1) and green, glossy, variable (Class 8). The recombination
frequency is calculated as the sum of all single and double
recombinants for this gene pair divided by the total number
of progeny: (60 + 62 + 4 + 7)/726 = 0.183, or 18.3%.
Therefore, the distance between these genes is approximately
18.3 cM.

Single crossover between gl and va produces the
following.

Parental chromosomes        Predicted single-crossover gametes
V  Gl  Va                   V  Gl  va
v  gl  va                   v  gl  Va

Test-cross progeny carrying these chromosomes are
found in Class 2 (yellow, glossy, normal) and Class 7
(green, rough, variable). Recombination frequency
r = (48 + 40 + 4 + 7)/726 = 0.136, or 13.6%. The intergenic
distance is approximately 13.6 cM.

Recombination between the flanking markers,
va and v, is calculated by counting all crossovers between
those genes. Recombination between v and va is
r = (60 + 62 + 48 + 40 + 22)/726 = 0.320, or 32%.
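
The three recombination frequencies can be recomputed directly from Table 5.3. The sketch below is one way to do so, under the assumption (true for this cross) that one parental chromosome carries all three dominant alleles and the other all three recessive alleles, so that a gamete is recombinant for a gene pair whenever one allele is dominant and the other recessive. The helper names are hypothetical.

```python
# Chromosome contributed by the trihybrid parent for each class in Table 5.3,
# written in the gene order v, gl, va
progeny = {
    ("v", "Gl", "Va"): 60,
    ("v", "gl", "Va"): 48,
    ("v", "Gl", "va"): 4,
    ("v", "gl", "va"): 270,
    ("V", "Gl", "Va"): 235,
    ("V", "gl", "Va"): 7,
    ("V", "Gl", "va"): 40,
    ("V", "gl", "va"): 62,
}
total = sum(progeny.values())


def is_dominant(allele):
    # Dominant alleles are written with an initial capital (V, Gl, Va)
    return allele[0].isupper()


def interval_rf(i, j):
    """Fraction of progeny recombinant for the genes at positions i and j."""
    recombinant = sum(n for chrom, n in progeny.items()
                      if is_dominant(chrom[i]) != is_dominant(chrom[j]))
    return recombinant / total


rf_v_gl = interval_rf(0, 1)     # single crossovers in v-gl plus double crossovers
rf_gl_va = interval_rf(1, 2)    # single crossovers in gl-va plus double crossovers
rf_v_va = rf_v_gl + rf_gl_va    # counts each double crossover as two events

print(f"v-gl: {rf_v_gl:.3f}, gl-va: {rf_gl_va:.3f}, v-va: {rf_v_va:.3f}")
```

Summing the two adjacent intervals, rather than counting only progeny that are visibly recombinant for v and va, is what credits each double-crossover chromosome with two crossover events between the flanking genes.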


Question 5: Is the Frequency of Double Crossovers 
Consistent with Independence of the Single 
Crossovers? Asking and answering questions 1 through 
4 identifies the alleles on each parental chromosome, and 
determines the gene order and recombination frequencies 
between genes. But in most tests of genetic linkage, the 
number of double crossovers is less than the number 
expected, and question 5 allows this common observation 
to be quantified. The reduction in the observed number of 
double crossovers is caused by an effect called interference 
(I), which limits the number of crossovers that can occur
in a short length of chromosome. Interference, which we 
discuss further in Section 5.4, is quantified by comparing 
the number or frequency of observed double-crossover 
events to the number or frequency expected assuming 
each crossover event occurs independently. In Emerson’s 
data set, there are 11 double crossovers among test-cross 
progeny, or (11/726) = 0.015 (1.5%). If each crossover 
were independent, expected double-crossover frequency 
would be the product of the two single-crossover
frequencies, (0.183)(0.136) = 0.025 (2.5%). The expected
number of double-crossover progeny would therefore be 
(0.025)(726) = 18.2. Observed double recombinants are 
divided by expected double recombinants, producing a 
value known as the coefficient of coincidence (c). Either 
the numbers or the frequencies of observed and expected 
double recombinants can be used to determine c: 


$$c = \frac{\text{observed double recombinants}}{\text{expected double recombinants}} = \frac{11}{18.2} = 0.60 \ \text{(using numbers)} \quad \text{or} \quad \frac{0.015}{0.025} = 0.60 \ \text{(using frequencies)}$$


Interference is defined as I = 1 − c, so for this data
set I = 1 − 0.60 = 0.40. Interference identifies the proportion
of double recombinants that are expected but
are not produced in the experiment (the difference between
expectation and actuality). In this case, the number
of double recombinants was 40% lower than expected.
Interference is a very common observation in most regions
of most genomes. On occasion, however, certain
regions of some genomes generate more double recombinants
than expected. In these cases I < 0, a situation
called negative interference. Interference will be I = 0
when the observed and expected double crossovers are
equal. The molecular basis of interference is not well understood,
although current research shows that there is a
mechanical limit that restricts the number of recombination
events in a particular region of a chromosome.
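
The coefficient of coincidence and interference for Emerson's data can be computed as follows. This is a minimal sketch with hypothetical variable names. It carries unrounded intermediate values, so the results differ slightly in the third decimal place from the rounded figures used in the text.

```python
total_progeny = 726
observed_dco = 4 + 7                      # Classes 3 and 6 in Table 5.3

# Recombination frequencies for the two adjacent intervals
rf_v_gl = (60 + 62 + 4 + 7) / total_progeny
rf_gl_va = (48 + 40 + 4 + 7) / total_progeny

# If the two crossovers were independent, their joint frequency would be the product
expected_dco = rf_v_gl * rf_gl_va * total_progeny

coefficient_of_coincidence = observed_dco / expected_dco
interference = 1 - coefficient_of_coincidence

print(f"expected double crossovers = {expected_dco:.1f}")
print(f"c = {coefficient_of_coincidence:.2f}, I = {interference:.2f}")
```

A positive value of I, as here, indicates fewer double crossovers than independence predicts. A negative value would correspond to the negative interference described above.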
Determining Gamete Frequencies
from Genetic Maps


The same principle used to construct genetic linkage 
maps—the relation between relative distances and recom- 
bination frequency—can be used to make predictions in 
the opposite direction, that is, to determine the expected 
frequencies of recombinant gametes on the basis of com- 
pleted genetic linkage maps. 

In Figure 5.8a, two linked genes have a recombi- 
nation frequency of 10%. For the dihybrid organism 
AB/ab, two gametes (AB and ab) are parental, and two 
(Ab and aB) are recombinant. Recombinant gametes equal 
10% of total gametes, and each recombinant is expected 
to occur with the same frequency. The probability is calculated
as (1/2)(0.10) = 0.05 for each recombinant
gamete. In this calculation, 1/2 is the probability of each
recombinant chromosome appearing in a gamete, and
0.10 is the probability of recombination between the
genes. Conversely, parental gametes AB and ab are formed
at a frequency equal to 100% minus 10%, or 90% of total 
gametes. Parental gametes are also expected at equal fre- 
quency—in this case (1/2)(0.90), or 45% each. 

Figure 5.8 Gamete genotype frequencies calculated from genetic linkage data. (a) Gamete
frequencies predicted from a map of two linked genes. (b) Gametes predicted from a map of three linked
genes assuming interference is zero (I = 0).

(a) r = 0.10 between A and B

Gamete    Class          Frequency
A B       Parental       (1/2)(0.90) = 0.45
a b       Parental       (1/2)(0.90) = 0.45
A b       Recombinant    (1/2)(0.10) = 0.05
a B       Recombinant    (1/2)(0.10) = 0.05
Total                    1.00

(b) r = 0.10 between A and B; r = 0.20 between B and C

Gamete    Class                       Frequency
A B C     Parental                    (1/2)(0.9)(0.8) = 0.36
a b c     Parental                    (1/2)(0.9)(0.8) = 0.36
A b c     Single recombinant (a-b)    (1/2)(0.1)(0.8) = 0.04
a B C     Single recombinant (a-b)    (1/2)(0.1)(0.8) = 0.04
A B c     Single recombinant (b-c)    (1/2)(0.9)(0.2) = 0.09
a b C     Single recombinant (b-c)    (1/2)(0.9)(0.2) = 0.09
A b C     Double recombinant          (1/2)(0.1)(0.2) = 0.01
a B c     Double recombinant          (1/2)(0.1)(0.2) = 0.01
Total                                 1.00

Gamete frequencies for three linked genes are predicted
in a similar manner. In Figure 5.8b, genes a and
b are shown along with a third gene, c, located 20 cM
from gene b. To predict gamete frequencies, we make
the assumption that interference is I = 0 to simplify
the calculation of the number of recombinants. For the
trihybrid ABC/abc, parental gametes are produced when
crossover does not occur in either gene interval. The
probability of no crossovers between genes a and b is 90%
(0.9), and between b and c it is 80% (0.8). Considering
both gene pairs, the proportion of nonrecombinant
gametes is (0.9)(0.8) = 0.72. There are two equally
frequent parental gametes, each with an expected frequency
of (0.72)(0.5) = 0.36. Recombination frequency
is 10% (0.1) between a and b. Two single recombinants
between genes a and b have an expected frequency
of (0.1)(0.8)(0.5) = 0.04 each. Similarly, single recombinants
between genes b and c have expected frequencies
of (0.9)(0.2)(0.5) = 0.09 each. Each of the double-recombinant
gametes, AbC and aBc, is expected with
a frequency of (0.1)(0.2)(0.5) = 0.01. The sum of frequencies
of the eight predicted gamete genotypes is 1.0,
indicating that all gametes have been counted.
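
The prediction of gamete frequencies from a map can be written as a short function. The sketch below assumes a trihybrid ABC/abc with gene order a-b-c and no interference (I = 0), as in the text. The function name `predict_gamete_frequencies` is hypothetical.

```python
def predict_gamete_frequencies(r_ab, r_bc):
    """Expected gamete frequencies for trihybrid ABC/abc, assuming I = 0.

    r_ab and r_bc are the recombination frequencies of the two adjacent intervals.
    Each crossover class yields two reciprocal gametes, hence the factor 0.5.
    """
    no_ab, no_bc = 1 - r_ab, 1 - r_bc
    return {
        "ABC": 0.5 * no_ab * no_bc,   # parental
        "abc": 0.5 * no_ab * no_bc,   # parental
        "Abc": 0.5 * r_ab * no_bc,    # single recombinant (a-b)
        "aBC": 0.5 * r_ab * no_bc,    # single recombinant (a-b)
        "ABc": 0.5 * no_ab * r_bc,    # single recombinant (b-c)
        "abC": 0.5 * no_ab * r_bc,    # single recombinant (b-c)
        "AbC": 0.5 * r_ab * r_bc,     # double recombinant
        "aBc": 0.5 * r_ab * r_bc,     # double recombinant
    }


frequencies = predict_gamete_frequencies(0.10, 0.20)
for gamete, frequency in frequencies.items():
    print(f"{gamete}: {frequency:.2f}")
print(f"sum = {sum(frequencies.values()):.2f}")
```

Checking that the eight frequencies sum to 1 is a quick way to confirm that no gamete class has been omitted.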


5.4 Recombination Results from 
Crossing Over 


Morgan’s hypothesis of recombination by crossing over 
between homologous chromosomes has stood the test of 
time and is now universally accepted. When he proposed 
it, Morgan’s model fit nicely with a 1909 observation by 
F. A. Janssens, who captured a view of meiotic chromo- 
somes under the microscope and suggested that the chi- 
asmata seen between homologous chromosomes might 
be points of recombination. Clear proof of the hypothesis 
of gene recombination by chromosome exchange was not 
obtained, however, until 20 years after Morgan proposed 
it. In 1931, research published by Harriet Creighton and 
Barbara McClintock on crossing over in corn (Zea mays), 
and a nearly simultaneous report by Curt Stern on cross- 
ing over in Drosophila, provided direct evidence that gene 
recombination and physical exchange between homolo- 
gous chromosomes went hand-in-hand. 


Cytological Evidence of Recombination 


Creighton and McClintock studied recombination be- 
tween homologous copies of chromosome 9 in corn that 
were distinguished by two genetic markers—the genes 
controlling kernel color (c1) and starch type (wx) in Zea 
mays—and by two cytological markers—structural differ- 
ences in the homologous copies of chromosome 9 that 
were observed under the microscope. One copy of chro- 
mosome 9 had the normal microscopic appearance and 
carried alleles c1 and Wx. The homologous copy of chromosome
9 carried alleles C1 and wx and was cytologically
altered in two ways. On the end nearer C1, the chromo- 
some had a darkly staining region called a “knob”; on 
the other end, near wx, the chromosome carried a frag- 
ment of chromosome 8 that had been transferred by a 
chromosome-rearrangement event called translocation 
(we explore this event in Section 13.4). Creighton and
McClintock obtained cytological evidence that recombination
involved the physical exchange between homologous
chromosomes by detecting genetic recombinants
(chromosomes carrying the alleles C1 and Wx or carrying
the alleles c1 and wx) that were also cytologically
rearranged chromosomes (Figure 5.9).

Figure 5.9 Cytological proof from Zea mays that recombination
results from crossing over. Progeny displaying recombinant
phenotypes are also seen to carry physically rearranged
chromosomes.

Just a few weeks after Creighton and McClintock 
reported their evidence of a link between chromosome 
rearrangement and genetic recombination, Curt Stern 
reported similar findings in Drosophila. The combined 
genetic and chromosomal recombination analyses in corn 
and fruit fly provided convincing evidence that genetic 
recombination between homologous chromosomes is ac- 
companied by physical exchange between the chromo- 
somes in plants and in animals. 


Limits of Recombination along Chromosomes 


Creighton, McClintock, and Stern showed convincingly 
that crossover is accompanied by chromosome breakage 
and rejoining. Morgan and Sturtevant’s work, supported 
by data from several of their contemporaries, established 
that the relative distance between two linked genes on 
a chromosome influences the frequency of recombina- 
tion between them. Two important questions about the 
likelihood and frequency of crossing over derive from 
these observations. First, why does distance between genes 


influence recombination frequency? And second, is there 
an upper limit to the frequency of recombinant gametes 
for a pair of linked genes? 

The answer to the first question is that in early 
prophase I, points of crossing over are established at re- 
combination nodules that occur along the synaptonemal 
complex (see Section 3.2). Two genes that are close to one 
another are less likely to have a recombination nodule be- 
tween them and are less likely to recombine than are a pair 
of genes separated by a greater distance on a chromosome. 

Recombination occurs after DNA replication has 
been completed, when each member of a homologous 
chromosome pair is composed of two sister chroma- 
tids. This is the four-strand stage. Single crossovers in- 
volve one chromatid from each homolog. There are four 
equivalent ways this process can occur, and all four events 
produce the same outcome—two parental gametes and 
two recombinant gametes (Figure 5.10a). Crossovers that 
occur between nonsister chromatids but not between the 
loci tested will not leave genetic evidence of recombina- 
tion (Figure 5.10b). 

There are three patterns of double crossover between 
two genes. The outcomes of each pattern are unique with 
respect to the number of recombinant gametes produced. 
Two-strand double crossover produces no recombinants, 
because two recombination events between a pair of genes 
do not produce genetic evidence of recombination in the 
form of a recombinant gamete (Figure 5.11a). A three-strand
double crossover, involving three of the four
chromatids, can happen in two ways that each produce


the same genetic outcome: two parental and two recombinant
chromosomes in gametes (Figure 5.11b). When a
four-strand double crossover occurs, all four chromo- 
somes in gametes are recombinant (Figure 5.11c). 

In answer to the second question we posed earlier, 
recombination between a pair of linked genes is limited to 
50% of the gametes. As we have seen, of the four gametes 
produced by single crossover, two are recombinant gametes 
(have the nonparental genotype) and so result in a total of 
50% recombinant. Likewise, summing the outcomes of the 
example two-, three-, and four-strand double crossovers 
shown in Figure 5.11 gives a total of 8/16 (50%) recom- 
binant gametes. This establishes an upper limit of 50% 
as the frequency of both parental and nonparental geno- 
types in gametes. Most instances of genetic linkage produce 
substantially more than 50% parental chromosomes and 
substantially less than 50% nonparental. The smallest pro- 
portions of recombinant chromosomes are associated with 
the most tightly linked genes (i.e., the genes that are closest 
together), and the recombinant proportions increase as the 
distance between genes becomes greater. Recombination 
frequencies between linked genes can increase up to 50% as 
the distance between genes gets larger, and the correspond- 
ing frequency of parental chromosomes decreases to 50%. 
Thus, frequencies of recombination between linked genes 
are always less than 50%. Once there is sufficient distance 
between syntenic genes, however, crossover randomizes the 
combinations of alleles on chromosomes, and the pattern 
becomes that of independent assortment. In other words, 
syntenic genes that are far apart assort independently. 
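
The 8/16 total quoted above can be checked with a short tally. The sketch below assumes, as the text does, that the example classes shown in Figure 5.11 (one two-strand, two three-strand, and one four-strand double crossover) are weighted equally; the dictionary and variable names are illustrative only.

```python
from fractions import Fraction

# Recombinant gametes (out of 4 per meiosis) for each example double-crossover
# class described in the text for Figure 5.11.
# Assumption: the four classes are weighted equally.
recombinants_per_class = {
    "two-strand": 0,
    "three-strand (first way)": 2,
    "three-strand (second way)": 2,
    "four-strand": 4,
}

recombinant_gametes = sum(recombinants_per_class.values())
total_gametes = 4 * len(recombinants_per_class)
fraction_recombinant = Fraction(recombinant_gametes, total_gametes)

print(f"{recombinant_gametes}/{total_gametes} recombinant "
      f"= {float(fraction_recombinant):.0%}")
```

Because the two-strand and four-strand classes offset each other, the average over double crossovers is the same 50% produced by a single crossover.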
Figure 5.10 Results of single crossover. (a) Single crossovers occur between homologous chromosomes
in multiple ways. Each meiosis produces two parental chromosomes and two recombinant
chromosomes, thus 50% of gametes can carry recombinant chromosomes. (b) Single crossover taking 
place outside the chromosome region being tested does not reveal recombinant chromosomes. 
Figure 5.11 Results of double crossover. Double crossovers
between two genes involving two, three, or all four chromatids
result collectively in a maximum of 50% recombinant gametes.
(a) Two-strand double crossover (three equivalent ways, one
position held constant). No recombinant gametes are produced
by any two-strand double crossover. (b) Three-strand double
crossover (one position held constant). (c) Four-strand double
crossover (one position held constant).




Genetic Analysis 5.2 presents the results of test crosses 
involving three linked genes and takes you through the 
determination of recombination frequencies between the 
genes. 


Recombination within Genes 


Our discussion thus far describes how the linear order 
of genes along chromosomes can be determined based 
on crossover between genes. Does crossover ever occur 
within genes? The answer is yes. 

Crossing over within genes, called intragenic re- 
combination, is an infrequent event that is detected 
through the examination of large numbers of progeny, 
usually for evidence of recombination between homo- 
logs carrying different mutant alleles of the same gene. 
Since the site of mutation within the gene is different for 
each allele, intragenic recombination produces one wild- 
type recombinant chromosome and one double-mutant 
chromosome. 

Melvin Green and Kathleen Green were the first to 
report intragenic recombination in a 1949 study of the 
Drosophila gene for an X-linked recessive mutant eye 
phenotype called “lozenge,” which disrupts the number 
and pattern of facets on the eye of the fly. Several different 
mutations of the lozenge gene each produce a distinc- 
tive lozenge phenotype. The Greens (a husband and wife 
team), following up on work begun a few years earlier 
by Clarence Oliver, used lozenge-eyed females, each carrying
two different lozenge-producing alleles of lz, one
on each of the homologous copies of their X chromosomes
(Figure 5.12). The lozenge mutations are located at differ- 
ent positions within the lozenge gene; each mutant allele 
has mutant DNA sequence at the site of mutation but 
has wild-type DNA sequence in the rest of the gene. Rare 
intragenic recombination leads to one double-mutant X 
chromosome carrying both lozenge mutations in a single 
gene, and a wild-type X chromosome with a lozenge gene 
that contains neither mutation. The double-mutant chro- 
mosomes produce a phenotype that is distinct from either 
of the mutations alone. The Greens detected fewer than
20 double-mutant and wild-type X chromosomes
among more than 16,000 progeny of the lozenge-eyed
females, but the result was sufficient to verify
intragenic recombination.
Figure 5.12 Intragenic recombination in the lozenge eye gene of Drosophila. Progeny resulting
from intragenic recombination can be detected by a distinct lozenge phenotype produced by the
double-mutant chromosome or by having wild-type eyes. The genes ct and v are used to verify
intragenic recombination.


GENETIC ANALYSIS


PROBLEM Dr. O. Sophila, a famous geneticist, is evaluating genetic linkage among
three X-linked genes in Drosophila. At these genes, red eye (v+) is dominant to
vermilion eye (v); full wing (r+) is dominant to rudimentary wing (r); and gray body color
(y+) is dominant to yellow (y). Dr. Sophila has the results of three test crosses. Help
Dr. Sophila identify which pairs of genes are linked, and calculate the recombination
frequency between linked genes.

BREAK IT DOWN: Test-cross progeny allow each allele to be assigned to a chromosome (p. 154).

BREAK IT DOWN: If genes are linked, the frequency of progeny with parental phenotypes
will be significantly greater than expected by chance (p. 155).


Test Cross I:   ♀ yv/++ (gray body, red eye) × ♂ yv/Y (yellow body, vermilion eye)
Test Cross II:  ♀ vr/++ (red eye, full wing) × ♂ vr/Y (vermilion eye, rudimentary wing)
Test Cross III: ♀ yr/++ (gray body, full wing) × ♂ yr/Y (yellow body, rudimentary wing)

Test Cross I                   Test Cross II                    Test Cross III
Progeny            Number      Progeny                 Number   Progeny              Number
Yellow, vermilion    338       Vermilion, rudimentary    396    Yellow, rudimentary    246
Gray, red            332       Red, full                 389    Gray, full             252
Yellow, red          160       Vermilion, full           110    Yellow, full           259
Gray, vermilion      170       Red, rudimentary          105    Gray, rudimentary      243
Total               1000       Total                    1000    Total                 1000


Solution Strategies and Solution Steps

Evaluate

1. Strategy: Identify the topic of this problem and the nature of the required answer.
   Step: This problem involves the assessment of three test crosses involving X-linked
   genes. The answer requires determination of genetic linkage versus independent
   assortment for each gene pair and, for linked genes, the calculation of
   recombination frequency.

2. Strategy: Identify the critical information given in the problem.
   Step: The genotypes and phenotypes of test-cross flies are given, and the number of
   test-cross progeny in each phenotypic category is also given.

Deduce

3. Strategy: Determine the test-cross results expected under the assumption of
   independent assortment.
   Step: In each cross, the dihybrid female would be expected to produce four genetically
   different gametes at frequencies of 25% each and the progeny would be
   expected to display four phenotypes in a 1:1:1:1 ratio (250 each). In Test cross I, for
   example, the following results would be expected, and expected results would be
   similar for the other test crosses as well.

   Phenotype           Female      Male      Number
   Yellow, vermilion   yv/yv       yv/Y       250
   Gray, red           yv/y+v+     y+v+/Y     250
   Yellow, red         yv/yv+      yv+/Y      250
   Gray, vermilion     yv/y+v      y+v/Y      250

   TIP: Chi-square analysis could be used to test the statistical significance of
   deviations between observed and expected outcomes.

Solve

4. Strategy: Examine each cross and determine if there is evidence of genetic linkage
   between the gene pairs.
   Step: Test cross I and Test cross II show clear deviation from the predicted ratio,
   with parental categories substantially greater than 250 each and nonparental
   categories substantially less than 250 each. The progeny of Test cross III are
   distributed in numbers consistent with the independent assortment prediction.
   These statements are based on chi-square analysis that is not shown.

5. Strategy: Calculate the recombination frequencies between linked pairs of genes.
   Step: In Test cross I, the recombinant progeny are yellow, red and gray, vermilion.
   r = (160 + 170)/1000 = 0.330, indicating that these genes are linked and are
   separated by 33 m.u.

   In Test cross II, the recombinant phenotypes are vermilion, full and red,
   rudimentary. The recombination frequency is r = (110 + 105)/1000 = 0.215, or
   approximately 21.5 m.u.


For more practice, see Problems 2, 4, and 28.
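
The chi-square analysis mentioned in the worked example is not shown there. The following is a minimal sketch of how it, and the recombination frequencies, could be computed from the progeny counts. The function names are illustrative, and the critical value 7.815 (3 degrees of freedom, P = 0.05) is the standard table value rather than something stated in the problem.

```python
# Observed progeny counts, ordered: parental, parental, nonparental, nonparental.
crosses = {
    "I": [338, 332, 160, 170],    # y and v
    "II": [396, 389, 110, 105],   # v and r
    "III": [246, 252, 259, 243],  # y and r
}

CRITICAL_CHI2_DF3 = 7.815  # standard chi-square critical value, df = 3, P = 0.05


def chi_square_equal_classes(observed):
    """Chi-square statistic against a 1:1:1:1 expectation."""
    expected = sum(observed) / len(observed)
    return sum((count - expected) ** 2 / expected for count in observed)


def recombination_frequency(observed):
    """Fraction of progeny in the two nonparental classes (last two counts)."""
    return sum(observed[2:]) / sum(observed)


results = {}
for name, counts in crosses.items():
    chi2 = chi_square_equal_classes(counts)
    rejects_independence = chi2 > CRITICAL_CHI2_DF3
    r = recombination_frequency(counts)
    results[name] = (chi2, rejects_independence, r)
    print(f"Test cross {name}: chi-square = {chi2:.2f}, "
          f"deviates from 1:1:1:1 = {rejects_independence}, r = {r:.3f}")
```

Only crosses whose chi-square value exceeds the critical value are treated as showing linkage; for Test cross III the nonparental fraction is close to 0.5, as expected under independent assortment.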
Biological Factors Affecting Accuracy of
Genetic Maps 


Inherent in the use of recombination frequency as a measure 
of approximate distance between genes along a chromo- 
some is the assumption that genetic distance and physi- 
cal distance are proportional throughout the genome and 
that recombination frequencies for given genes are constant 
among all members of a species. However, studies in numer- 
ous species indicate that age, environment, sex, and other, 
as yet undetermined, factors may affect recombination fre- 
quency and may affect the relationship between the genetic 
recombination map and the physical map of a chromosome. 
For example, advancing age of female fruit flies decreases the 
frequency of crossover between gene pairs; more crossovers 
between a specific pair of genes are seen in younger females 
than in older. Female Drosophila crossover frequency is 
also affected by temperature. Growth of a fruit-fly colony 
at 22°C is optimal for recombination, and increases or de- 
creases of temperature from optimum can change crossover 
frequency. Restricting dietary levels of calcium and magne- 
sium, important cofactors for enzymes that interact with 
DNA, also decreases crossover frequency in fruit flies. 

The most dramatic impact on recombination frequency 
in animals, however, is connected to sex. Recombination 
frequency differs for males and females of most animal 
species and follows a general pattern in which the hetero- 
gametic sex, the sex with two different sex chromosomes 
(most often males), has a lower rate of recombination than 
the homogametic sex, the sex with two fully homologous sex 
chromosomes (most often females). The higher recombina- 
tion frequency in the homogametic sex is a genome-wide 
phenomenon and is not limited to the sex chromosomes. 
Fruit flies display an extreme version of this phenomenon— 
female fruit flies undergo homologous recombination while 
male fruit flies undergo no recombination at all! 

These observations are seen across the taxonomic 
spectrum, including in humans. Human females experi- 
ence more crossing over than human males, resulting in 
a larger recombination map in females. A detailed recom- 
bination and genome sequencing analysis of human chro- 
mosome 19 exemplifies this phenomenon. Chromosome 
19 is composed of about 65 megabases (Mb), or 65 million 
base pairs, in both male and female genomes (Figure 5.13). 
However, the length of the chromosome as determined 
by adding the estimated recombination distances along 
the entire length of the chromosome is a larger number 
of map units in females than in males. Also notice that re- 
combination frequencies are greater in regions at the ends 
of the chromosome in males but are greater in females in 
central chromosome regions. For the human genome as a 
whole, the female genetic map contains about 4400 cM, 
and the male map about 2700 cM. Geneticists studying 
the human genome usually produce a “sex averaged” hu- 
man genetic map that is slightly larger than 3500 cM. 

Among different species, the number of nucleotide 
base pairs per map unit varies. For example, the human 


Figure 5.13 data: centiMorgans (cM) per region of human
chromosome 19, compared with the physical map (Mb)

Region    Female (cM)   Male (cM)
p13.3        14.9         43.1
p13.2        20.6         12.7
p13.1        14.8          3.8
p12           6.0          0.0
cen
q12          12.0          0.0
q13.1        20.4          3.4
q13.2        10.7          2.3
q13.3        12.4         15.5
q13.4        16.2         33.7
Total        128          114

Physical length: 65 Mb. ©1999 Bios Scientific Publishers


Figure 5.13 Physical distance versus recombination distance 
on human male and female chromosome 19. In most sexually 
reproducing organisms, the heterogametic sex has fewer recom- 
bination events and a shorter recombination map than does the 
homogametic sex. Data adapted from J. L. Weber et al. (1993). 


genome consists of a little less than 3 billion base pairs 
of DNA and the sex-averaged genome contains about 
830,000 bp/cM. In contrast, the Arabidopsis genome con- 
tains about 200,000 bp/cM; thus, recombination is about 
four times as frequent in Arabidopsis as it is in humans. 
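
The comparisons in this passage are simple ratios, and the sketch below works through them using only the approximate figures quoted in the text and in Figure 5.13. The variable names are illustrative.

```python
# Approximate values quoted in the text.
human_female_map_cM = 4400
human_male_map_cM = 2700
human_bp_per_cM = 830_000
arabidopsis_bp_per_cM = 200_000

# Sex-averaged human map length: the mean of the female and male maps.
sex_averaged_map_cM = (human_female_map_cM + human_male_map_cM) / 2

# Fewer base pairs per cM means more recombination per unit of DNA.
relative_recombination = human_bp_per_cM / arabidopsis_bp_per_cM

# Chromosome 19 (Figure 5.13): same physical length, different map lengths.
chr19_physical_mb = 65
chr19_map_cM = {"female": 128, "male": 114}
chr19_mb_per_cM = {sex: chr19_physical_mb / length
                   for sex, length in chr19_map_cM.items()}

print(f"Sex-averaged human map: {sex_averaged_map_cM:.0f} cM")
print(f"Arabidopsis vs. human recombination per bp: {relative_recombination:.2f}x")
for sex, value in chr19_mb_per_cM.items():
    print(f"Chromosome 19, {sex}: {value:.3f} Mb per cM")
```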


Recombination Is Dominated by Hotspots 


Estimates of average numbers of base pairs per centiMor- 
gan, of the average recombination frequency for a species, 
and of distances in a sex-averaged recombination map 
such as the one described for humans are just that: aver- 
aged estimates. In contrast, genome-based information on 
organisms has led to the creation of fine-scale genetic maps 
of species that identify the distribution of recombination 
across the genome with much greater precision. Detailed 
assessment of recombination in human, mouse, and yeast 
genomes reveals a highly variable pattern of recombination 
within each genome that has led to the identification of 
recombination hotspots and recombination coldspots,
although in most cases genetic recombination maps reveal
proportionality between recombination frequencies and 
the physical maps of chromosomes. 

Genetic recombination maps are generated by analysis 
of recombination information and recombination frequency 
data. Physical maps of chromosomes, on the other hand, are 
based on genomic sequence data that identify specific genes 
within DNA sequence. The proportionality between genetic 
recombination maps and physical maps of a chromosome 
makes it possible to generate maps that locate the position 
and approximate distance between genes along a chromo- 
some. This proportionality exists because almost all regions 
of DNA are about equally likely to initiate recombination. 


Nevertheless, as noted above, many genomes do contain 
hotspots and coldspots of recombination—segments of 
chromosomes that undergo substantially more or substan- 
tially less recombination than the average for a species. 
Studies in yeast have examined this phenomenon in 
detail, and one study of yeast chromosomes has identi- 
fied hotspots and coldspots side by side. In Figure 5.14, the 
coldspot of recombination between spo7 and cdc15 results 
in mapping data that appear to place the genes closer to 
one another than they are in the physical map. In contrast, 
the hotspot between cdc15 and FLO1 makes them appear 
to be farther apart on the genetic recombination map than 
on the physical map of the chromosome. The other genes 
in this chromosome region have generally good propor- 
tionality between recombination and physical distances. 
The reason for the existence of hotspots and coldspots 
of recombination may have to do with the ability of DNA 
regions near specific genes to initiate the molecular events 
associated with the first steps of crossing over. In the case 
Figure 5.14 Comparison of the physical map and recombination
map of yeast chromosome 1. A hotspot of recombination is
detected between cdc15 and FLO1. A coldspot of recombination 
occurs between spo7 and cdc15. 
of the coldspot between spo7 and cdc15 in yeast, the chromosome
centromere is between the genes, which may be
an additional factor contributing to the relatively low
recombination between those genes. We discuss the molecular
process of recombination further in Section 12.7.


Correction of Genetic Map Distances 


Many factors affect crossing over and recombination in eu- 
karyotic genomes. Different genetic recombination maps 
for the two sexes of a species, age- and temperature-depen- 
dent variation in recombination in Drosophila females, and 
hotspots and coldspots of recombination scattered within 
the genome are examples of the influence of various factors 
on recombination. Given these diverse and sometimes spe- 
cies-specific effects, it is reasonable to ask whether recom- 
bination frequencies and map distances calculated on the 
basis of observed recombination between gene pairs are in 
fact fully accurate representations of the actual numbers 
of recombination events. The answer is no. Experimental 
evidence indicates that the map distances calculated be- 
tween two randomly selected genes usually underestimate 
the physical distance between the genes, largely because 
of undetected crossovers between them. The farther apart 
two syntenic genes are, the greater the inaccuracy, because 
double crossovers between a pair of genes are not detected 
as recombinant for flanking markers. 

A single crossover between genes A and B in a dihy- 
brid (AB/ab) produces two parental gametes (AB and ab) 
and two recombinant gametes (Ab and aB). As illustrated 
in Figure 5.11, however, a double crossover between the 
same genes produces crossover gametes that are not re- 
combinant for flanking markers and are indistinguishable 
from parentals. These crossover-nonrecombinant gam- 
etes are not counted when recombination frequency be- 
tween genes is calculated, because they are not observed. 
Larger distances between genes provide greater opportu- 
nity for double crossover and thus greater likelihood of 
crossover-nonrecombinant gametes. 

In theory, the relationship between recombination 
frequency and map distance is linear, but this is not the 
case in reality. One line in Figure 5.15 depicts a linear
relationship between recombination frequency and the
distance in map units (cM). In contrast, the second line illustrates that
relationship as actually measured in organisms. The lines
diverge at about 8 cM, indicating that the relationship be- 
tween recombination frequency and map distance is linear 
only for linked genes that are separated by less than 8 cM, 
and that observed recombination frequencies usually un- 
derestimate the physical distance between genes. 

The central problem in correlating recombination 
frequency with the number of recombination events 
is the difficulty of identifying the number of meioses 
that produce each possible number of crossovers—zero, 
one, two, three, four, and so on. In an attempt to cor- 
rectly model different recombination classes and to ac- 
curately assess the correlation between recombination 
Figure 5.15 The relationship between recombination frequency
and physical distance between genes. One line traces
a linear relationship between recombination frequency and the
physical distance separating linked genes. The other line traces the
observed correspondence between recombination frequency
and physical distance. Conclusion: Recombination frequency measured
in organisms underestimates the actual distance between genes.


frequency and crossover, J. B. S. Haldane developed a 
mapping function in 1919 that correlates map distance 
and recombination frequency between gene pairs. The 
Haldane mapping function has limitations, and several 
researchers proposed modifications of it to account for 
specific conditions affecting recombination in different 
species. 

One consistent concern raised about Haldane’s map- 
ping function is that it may overestimate the actual recom- 
bination frequency when interference occurs. Damodar 
Kosambi developed a modified mapping function to correct 
map distance in species with interference, and it has be- 
come one of the most widely applied improvements. 
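
The text names the two mapping functions without giving their formulas. As a hedged sketch, the standard textbook forms are used below: Haldane's function, d = -(1/2) ln(1 - 2r), which assumes no interference, and Kosambi's function, d = (1/4) ln[(1 + 2r)/(1 - 2r)], with d in Morgans and r the observed recombination frequency. These forms and the function names are supplied here as assumptions, not taken from this chapter.

```python
import math


def haldane_map_distance_cM(r):
    """Haldane map distance in cM for recombination frequency 0 <= r < 0.5."""
    return -50.0 * math.log(1.0 - 2.0 * r)


def kosambi_map_distance_cM(r):
    """Kosambi map distance in cM for recombination frequency 0 <= r < 0.5."""
    return 25.0 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))


def haldane_recombination_frequency(distance_cM):
    """Inverse of the Haldane function: expected r for a given map distance."""
    return 0.5 * (1.0 - math.exp(-2.0 * distance_cM / 100.0))


for r in (0.01, 0.08, 0.215, 0.33):
    print(f"r = {r:.3f}: observed {100 * r:.1f} cM, "
          f"Haldane {haldane_map_distance_cM(r):.1f} cM, "
          f"Kosambi {kosambi_map_distance_cM(r):.1f} cM")
```

For small r both corrections stay close to 100 × r, and they grow as r approaches 0.5. The Kosambi estimate is always the smaller of the two because it assumes that interference removes some double crossovers.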

Mapping functions are a quantitative solution to the 
issue of variability of recombination frequencies across 
the genome and between species. Meanwhile, the advent 
of genomic sequence analysis, and the ability to precisely 
compare recombination maps and physical maps, will 
continue to generate insight into recombination. Genetic 
maps are continually subject to refinement, and while 
the most accurate maps are constructed by summing 
many small intervals between genes, the precision in gene 
mapping keeps evolving more than 100 years after Alfred 
Sturtevant deduced the first genetic map. 


5.5 Linked Human Genes Are Mapped 
Using Lod Score Analysis 


Until relatively recently, the human genetic map was 
rather sparse. Humans cannot be studied through con- 
trolled matings and in any case produce much smaller 


numbers of offspring than do organisms like Drosophila 
and Zea mays. Consequently, gene-mapping methods 
developed and used successfully to map genes in model 
organisms are difficult to apply to human gene mapping. 
Historically, X-linked genes, by virtue of their unique 
patterns of transmission, were the first and easiest hu- 
man genes to map, whereas progress in mapping human 
autosomal genes was hampered by a scarcity of known 
polymorphic genetic markers, such as blood group anti- 
gens and blood proteins. 

Human genome mapping changed significantly in the 
mid-1980s, facilitated both by the emergence of molecular 
genetic methods to identify polymorphic DNA markers 
and by advances in gene-mapping software. Different 
types of polymorphic DNA markers, including restriction 
fragment length polymorphisms (RFLPs) and single nucle- 
otide polymorphisms (SNPs) (described in Section 10.2), 
ultimately made thousands of new human genetic mark- 
ers available for study in linkage analysis. Combined with 
sophisticated statistical techniques and modern computer 
power, the use of polymorphic DNA markers has given 
geneticists the ability to effectively map human genes by 
genetic linkage analysis. 

The availability of large numbers of DNA markers on 
each chromosome led first to the identification of linkage 
groups, clusters of syntenic genes that are linked to one 
another, and then to assignment of chromosomal locations 
to linkage groups. The discovery of genetic linkage between 
a genetic marker with a known chromosome location and 
any member of a linkage group assigns the linkage group to 
a chromosome location near the genetic marker. Different 
linkage groups on the same chromosome can then be or- 
ganized into maps of chromosome segments and whole 
chromosomes. 


Allelic Phase 


Efforts to map human genes often focus on finding the 
chromosomal locations of disease-causing genes. This 
is a common first step toward the eventual cloning and
sequencing of a gene that may be the cause of hereditary
disease. A strategy known as positional cloning, or
reverse genetics (see Section 16.2), can be used to map 
a gene whose function is not known. Once the loca- 
tion of the gene is identified, the gene can be cloned 
and sequenced, and the sequence can be examined for 
clues to the normal function of the gene and to the 
mechanisms by which gene mutation produces inher- 
ited abnormalities. 

To map genes, parental and recombinant chromo- 
somes must be identified, and one of the first obstacles 
researchers encounter in the effort to map human genes is 
the difficulty of determining allelic phase, a term referring 
to which alleles of linked genes are on each parental chro- 
mosome. Knowing allelic phase improves the statistical 
power of genetic linkage estimates. Figure 5.16 illustrates 
how allelic phase is identified in a family, and it points to 
Figure 5.16 Allelic phase analysis in human families A and B.
(a) Family A. Allelic phase is known in family A by tracing the
transmission of the disease allele (D) and the P1 genetic marker
allele from I-2 to II-1 and to III-1, III-3, and III-4; III-6 is a
probable recombinant. (b) Family B. Allelic phase is not known in
family B because the disease allele carried by II-1 could be on
either the chromosome carrying genetic marker allele P1 or
the chromosome carrying P2.


the importance of key individuals in determining allelic 
phase. The two pedigrees in the figure are identical in 
structure and in the distribution of an autosomal dominant 
hereditary disease indicated by shaded symbols. Notice, 
however, that individuals I-1 and I-2 are alive and are geno- 
typed for the genetic marker in Family A but not in Family 
B. The alleles of the gene determining the disease pheno- 
type are D and d. In addition to allelic information for the 
disease locus, the pedigrees show allelic information for a 
closely linked polymorphic DNA marker that has six alleles 
identified as P1 to P6.

Allelic phase is known to be P1 D in Family A because
the affected woman in generation I (I-2) transmits
marker allele P1 along with the dominant disease
allele (D) to her son, II-1. The unaffected man in
generation I (I-1) is homozygous for the recessive wild-type
allele (dd) at the disease locus and heterozygous
for DNA marker alleles P2 and P5. Allelic phase in II-1
is P1 D/P2 d; the chromosome on the left of the solidus
(/) is maternal, the chromosome on the right paternal.
Considering that his mate (II-2) is P3 d/P4 d, we can
identify the transmission of parental and recombinant
gametes from II-1 to his children in generation III.
Children III-1, III-3, and III-4 inherited a paternal
chromosome carrying P1 D to produce their disease and
either the P3 or P4 allele along with d on their maternal
chromosome. On the other hand, III-2, III-5, III-7,
and III-8 inherited alleles P2 and d on their paternal
chromosome and either P3 or P4 along with d on their
maternal chromosome. Child III-6 has apparently
inherited a recombinant chromosome carrying alleles
P2 and D from her father along with P3 and d on the
maternal chromosome.

The pedigree for Family B does not allow identification
of allelic phase. In this family, there is no marker
information for generation I, and thus allelic phase for
II-1 is unknown. He could either be P1 D/P2 d or P1 d/P2 D.
For the purposes of genetic linkage analysis, each
possible phase must be treated as equally likely. With allelic
phase in II-1 unknown, we cannot be certain which
of his children have inherited parental chromosomes and
which carry recombinants. If II-1 is P1 D/P2 d, his children
III-1 to III-5, and III-7 and III-8 are parental, and
III-6 is recombinant. Alternatively, if he is P1 d/P2 D, then
III-1 to III-5 and III-7 and III-8 are recombinant and III-6
is parental.
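
The bookkeeping behind the two possible phases can be sketched as follows. The paternal haplotypes follow the description of generation III given above; the haplotype listed for III-6 (P2 with D) is an assumption inferred from its being the only child whose class differs from the others, and all names in the code are illustrative.

```python
# Haplotype (marker allele, disease allele) passed from II-1 to each child.
# Assumption: III-6 received P2 together with D.
paternal_haplotypes = {
    "III-1": ("P1", "D"), "III-2": ("P2", "d"),
    "III-3": ("P1", "D"), "III-4": ("P1", "D"),
    "III-5": ("P2", "d"), "III-6": ("P2", "D"),
    "III-7": ("P2", "d"), "III-8": ("P2", "d"),
}


def classify_children(haplotypes, phase):
    """Split children into parental and recombinant for one assumed phase."""
    parental = [child for child, hap in haplotypes.items() if hap in phase]
    recombinant = [child for child, hap in haplotypes.items() if hap not in phase]
    return parental, recombinant


phase_1 = {("P1", "D"), ("P2", "d")}  # II-1 is P1 D/P2 d
phase_2 = {("P1", "d"), ("P2", "D")}  # II-1 is P1 d/P2 D

for label, phase in (("P1 D/P2 d", phase_1), ("P1 d/P2 D", phase_2)):
    parental, recombinant = classify_children(paternal_haplotypes, phase)
    print(f"Phase {label}: {len(parental)} parental, "
          f"{len(recombinant)} recombinant {recombinant}")
```

Switching the assumed phase swaps the two classes, which is why an unknown phase forces both possibilities to be carried through the linkage calculation.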


Lod Score Analysis 


Although it is not possible to unambiguously identify 
and count recombinants in pedigrees like Family B, a 
statistical method developed by Newton Morton in 1955, 
and refined and expanded since then, allows geneticists 
to calculate the overall probability of genetic linkage. 
Morton’s method determines whether genetic linkage 
exists between genes for which allelic phase is unknown 
by comparing the likelihood of obtaining the genotypes 
and phenotypes observed in a pedigree if two genes are 
linked versus the likelihood of getting the same pedigree 
outcomes if the genes assort independently. The ratio of 
these two likelihoods gives the “odds” of genetic linkage, 
and the logarithm of the odds ratio generates the lod 
score, a statistical value representing the probability of 
genetic linkage between the genes. 

The numerator of the odds ratio that yields the lod 
score is the likelihood that the distribution of phenotypes 
and genotypes in the pedigree is produced by genetic link- 
age between the genes. The denominator is the likelihood 
of the same pedigree outcomes assuming independent as- 
sortment between the genes (i.e., no genetic linkage). Lod 
score analysis evaluates each pedigree and determines the 
likelihood of genetic linkage for many different recombination
frequencies, each expressed as a variable called the
θ value (“theta value”). Using input data on each family
member that identifies presence or absence of the disease
and the genotype at a potentially linked marker gene,
software programs calculate the likelihoods of genetic
linkage versus no linkage between the genes and compute
lod scores for each θ value specified by the investigator.
The θ values are any recombination frequency between
θ = 0 (complete genetic linkage) and θ = 0.50 (independent
assortment). The programs determine lod scores,
and because they are log values, the lod scores for a given
θ value in different families can be added together. After
analyzing all available family data, the lod scores for
each θ value are summed and the highest lod score value
obtained in a study is designated Zmax. The Zmax corresponds
to the θ value that is the most likely recombination
frequency between the genes tested.

For each θ value tested, the lod score will be positive if
the likelihood of genetic linkage is greater than the likeli- 
hood of independent assortment, because in that case, the 
numerator value (likelihood assuming genetic linkage) is 
greater than the denominator value (likelihood assuming 
independent assortment). Conversely, if the pedigree is 
more likely to be produced by independent assortment than 
by genetic linkage, the independent assortment likelihood 
will be larger than the genetic linkage likelihood, and the lod 
score will be negative. 

Lod scores are calculated using the assumption that if
two genes have a recombination frequency equal to θ, the
probability that a particular gamete is recombinant is also
equal to θ, and the probability that a gamete is nonrecombinant
is 1 − θ. Table 5.4 shows calculated lod score values
for the two families shown in Figure 5.16. Notice that the
lod scores are higher for Family A than for Family B. This
is because with allelic phase known in Family A, the likelihood
estimate for genetic linkage between the disease
gene and the marker gene is more accurate and leads to a
higher probability of genetic linkage in this case. For each
child in generation II, the probability that the gamete
from the mother is parental is 1 − θ, and the probability
that a recombinant gamete is transmitted from mother
to child is θ. Since allelic phase is known for Family A,
only the known phase is tested. In contrast, Family B 
does not have a known allelic phase; thus, each possible 
phase is assumed to be equally likely. In the Family B lod 
score computation, each phase is tested and is part of the 
numerator. Because a known allelic phase produces more 
genetic linkage information, the lod scores for Family A 
are greater than the lod scores for Family B. In the context 
of lod score analysis, Family A is identified as the more 
informative of the two pedigrees. 
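
The phase-known and phase-unknown calculations described above can be sketched in a few lines of Python. This is a minimal illustration, not the software used in actual linkage studies. The function names are hypothetical, and the family structure (eight informative children, one of them recombinant) is an assumption: Figure 5.16 is not reproduced here, and these counts were chosen because they give lod scores that round to the values listed in Table 5.4.

```python
import math

def lod_phase_known(theta, n_children, n_recombinant):
    """Lod score when allelic phase is known (hypothetical helper).

    Numerator: likelihood of the children given recombination frequency theta.
    Denominator: likelihood under independent assortment (theta = 0.5).
    """
    n_parental = n_children - n_recombinant
    linked = (theta ** n_recombinant) * ((1 - theta) ** n_parental)
    unlinked = 0.5 ** n_children
    return math.log10(linked / unlinked) if linked > 0 else float("-inf")

def lod_phase_unknown(theta, n_children, n_recombinant):
    """Lod score when phase is unknown: both phases are weighted equally."""
    n_parental = n_children - n_recombinant
    phase_1 = (theta ** n_recombinant) * ((1 - theta) ** n_parental)
    phase_2 = (theta ** n_parental) * ((1 - theta) ** n_recombinant)
    linked = 0.5 * phase_1 + 0.5 * phase_2
    unlinked = 0.5 ** n_children
    return math.log10(linked / unlinked) if linked > 0 else float("-inf")

# Assumed family structure: 8 children, 1 recombinant (not stated in the text).
for theta in (0.0, 0.1, 0.2, 0.3, 0.4, 0.5):
    known = lod_phase_known(theta, 8, 1)
    unknown = lod_phase_unknown(theta, 8, 1)
    print(f"theta={theta:.1f}  phase known={known:.2f}  phase unknown={unknown:.2f}")
```

Under these assumed counts, the phase-unknown score is lower at every θ value between 0 and 0.5 because half of the numerator is spent on the less likely phase.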

A lod score is a statistic that can argue in favor of genetic 
linkage, if the probability of genetic linkage is sufficiently 


Table 5.4 Lod Score Values for the Families in Figure 5.16

Family A (Phase Known)
θ value      0      0.1    0.2    0.3    0.4    0.5
Lod score    −∞     1.09   1.03   0.80   0.46   0.0

Family B (Phase Unknown)
θ value      0      0.1    0.2    0.3    0.4    0.5
Lod score    −∞     0.79   0.73   0.50   0.19   0.0


greater than the probability of independent assortment, 
or it can argue against genetic linkage, if the probability of 
independent assortment is sufficiently greater than the link- 
age probability. Lod scores can be interpreted for individual 
families, or they can be added together for as many families 
as are analyzed. In either case, lod score significance is inter- 
preted by the following parameters: 


1. A lod score of 3.0 or greater is considered significant
evidence in favor of genetic linkage. Such a score
indicates significant odds of genetic linkage at each
θ value at which it occurs. The θ values identified as
significant indicate the most likely number of centiMorgans
between linked genes.


2. Lod score values of less than −2.0 represent
significant evidence against genetic linkage. Any
lod score values for single or multiple families less
than −2.0 reject genetic linkage at each θ value
with that result.


3. Lod score values between 3.0 and −2.0 are
inconclusive, neither affirming nor rejecting genetic
linkage between the genes examined. Inconclusive
results can be revised as additional data are
collected.


The three lod score curves shown in Figure 5.17 il- 
lustrate that lod score results may produce different 
Figure 5.17 Sample lod score curves. Lod score values
(vertical axis) are plotted against recombination fractions
(θ values, horizontal axis) for three hypothetical lod score analyses.

Experimental Insight 5.1


Mapping a Gene for Breast and Ovarian Cancer Susceptibility 


Most cases of cancer develop through the acquisition of 
multiple mutations in somatic cells, meaning that there is 
no inherited mutation that increases the likelihood of can- 
cer development. In some families, however, the frequent 
occurrence of a particular kind of cancer in a pattern consis- 
tent with single-gene inheritance can suggest the hereditary 
transmission of a mutant allele that increases the suscepti- 
bility of individuals to the cancer. The identity, indeed the 
very existence of these genes, is not known until they are 
conclusively shown to contribute to cancer development. 
One research strategy to identify cancer-susceptibility genes 
seeks genetic linkage of susceptibility genes to genetic mark- 
ers that have a known chromosome location. 

In the late 1970s, Mary-Claire King and several collaborators
devised a strategy to search for a gene whose mutation
could increase susceptibility to breast and ovarian
cancer in families. King and her colleagues sought to maxi- 
mize the chance of finding such a cancer-susceptibility 
gene by carefully selecting families in which multiple cases 
of breast and ovarian cancers appeared at young ages, 
and in which occasional cases of bilateral cancer occurred 
(affecting both breasts or both ovaries in a single patient) 
in patterns consistent with an autosomal dominant inheri- 
tance of disease susceptibility. 

King initially looked for genetic linkage between inher- 
ited cancer susceptibility and biochemical markers such as 


polymorphic blood proteins and enzymes. None of the doz- 
ens of biochemical markers screened produced significant 
evidence of genetic linkage to a breast and ovarian cancer 
susceptibility gene. In the early 1990s, however, King and her 
colleagues turned to the use of DNA genetic markers. Then, 
in 1994, they identified genetic linkage between a group of 
tightly clustered DNA markers on human chromosome 17 
and a gene named Breast Cancer 1 (BRCA1). Lod score analysis 
of chromosome 17, as summarized in the following table, 
revealed that the candidate gene has a Zmax value of 21.68
at θ = 0.13.

Five genetic markers that are part of a multipoint linkage
analysis are shown. BRCA1 is most likely close to the middle of
this linkage group, near the DNA marker gene D17S588.

Subsequent studies have identified and cloned the BRCA1
gene and determined that it participates with a second gene
called BRCA2 in DNA mutation repair. A large number of mutations
of BRCA1 have been identified, and some of them dramatically
increase the likelihood that a woman will develop
breast or ovarian cancer. Other mutations of BRCA1 do not
appear to significantly increase breast or ovarian cancer risk.
A good deal of work remains to be done to clarify the role of
this gene in breast and ovarian cancer development, but the
research strategy designed by King demonstrates the power
of genetic linkage analysis for locating genes of interest. (We
discuss more about BRCA1 and BRCA2 in Chapter 12.)


Lod Score Data for Linkage of BRCA1 to Chromosome 17q in Humans

Genetic Marker    θ = 0.001    θ = 0.01    θ = 0.05
D17S250           −11.98       −8.96       −1.20
D17S579           −1.43        1.62        8.55
D17S588           8.23         11.39       18.35
NME1              −1.41        0.75        6.01
D17S74            −39.15       −31.73      −13.34


Source: Data from J. Hall et al. (1994). 


patterns depending on the level of information available
for the pedigree and on the actual relationship between
the genes tested. Curve 1 displays data with a maximum
lod score value (Zmax) of about 4.0 at θ = 0.23, suggesting
the two genes are separated by 23 cM. The lod scores are
significantly positive in the range of 18 to 30 map units.
The curve provides significant evidence against genetic
linkage at θ < 0.5. Curve 2 results from there being very
little genetic linkage information, and its lod scores are
inconclusive at all distances. Curve 3 rejects genetic linkage
at θ values less than 0.12 but is inconclusive through
the rest of the linkage range.


Lod Scores at Recombination (θ) Values (continued)

Genetic Marker    θ = 0.10    θ = 0.20    θ = 0.30    Zmax     θmax
D17S250           3.81        7.30        6.65        7.42     0.23
D17S579           12.08       12.55       9.17        13.02    0.16
D17S588           21.33       20.15       14.79       21.68    0.13
NME1              8.70        9.13        6.76        9.45     0.16
D17S74            −2.73       6.32        7.50        7.67     0.27


A number of more comprehensive software programs 
permitting multipoint linkage analysis have been devel- 
oped to simultaneously analyze genetic linkage data for 
multiple genes and genetic markers. Multipoint linkage 
analysis tests all possible gene orders to identify the most 
likely order of linked genes. Experimental Insight 5.1 dis- 
cusses the application of lod score analysis in the mapping 
of BRCA1, a gene whose mutation can increase suscep- 
tibility to breast and ovarian cancer in women. Genetic 
Analysis 5.3 guides you through the interpretation of lod 
score values for linkage between a disease-causing gene 
and a linked DNA genetic marker. 


GENETIC ANALYSIS 5.3


PROBLEM In a study of human families with an autosomal dominant disease caused by a gene whose
location is unknown, geneticists use lod score analysis to test linkage between the disease gene and a
variable DNA genetic marker. Provide a complete interpretation of the lod score
data displayed in the following table, and identify the most likely distance between
the marker gene and the disease gene.

BREAK IT DOWN: The lod score is a statistical value that allows identification
of the most likely recombination distance between genes and, by extension,
rejection of linkage (pp. 167-168).

BREAK IT DOWN: Lod score values greater than +3.0 indicate statistically significant
evidence in favor of genetic linkage, and values less than −2.0 significant evidence
against linkage at specified θ values (p. 168).

θ value      Lod score
0.0          −∞
0.01         −6.95
0.02         −1.10
0.03         0.20
0.04         1.22
0.05         2.25
0.06         7.23
0.08         7.02
0.10         5.11
0.15         4.23
0.20         −2.01
0.30         −6.84
0.40         −9.91
0.50         0.0


Solution Strategies and Solution Steps

Evaluate

1. Strategy: Identify the topic of this problem and the nature of the required answer.
   Step: This problem concerns lod score analysis assessing genetic linkage between
   a variable DNA genetic marker and a gene carrying a dominant mutation producing
   a disease. The answer requires interpretation of the lod score values,
   identification of potential genetic linkage, and determination of the most likely
   distance between the DNA marker gene and the disease gene.

2. Strategy: Identify the critical information given in the problem.
   Step: Lod score values are given for 14 θ values (map units between genes).

   TIP: Survey the entire lod score table to identify
   significant and nonsignificant lod score values.

Deduce

3. Strategy: Identify significant lod score values in the lod score table and locate Zmax.
   Step: Significant evidence against genetic linkage occurs at θ = 0.01 and at θ = 0.20.
   Conversely, significant results in favor of genetic linkage are seen at θ = 0.06 to
   θ = 0.15. The Zmax value is 7.23 and corresponds to θ = 0.06 (6 m.u.).

Solve

4. Strategy: Interpret the meaning of the lod scores for genetic linkage.
   Step: The data support genetic linkage between the marker gene and the disease
   gene at recombination distances of between 6 m.u. and 15 m.u. Linkage
   between the genes is rejected at less than 2 m.u. and at more than 20 m.u.
   The lod score results between 2 m.u. and 5 m.u. are inconclusive.

   TIP: ... to significant lod score values.

5. Strategy: Identify the most likely distance between the DNA marker gene and
   the disease gene.
   Step: The Zmax value is 7.23 at θ = 0.06, thus identifying the most likely distance
   between the disease gene and the marker gene as 6 m.u.

   TIP: The maximum lod score value corresponds to a specific distance between
   genes that is identified by its θ value.


For more practice, see Problems 18, 28, and 29. Visit the Study Area to access study tools. MasteringGenetics™
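
The interpretation rules used in this analysis (3.0 or greater supports linkage, less than −2.0 rejects it, anything in between is inconclusive) can be applied mechanically. The following Python sketch does so for the lod score table in the problem above. The function and variable names are hypothetical, chosen only for illustration.

```python
import math

SUPPORT_THRESHOLD = 3.0   # lod >= 3.0: significant evidence for linkage
REJECT_THRESHOLD = -2.0   # lod < -2.0: significant evidence against linkage

def classify_lod(lod):
    """Interpret one lod score using the thresholds given in the text."""
    if lod >= SUPPORT_THRESHOLD:
        return "linkage supported"
    if lod < REJECT_THRESHOLD:
        return "linkage rejected"
    return "inconclusive"

def find_zmax(thetas, lods):
    """Return (Zmax, theta at Zmax) for paired lists of theta values and lod scores."""
    z_max = max(lods)
    return z_max, thetas[lods.index(z_max)]

# Data from the problem table; the theta = 0.0 entry is negative infinity.
thetas = [0.0, 0.01, 0.02, 0.03, 0.04, 0.05, 0.06,
          0.08, 0.10, 0.15, 0.20, 0.30, 0.40, 0.50]
lods = [-math.inf, -6.95, -1.10, 0.20, 1.22, 2.25, 7.23,
        7.02, 5.11, 4.23, -2.01, -6.84, -9.91, 0.0]

for theta, lod in zip(thetas, lods):
    print(f"theta={theta:.2f}  lod={lod:6.2f}  {classify_lod(lod)}")

z_max, theta_max = find_zmax(thetas, lods)
print(f"Zmax = {z_max} at theta = {theta_max} ({theta_max * 100:.0f} m.u.)")
```

Because lod scores at the same θ value are additive across families, the same functions could be applied to summed scores from several pedigrees.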


5.6 Recombination Affects Evolution
and Genetic Diversity


Recombination between homologous chromosomes is a
potent evolutionary factor. It is so strongly favored by
evolution that it is essentially ubiquitous in eukaryotes.
Recombination is a companion of sexual reproduction as
an evolutionary hallmark in eukaryotes because it provides
a mechanism for generating genetic diversity among offspring.
From an evolutionary perspective, genetic diversity
increases the chance that organisms will survive and reproduce
in changing environments, and it enhances the ability
of organisms to adapt to new environmental niches previously
unoccupied by the species.

In comparison to vegetative propagation, such as that
seen in yeast, independent assortment during sexual reproduction
provides one mechanism for genetic diversification.
Recall, for example, that independent assortment of your
23 pairs of homologous chromosomes can generate well
over 8 million (2^23 = 8,388,608) genetically different gametes.
Recombination between homologous chromosomes adds substantially to
this number by reshuffling the alleles carried on parental
chromosomes, thus producing much more genetic diversity
than would be possible by independent assortment alone.

Experimental evidence supports the idea that
homologous recombination is a potent factor in evolution
and that recombination is favored by natural selection.
A meta-analysis study by Sarah Otto and Thomas
Lenormand in 2002 examined recombination rates in
a large number of artificial selection experiments conducted
by other researchers who were studying the evolution
of traits that were unrelated to sex or recombination.
Otto and Lenormand determined that in the majority of
cases, the rate of recombination had increased significantly
as a result of the application of artificial selection to
a trait. This result indicates that evolution is enhanced by
the occurrence of recombination and that recombination
rates increase in response to evolution.

Recombination has a second evolutionary effect, this 
one operating at the level of populations. As popula- 
tions age, one would expect recombination to randomize 
the combinations of alleles on chromosomes. When this 
expected randomization does not occur, evolution is fre- 
quently the cause. The specific array of alleles in a set of 
linked genes on a single chromosome is called a haplotype 
(a contraction of “haploid genotype”). Because the alleles 
in a haplotype belong to linked genes, they tend to be 
passed together during meiosis. Homologous chromo- 
somes carried by an organism can contain different haplo- 
types. Haplotypes can consist of any combination of linked 
genes producing molecular genetic variation—SNPs, for 
example—or morphological variation. Haplotypes that are 
defined by SNP loci usually span regions of 10,000 to 
100,000 base pairs, whereas haplotypes for genes pro- 
ducing morphological variation tend to be much larger, 
spanning up to several million base pairs. Using letters 
A through F to specify linked SNP loci, and primed (’) 
and unprimed letters to distinguish the alleles of these 
sequences, we can specify two sample haplotypes for the 
same region on homologous chromosomes as 


... A B C' D E F' ...
... A B' C D' E' F ...


Over multiple generations, crossing over is expected to occur 
between the original haplotypes to produce new haplotypes 
that occur at frequencies determined by chance. In other 
words, for genes in a population, the genotype for a chromo- 
some at one gene is expected to be independent of its geno- 
types for other genes. When this occurs, the chromosome 
region is said to be in linkage equilibrium. This means that 
knowing the alleles at one gene does not help predict the al- 
leles present at other genes on the chromosome. 

As an example, let’s consider two SNP genes A and B 
in the haplotypes above. Assuming that the frequencies 
of alleles at SNP A are A = 0.70 and A’ = 0.30 and at 
SNP B are B = 0.20 and B’ = 0.80, we can use chance 
to predict haplotypes. For the A SNP and the B SNP, the 
predicted haplotypes and frequencies are 


A' B' = (0.30)(0.80) = 0.24
A' B  = (0.30)(0.20) = 0.06
A  B' = (0.70)(0.80) = 0.56
A  B  = (0.70)(0.20) = 0.14
              Total  = 1.00
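
The product rule used for these predictions is easy to express in code. The short Python sketch below multiplies the allele frequencies given above to obtain the haplotype frequencies expected at linkage equilibrium. The function name and the dictionary layout are illustrative choices, not part of any standard package.

```python
from itertools import product

def expected_haplotype_frequencies(locus_1, locus_2):
    """Expected two-locus haplotype frequencies at linkage equilibrium.

    Each argument maps allele names to allele frequencies. Under linkage
    equilibrium, a haplotype frequency is the product of its allele frequencies.
    """
    return {
        f"{allele_1} {allele_2}": freq_1 * freq_2
        for (allele_1, freq_1), (allele_2, freq_2)
        in product(locus_1.items(), locus_2.items())
    }

# Allele frequencies from the example in the text.
snp_a = {"A": 0.70, "A'": 0.30}
snp_b = {"B": 0.20, "B'": 0.80}

expected = expected_haplotype_frequencies(snp_a, snp_b)
for haplotype, frequency in expected.items():
    print(f"{haplotype}: {frequency:.2f}")
print(f"Total: {sum(expected.values()):.2f}")
```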


When linkage equilibrium is not observed, the fre- 
quencies of certain haplotypes in a population deviate sig- 
nificantly from the frequencies expected. This situation is 
identified as linkage disequilibrium, and it frequently oc- 
curs as a consequence of evolutionary processes operating 
on a population. Two different evolutionary processes are 
common causes of linkage disequilibrium. (1) Migration 
can produce linkage disequilibrium if haplotypes have 
been recently introduced into a population and there has 
not been a sufficient number of generations for cross- 
ing over to randomize alleles. (2) If one specific allele in 
a haplotype is favored by natural selection, the allele will 
increase in frequency in the population. The other alleles 
in the haplotype will also be favored because of their close 
proximity to the favored allele. Recombination is con- 
stantly reshuffling the alleles on chromosomes so that over 
multiple generations an allele favored by natural selection 
will be part of different multilocus genotypes, but in the 
short term, linkage disequilibrium can be observed as the 
result of natural selection on one allele in a haplotype. 
Recombination eventually randomizes the alleles in hap- 
lotypes containing an allele favored by natural selection to 
eliminate linkage disequilibrium, but the number of gen- 
erations required is determined by the strength of natural 
selection and the distances between linked genes. 
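
One common way to quantify a deviation from linkage equilibrium is the coefficient D, the observed frequency of a haplotype minus the frequency expected from its allele frequencies. The Python sketch below computes D for the A B haplotype and then shows how recombination alone would erode it. Two things here are assumptions rather than material from this chapter: the observed haplotype frequency of 0.18 is a hypothetical value, and the decay formula D_t = D_0(1 − r)^t is a standard population-genetics approximation that assumes random mating and no natural selection.

```python
def disequilibrium_coefficient(observed_ab, freq_a, freq_b):
    """D = observed haplotype frequency minus the linkage-equilibrium expectation."""
    return observed_ab - freq_a * freq_b

def disequilibrium_after(d_initial, recombination_frequency, generations):
    """Remaining D after a number of generations, assuming random mating
    and no selection (an approximation not derived in this chapter)."""
    return d_initial * (1 - recombination_frequency) ** generations

# Allele frequencies from the text; the observed A B frequency is hypothetical.
d_0 = disequilibrium_coefficient(observed_ab=0.18, freq_a=0.70, freq_b=0.20)
print(f"D = {d_0:.3f}")

# Closely linked genes (small r) retain linkage disequilibrium far longer.
for r in (0.001, 0.01, 0.10):
    remaining = disequilibrium_after(d_0, r, generations=50)
    print(f"r = {r}: D after 50 generations = {remaining:.4f}")
```

The comparison across recombination frequencies illustrates the point made above: the shorter the distance between linked genes, the more generations are needed for recombination to eliminate linkage disequilibrium.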


5.7 Genetic Linkage in Haploid 
Eukaryotes Is Identified 
by Tetrad Analysis 


The genetic mapping experiments conducted in maize, 
Drosophila, humans, and other diploid organisms have 
allowed biologists to develop extensive genetic maps for 
many species. They are a triumph of scientific reasoning 
and the careful execution of experimental design. As suc- 
cessful as these experiments have been, however, certain 
other organisms have life cycles that allow the genotypes 
of individual gametes to be studied more directly, without 
requiring interpretation of the expression of traits among 
the progeny of controlled crosses. For this research, genet- 
icists depend on eukaryotic microorganisms such as the 
class Ascomycetes that includes bread mold (Neurospora 
crassa) and yeast (Saccharomyces cerevisiae). 

Ascomycetes species spend most of their life cycle 
in a haploid state, dividing by mitosis to produce new 
cells. For example, haploid yeast cells of Saccharomyces 
cerevisiae undergo mitotic division during the vegetative 
portion of the life cycle, reproducing new haploid cells 
that bud off from parental cells (Figure 5.18). Diploid 
yeast form by the union of two genetically different hap- 
loid mating types. The diploid yeast cells undergo meio- 
sis, producing four haploid ascospores contained within 
a saclike structure called an ascus. The four ascospores 
in an ascus are called a tetrad. In yeast, the ascospores 
are not arranged in any particular order, so the structure 
Figure 5.18 The life cycle of yeast Saccharomyces cerevisiae.
(1) Haploid yeast grow by vegetative propagation. (2) Yeast of
different mating types can fuse to produce diploids. (3) Haploid
ascospores are produced by meiosis in diploid yeast. (4) Diploid
strains propagate by vegetative growth.


is called an unordered tetrad. Within each tetrad, two
of the ascospores are of the a mating type and two are
of the α mating type. At maturity, the ascus ruptures in
an event known as sporulation, and spores are released 
to grow as haploids. In laboratory studies, mature as- 
cospores can be removed from their ascus and grown 
as haploids in culture to discover their genotypes. This 
process is called tetrad analysis. 


Analysis of Unordered Tetrads 


Suppose a dihybrid yeast cell with the genotype a⁺a b⁺b
is produced by fusing two haploid cells with genotypes
a⁺b⁺ and ab. If the genes are on different chromosomes,
two equally likely arrangements of chromosomes occur
in metaphase I, labeled “Alternative I” and “Alternative
II” in Figure 5.19a. If no crossover occurs between homologs,
each tetrad contains ascospores with two genotypes.
Ascospores produced by the Alternative I arrangement
of metaphase chromosomes contain the same alleles as
were found in the parental haploids (a⁺b⁺ and ab, in this
case). Tetrads with these two ascospore genotypes are
known as parental ditypes (PD). Tetrads that undergo
the Alternative II metaphase chromosome arrangement
produce ascospores that have different genotypes than
the parents. These tetrads are called nonparental ditypes
(NPD). If crossing over occurs between either of
the homologous chromosome pairs, the tetrad contains
ascospores with four different genotypes and is known as
a tetratype (TT) (Figure 5.19b).

Now let’s consider what is observed when the genes 
are linked. In Figures 5.10 and 5.11, we saw that several 
types of single and double crossover can occur between ho- 
mologous chromosomes in diploids; Figure 5.20 illustrates 
the tetrad combinations that result from no crossover and 
from single and various double crossovers between a pair 
of homologous chromosomes carrying alleles a⁺b⁺/ab at
linked loci. The figure illustrates that for these linked 
genes, all three tetrad types form, but PD and TT tetrads 
are each more frequent than NPD. PD tetrads are most 
common, being produced when no crossover occurs be- 
tween genes and when two-strand double crossover takes 
place. TT tetrads are less frequent than PD, occurring 
when single crossovers or three-strand double crossovers 
take place. NPD tetrads are least frequent, forming only 
when four-strand double crossover occurs. Genetic linkage 
produces the tetrad expectation PD > TT > NPD. 

Genetic linkage analysis in tetrads is based on the 
relative frequencies of different tetrad types rather than 
an assessment of individual progeny. The formula used to 
determine recombination frequency in tetrad analysis (fa- 
miliar from our previous assessments of genetic linkage) is 


$$\text{recombination frequency} = \frac{\text{number of recombinants}}{\text{total number of progeny}} \times 100$$


An example of this analysis comes from a study that 
examined tetrads produced by fusion of haploid strains 
pdx pan⁺ × pdx⁺ pan. The data in Table 5.5 show that
among 49 tetrads analyzed, 28 are PD, 20 are TT, and 1 
is NPD. A close examination of Figure 5.20 reveals that 
in tetrads, recombinant chromosomes are found in one- 
half the ascospores of TT tetrads and all the ascospores 
of NPD tetrads. On this basis, tetrad recombination fre- 
quency is determined using 


$$\text{recombination frequency} = \frac{\tfrac{1}{2}(\text{TT}) + \text{NPD}}{\text{total tetrads}}$$


Recombination frequency for this example is therefore


$$\frac{\tfrac{1}{2}(20) + 1}{49} = 0.224\ (22.4\%)$$
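
The tetrad calculation can be checked with a short Python function. This is a minimal sketch that follows the formula above; the function name is hypothetical, and the counts are those reported for the pdx and pan genes in Table 5.5.

```python
def tetrad_recombination_frequency(pd, tt, npd):
    """Recombination frequency from unordered tetrad counts.

    Half the ascospores in a tetratype (TT) tetrad are recombinant, all the
    ascospores in a nonparental ditype (NPD) tetrad are recombinant, and none
    of the ascospores in a parental ditype (PD) tetrad are recombinant.
    """
    total_tetrads = pd + tt + npd
    return (0.5 * tt + npd) / total_tetrads

# Counts from Table 5.5: 28 PD, 20 TT, 1 NPD (49 tetrads in total).
rf = tetrad_recombination_frequency(pd=28, tt=20, npd=1)
print(f"Recombination frequency = {rf:.3f} ({rf * 100:.1f}%)")
```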


Figure 5.19 Tetrad results for unlinked genes. (a) Parental ditype (PD) and nonparental ditype
(NPD) tetrads are the products of segregation and independent assortment. Each ascus contains two
genetically different types of ascospore. (b) Single crossovers between either homologous pair of
chromosomes produce tetratype (TT) tetrads that contain four genetically different ascospores.
Panel (a), no crossover: Independent assortment of chromosome A and chromosome B produces PD and
NPD tetrads containing a total of 50% parental and 50% recombinant gametes.
Panel (b), one crossover (single crossover of chromosome A): Crossover between one homologous pair
of chromosomes produces TT tetrads containing a total of 50% parental and 50% recombinant gametes.


Ordered Ascus Analysis 


Fungi such as Neurospora crassa follow the same basic
haploid-diploid life cycle as yeast but produce an
ascus with eight haploid ascospores rather than four.
In Neurospora, the fusion of two haploid fungi forms 
a diploid meiocyte that undergoes meiotic divisions to 
generate four haploid products aligned in a tetrad as- 
cus. Mitotic division of the ascospores immediately fol- 
lows completion of meiosis, forming an eight-member 


ascus (Figure 5.21). The two members of each mitotically 
produced pair of daughter spores are adjacent to one 
another in the Neurospora octad, and the octad is called 
an ordered ascus. Consequently, the arrangement of 
daughter spores reflects the identity and orientation of 
the alleles carried by each chromatid in metaphase I. An 
ordered ascus can be dissected before sporulation, and 
haploid spores can be removed one by one to determine 
their genotype. In this way, each product of meiosis is 


Figure 5.20 Tetrad formation. (a) No crossover. (c) Double crossover (two-strand).
[...] crossover produces the nonparental ditype.
Table 5.5 Recombination Calculation in Tetrads

Genotype: pdx pan⁺/pdx⁺ pan

                           Tetrad Types
               PD            TT            NPD
Ascospore      pdx pan⁺      pdx pan⁺      pdx pan
genotypes      pdx pan⁺      pdx pan       pdx pan
               pdx⁺ pan      pdx⁺ pan⁺     pdx⁺ pan⁺
               pdx⁺ pan      pdx⁺ pan      pdx⁺ pan⁺
Number         28            20            1             Total = 49
identified, and its spatial relationship to other meiotic
products is determined. 

Ordered ascus analysis can be used to map the dis- 
tance between linked genes and the position of a gene 
relative to the centromere of its chromosome. Gene-to- 
centromere distance is calculated based on the segrega- 
tion of homologous chromosomes in meiosis I and of 
sister chromatids in meiosis II. In an a⁺a meiocyte in
which no crossover occurs between the gene and the 
centromere, alleles segregate in meiosis I. Completion 
of meiosis and the mitotic division produces an ordered 
ascus with four spores of one type grouped in the top half 
Figure 5.21 Ordered ascus production in the fungus Neurospora crassa.
Two haploid cells fuse to form a diploid meiocyte. Meiosis separates homologous
chromosomes and chromatids, forming haploids in a tetrad. Mitosis produces
an eight-member ordered ascus. Sporulation releases spores.


of the ascus and spores of the other type filling the bot- 
tom half (Figure 5.22). This pattern of segregation is called 
first-division segregation, to signify the separation of 
alleles a⁺ and a in the first meiotic division. In the absence
of crossover, none of the spores in first-division segrega- 
tion asci are recombinant. 

If crossover takes place, alleles a⁺ and a are not
separated until the second meiotic division, a pattern 
called second-division segregation. If crossover occurs 
between the gene and centromere, a single crossover pro- 
duces one of four different octad patterns, depending on 
the orientation of chromatids during meiosis. One exam- 
ple is illustrated in Figure 5.23a where the ordered ascus 
has a 2:2:2:2 ratio. Alternative chromosome orientations 
accompanied by single crossover produce three additional 
ordered ascus patterns that group identical mitotic prod- 
ucts next to one another (Figure 5.23b). In each case, the 
overall 1:1 ratio of the two alleles is seen among the eight 
ascospores—only the order of spores differs. The relative 
proportion of second-division segregation asci is used to 
calculate the map distance (in centiMorgans) between a 
gene and the centromere via the formula 


$$x\ \text{cM} = \frac{\tfrac{1}{2}(\text{number of second-division segregation asci})}{\text{total number of asci}} \times 100$$

This calculation is equivalent to counting the number of 
recombinant spores and dividing by the total number of 
progeny, because one-half the spores in second-division 
segregation asci are recombinant. Figure 5.24 provides an 
example using Neurospora crassa. Wild-type fungi that 
grow as buff-colored colonies with normal growth habit 
are mated to mutants that grow as orange colonies with 
fluffy growth habit. As computed in the figure, the dis- 
tance from the centromere to the color gene is 16.5 cM, 
and the distance from the centromere to the growth-habit 
gene is 30.7 cM. 
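
The gene-to-centromere calculation can be reproduced with a few lines of Python. This is a minimal sketch of the formula given above. The function name is hypothetical, and the counts (73 first-division and 36 second-division asci for the color gene, 42 and 67 for the growth-habit gene) are those used in Figure 5.24.

```python
def gene_to_centromere_distance_cm(first_division_asci, second_division_asci):
    """Map distance (cM) between a gene and its centromere from ordered asci.

    Only half the spores in a second-division segregation ascus are
    recombinant, so the second-division count is multiplied by one-half.
    """
    total_asci = first_division_asci + second_division_asci
    return 0.5 * second_division_asci / total_asci * 100

color_distance = gene_to_centromere_distance_cm(73, 36)
growth_distance = gene_to_centromere_distance_cm(42, 67)
print(f"Centromere to color gene:        {color_distance:.1f} cM")
print(f"Centromere to growth-habit gene: {growth_distance:.1f} cM")
```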


5.8 Mitotic Crossover Produces 
Distinctive Phenotypes 


Our discussion of crossing over and recombination has 
been limited to events that occur during meiosis. You 
may have wondered whether crossing over occurs during 
mitosis, and if so, what its consequences are. Synapsis of 
homologous chromosomes during mitosis occurs only 
occasionally in animals; thus, there is little opportunity 
for recombination to occur. In certain cases, however, ho- 
mologous recombination does occur during mitosis. The 
Figure 5.22 First-division segregation in ordered ascus formation.

Figure 5.23 Second-division segregation in ordered ascus formation. (a) This single crossover
produces a 2:2:2:2 ordered ascus. (b) Different outcomes of second-division segregation can occur,
depending on the chromatids involved in crossing over.


rate of mitotic crossover varies considerably among or- 
ganisms, but its consequences have been revealed through 
some fascinating examples. 

The first well-documented example of mitotic cross- 
over came in 1936, when Curt Stern studied Drosophila 
crosses of two X-linked recessive traits, yellow body 
color (y) and short, twisted bristles called singed (sn). 
Stern crossed females homozygous for wild-type (gray) 
body color and singed bristles (y⁺ sn/y⁺ sn) with yellow-bodied,
normal-bristled males (y sn⁺/Y) and obtained
dihybrid F1 females that had wild-type body color and
bristle form (y⁺ sn/y sn⁺). Close examination of a small
number of F1 females revealed an unexpected phenotype.
These females had wild-type body color and wild-type
bristles over most of the body but had small patches
of either yellow body color or singed bristles. Even more 
surprising, some females had a patch of yellow body and 
a patch of singed bristles, and when they did, the patches 
were always adjacent to one another in a pattern called 
a twin spot (Figure 5.25). Among these three unusual 
spotting patterns, twin spot was about twice as common 
as single yellow spotting and single yellow spotting was 
much more common than single singed spotting. 


In formulating an explanation for the odd patches 
and their different frequencies, Stern reasoned that since 
the twin spots were always side by side, they must result 
from reciprocal events. He realized that rare crossover 
between homologous chromosomes during mitosis could 
explain twin spots, and it could also be a source of both 
kinds of single spots as well. Stern proposed that mitotic 
crossover events like those illustrated in Figure 5.25 were 
responsible for single and twin spots in Drosophila. Twin 
spotting is explained by mitotic crossover between sn and 
the centromere if the particular pattern of chromosome 
segregation illustrated in Figure 5.25 takes place. Mitotic 
crossover between y and sn followed by the chromosome
segregation shown produces single yellow spots. The 
double crossover and chromosome segregation pattern 
shown are required to produce single singed spot. Twin 
spotting is the most common observation because the 
map distance between sn and the centromere is 45 cM. In 
contrast, the distance between y and sn is 21 cM, so twin 
spotting is about twice as common as single yellow spot. 
The double crossover producing single singed spot is less 
frequent than either single crossover, thus single singed 
spot is the least frequent phenotype. 


Trait         First Division (D1)   Second Division (D2)   Combined (D1 + D2)   Distance from Centromere to Trait
                                                                                [(1/2)(D2) / (D1 + D2)] × 100 = cM
Color (c)             73                     36                    109          [(1/2)(36) / 109] × 100 = 16.5 cM
Growth (g)            42                     67                    109          [(1/2)(67) / 109] × 100 = 30.7 cM

P (genotype): c+ g+ and c g. F1 (genotype): c+ g+/c g.
Gene map: c lies 16.5 cM and g lies 30.7 cM from the centromere.


Figure 5.24 Calculation of centromere-to-gene distance in Neurospora crassa. 
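
The calculation in Figure 5.24 can be reproduced in a few lines. This is a minimal sketch using the ascus counts from the figure; the function name is hypothetical. Only half of the second-division asci are counted because only half of the spores in such an ascus carry a recombinant chromatid.

```python
def centromere_distance_cm(first_division: int, second_division: int) -> float:
    """Gene-to-centromere distance in cM from ordered-ascus counts.

    distance = [(1/2)(D2) / (D1 + D2)] x 100
    """
    total = first_division + second_division
    if total == 0:
        raise ValueError("no asci were scored")
    return 0.5 * second_division / total * 100


# Counts taken from Figure 5.24
color_cm = centromere_distance_cm(first_division=73, second_division=36)
growth_cm = centromere_distance_cm(first_division=42, second_division=67)

print(f"color (c):  {color_cm:.1f} cM")
print(f"growth (g): {growth_cm:.1f} cM")
```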
Figure 5.25 Mitotic crossover. In Drosophila crosses analyzed by Curt Stern, twin spots, single 
yellow spots, and single singed spots were produced by mitotic crossing over followed by a particular 
segregation pattern during mitotic cell division. In each set of diagrams, the chromatids and their 
centromeres are first numbered prior to crossing over. The numbers used after crossing over show the 
segregation patterns that produce the identified mitotic crossover phenotypes. 


CASE STUDY 
Mapping the Gene for Cystic Fibrosis 


Cystic fibrosis (CF) (OMIM 219700) is an autosomal recessive 
disorder caused by a defect in the cystic fibrosis transmem- 
brane conductance regulator (CFTR) gene that is located on 
chromosome 7 in humans. The protein product of CFTR spans 
the membrane of cells, regulating the flow of chloride ions in 
and out of the cell. Mutations of CFTR primarily affect glands 
producing mucus, digestive enzymes, and sweat. 

First identified in the late 1930s, CF proved to be a rela- 
tively common disorder, particularly in Caucasian populations, 


where it occurs at a frequency of 1 in 2500 infants, according 
to the American Lung Association. It is much less common 
in Hispanics (1 in 15,000), African Americans (1 in 30,000), 
and Native Pacific Islanders (1 in 100,000). In Caucasians, the 
frequency of heterozygous carriers of the recessive allele is 
approximately 4%. Numerous family studies identified CF as 
being caused by mutation of a single gene, although the gene 
was not identified until the 1980s. Many mutant alleles of the 
gene are known, although one mutation is very common. 
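
The carrier frequency of about 4% follows from the disease incidence if the population is assumed to be in Hardy-Weinberg equilibrium. That assumption is not stated in the case study and is introduced here only to illustrate the arithmetic; the function name is hypothetical.

```python
import math


def carrier_frequency(recessive_incidence: float) -> float:
    """Heterozygote frequency 2pq for an autosomal recessive disorder.

    Assumes Hardy-Weinberg equilibrium, so incidence = q**2.
    """
    if not 0 <= recessive_incidence <= 1:
        raise ValueError("incidence must be a proportion between 0 and 1")
    q = math.sqrt(recessive_incidence)  # frequency of the recessive allele
    p = 1 - q                           # frequency of the wild-type allele
    return 2 * p * q


carriers = carrier_frequency(1 / 2500)
print(f"expected carrier frequency: {carriers:.1%}")
```

With an incidence of 1 in 2500, q is 0.02 and 2pq is 0.0392, or roughly 4%.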
The principal clinical difficulty in CF is very thick mucus 
that clogs the airways in the lungs and in the ducts that trans- 
port digestive enzymes from the pancreas to the small intes- 
tine. Chronic and severe respiratory infections are a hallmark 
of CF, as are digestive difficulties that can result in chronic mal- 
nutrition, even with adequate food intake. Awareness of the 
principal complications of CF has led to better management 
and improved survival. In the 1950s, CF patients rarely sur- 
vived long enough to enter elementary school. By 1985, the 
average age of survival stood at about 25 years. By 2007, mean 
survival had improved to approximately 28 years. CF patients 
with less severe forms of the disease survive even longer. 

With family studies indicating that a single autosomal 
gene was responsible for CF, researchers used genetic linkage 
mapping and lod score analysis to locate the CF gene. All 22 
autosomes were studied, and initially a great deal of negative 
genetic linkage information was obtained. These data identi- 
fied chromosomes where the gene was not located. The first 
important piece of positive gene mapping evidence came in 
1985 when Hans Eiberg and colleagues identified the close 
linkage of the CF gene to the PON gene that produces the 
blood serum enzyme paraoxonase. Unfortunately, PON did 
not have a known chromosome location at the time, so de- 
spite the finding that the CF gene was near PON, the identity 
of the chromosome carrying the genes remained a mystery. 

A few months later, however, Lap-Chee Tsui and colleagues 
identified a DNA RFLP marker known as D7S15 that was linked 
to both the CF gene and to PON (see Section 5.5). D7S15 was 
known to reside near the middle of the long arm of chromosome 7. 
Like almost all RFLPs, D7S15 is not part of an expressed gene, 
and it has nothing to do with causing CF. It is merely a DNA 
sequence variant that is detected in a noncoding segment of 
chromosome 7. As Table 5.6 shows, however, lod score values for 
D7S15-CF and D7S15-PON linkage as reported by Tsui et al. (1985) 
for 39 families with CF clearly demonstrated close genetic linkage 
between the genes and the RFLP. Lod score values greater than 
+3.0 are seen for D7S15-CF linkage in the range θ = 0.10 to 0.20, 
with a Zmax value of 3.96 at θ = 0.14. For the D7S15-PON analysis, 
significantly positive lod scores are seen in the range θ = 0.01 
to 0.20, with a Zmax value of 5.01 at θ = 0.05. Taken together, 
the lod score analysis indicated the order D7S15-PON-CF with a 
distance of approximately 5 cM from D7S15 to PON and 14 cM from 
PON to CF. 
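
The reading of Table 5.6 described above can be sketched as a scan of the tabulated lod scores for their maximum and for the θ values at which they reach +3.0. The values are transcribed from Table 5.6; the function name and data layout are illustrative. Because the table lists θ only at fixed steps, the scan finds the tabulated maximum for D7S15-CF (3.95 at θ = 0.15), which sits next to the Zmax of 3.96 at θ = 0.14 quoted in the text.

```python
THETAS = [0.01, 0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40]

# Lod scores transcribed from Table 5.6 (39 families with CF)
LOD = {
    "D7S15-CF": [-5.88, 1.67, 3.63, 3.95, 3.62, 2.97, 2.18, 1.38, 0.67],
    "D7S15-PON": [4.27, 5.01, 4.78, 4.28, 3.66, 2.97, 2.25, 1.51, 0.81],
}


def summarize(scores, thetas=THETAS, threshold=3.0):
    """Return (largest tabulated lod, its theta, thetas with lod >= threshold)."""
    z_max = max(scores)
    theta_at_max = thetas[scores.index(z_max)]
    significant = [t for t, z in zip(thetas, scores) if z >= threshold]
    return z_max, theta_at_max, significant


for pair, scores in LOD.items():
    z_max, theta, significant = summarize(scores)
    print(f"{pair}: max lod {z_max} at theta = {theta}; "
          f"lod >= 3.0 at theta = {significant}")
```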

With the segment of chromosome 7 containing the CF 
gene identified, researchers examined the chromosome 7 re- 
gion and quickly found additional DNA genetic markers that 
were linked even more closely to the CF gene. Using these 
markers, they identified a segment of about 500,000 bp of 
DNA as the likely location of the CF gene. By examining DNA 
sequences for the probable presence of expressed genes and 
by testing for the presence of genes that were known to be 
expressed in sweat glands, a group of investigators led by Tsui 
and Francis Collins cloned and sequenced the CF gene in 1989. 
Investigators quickly determined that the protein product of 
the CF gene is a transmembrane conductance regulatory pro- 
tein, at which point the gene acquired its CFTR designation. 

One mutation known to delete three consecutive DNA 
base pairs and alter one amino acid of the CFTR protein 
accounts for almost 50% of the known CFTR mutant al- 
leles. Numerous other CFTR mutant alleles have also been 
identified, but none of these has a frequency of more than 
a few percent. The various CFTR mutant alleles produce dif- 
ferent levels of functionality in the transmembrane protein, 
to some extent allowing clinical variation in CF patients to 
be attributed to particular mutant alleles. Knowing the fre- 
quency of the one common mutation and having identified 
many other CFTR mutations, medical geneticists are able to 
offer prenatal genetic testing to CF families and are able to 
accurately identify the mutant alleles and probable disease 
severity in patients. 

The process of first mapping, then cloning, then sequenc- 
ing CFTR to identify its function is a genetic strategy known as 
positional cloning or reverse genetic analysis. We discuss this 
investigative strategy more completely in Chapter 16. 


Table 5.6 Linkage Data from 39 Families with Cystic Fibrosis 

Lod Scores at Various Recombination Distances (θ) 

Marker-Gene    0.01   0.05   0.10   0.15   0.20   0.25   0.30   0.35   0.40 
D7S15-CF      -5.88   1.67   3.63   3.95   3.62   2.97   2.18   1.38   0.67 
D7S15-PON      4.27   5.01   4.78   4.28   3.66   2.97   2.25   1.51   0.81 

William Bateson and Reginald Punnett first observed genetic 
linkage when they noticed high numbers of parental phenotypes 
in F2 progeny. 

Thomas Hunt Morgan performed test-cross analysis of 
linked genes to demonstrate that linkage violates indepen- 
dent assortment and that crossover between homologous 
chromosomes is responsible for the production of recombi- 
nant gametes. 


5.1 Linked Genes Do Not Assort Independently 


Genetic linkage identifies genes that are so close to one 
another on a chromosome that their alleles do not assort 
independently. 

With genetic linkage, parental combinations occur at fre- 
quencies that are significantly greater than those predicted 
by chance, and nonparental combinations are much less fre- 
quent than expected. 


Crossover frequency between linked genes is correlated with 
the distance between genes on a chromosome. Crossover 
occurs less often between genes that are close together than 
between genes that are farther apart. 


In crosses involving linked genes, the two parental phenotypes 
are observed in progeny in approximately equal 
frequencies. The two recombinant phenotypes also occur at 
approximately equal frequency. 
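
The equal-frequency pattern described above can be written as a short calculation. In this sketch the dihybrid genotype AB/ab and the recombination frequency of 0.10 are hypothetical values chosen for illustration. The two parental gametes share the fraction 1 - r, and the two recombinant gametes share the fraction r.

```python
def gamete_frequencies(parental, recombinant, r):
    """Expected gamete frequencies for two linked genes.

    parental and recombinant are pairs of gamete genotypes;
    r is the recombination frequency between the genes (0 to 0.5).
    """
    if not 0 <= r <= 0.5:
        raise ValueError("recombination frequency must lie between 0 and 0.5")
    freqs = {gamete: (1 - r) / 2 for gamete in parental}
    freqs.update({gamete: r / 2 for gamete in recombinant})
    return freqs


# Hypothetical dihybrid AB/ab with r = 0.10
freqs = gamete_frequencies(parental=("AB", "ab"), recombinant=("Ab", "aB"), r=0.10)
for gamete, frequency in freqs.items():
    print(f"{gamete}: {frequency:.3f}")
```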


5.2 Genetic Linkage Mapping Is Based on 
Recombination Frequency between Genes 


The correlation between physical map distance and 
recombination frequency permits gene mapping based on 
recombination frequency. 


5.3 Three-Point Test-Cross Analysis Maps Genes 


Three or more genes can be mapped by test-cross analysis. 
In a three-point cross, parental phenotypes are most frequent, 
double recombinants are least frequent, and the four 
phenotypes resulting from two single-recombination events 
are of intermediate frequency that depends on the actual dis- 
tance between genes. 


Genetic linkage maps are constructed in five steps: 


1. Find significantly higher proportions of parental pheno- 
types than predicted by chance. 

2. Identify the alleles on parental chromosomes (the most 
common classes). 

3. Identify double recombinants (the least frequent classes), 
comparing them to parental chromosomes to determine 
gene order. 

4. Calculate recombination frequencies between genes. 

5. Calculate interference with the occurrence of double 
crossovers. 
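
Steps 2 through 5 above can be sketched in code. The test-cross counts below are hypothetical, invented only to show the mechanics, and each class is written as the three alleles contributed by the trihybrid parent, in the order A, B, C as written (not necessarily the map order). Step 1, the test for a significant excess of parental classes, is not included.

```python
GENES = ("A", "B", "C")  # written order, not necessarily the map order

# Hypothetical three-point test-cross counts (total 1000)
COUNTS = {
    "ABC": 370, "abc": 380,  # most frequent: parental classes
    "Abc": 45, "aBC": 50,    # single recombinants, one interval
    "AbC": 72, "aBc": 75,    # single recombinants, other interval
    "ABc": 3, "abC": 5,      # least frequent: double recombinants
}


def middle_gene_index(parental, double):
    """The middle gene is the one switched in a double recombinant."""
    same = [p == d for p, d in zip(parental, double)]
    for i in range(3):
        others = [same[j] for j in range(3) if j != i]
        if others[0] == others[1] and others[0] != same[i]:
            return i
    raise ValueError("classes are not consistent with a double crossover")


def recombination_fraction(counts, parental, i, j):
    """Fraction of progeny recombinant between genes i and j."""
    total = sum(counts.values())
    recombinant = sum(n for g, n in counts.items()
                      if (g[i] == parental[i]) != (g[j] == parental[j]))
    return recombinant / total


def analyze_three_point(counts):
    ranked = sorted(counts, key=counts.get, reverse=True)
    parental = ranked[0]          # step 2: most common class
    doubles = ranked[-2:]         # step 3: least common classes
    mid = middle_gene_index(parental, doubles[0])
    flanks = [i for i in range(3) if i != mid]
    rf = {f"{GENES[i]}-{GENES[mid]}":
          recombination_fraction(counts, parental, i, mid) for i in flanks}  # step 4
    total = sum(counts.values())
    observed = sum(counts[g] for g in doubles)
    expected = total
    for value in rf.values():
        expected *= value
    interference = 1 - observed / expected  # step 5
    return {"middle_gene": GENES[mid], "rf": rf, "interference": interference}


result = analyze_three_point(COUNTS)
print(result)
```

For these invented counts the middle gene is C, the recombination fractions are 0.103 and 0.155, and interference is about 0.50.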


Recombination frequency usually underestimates the physical 
distance between genes. Mapping functions are used to 
correct these estimates. 

Hotspots and coldspots of recombination are found in many 
genomes, reflecting the uneven distribution of homologous 
recombination. 
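
The summary does not name a particular mapping function. As one illustration, the sketch below uses Haldane's mapping function, a standard choice that assumes crossovers occur independently (no interference). The function name is hypothetical.

```python
import math


def haldane_cm(r: float) -> float:
    """Map distance in cM from recombination fraction r (Haldane's function).

    d = -(1/2) ln(1 - 2r), in Morgans; multiplied by 100 for cM.
    Assumes no crossover interference.
    """
    if not 0 <= r < 0.5:
        raise ValueError("r must be at least 0 and less than 0.5")
    return -0.5 * math.log(1 - 2 * r) * 100


for r in (0.01, 0.10, 0.30, 0.45):
    print(f"observed r = {r:.2f}  ->  corrected distance = {haldane_cm(r):6.1f} cM")
```

The correction is negligible for closely linked genes and grows as r approaches 0.5, where undetected multiple crossovers are most common.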


5.4 Recombination Results from Crossing Over 


Studies correlating genetic recombination with the visible 
recombination of distinctive physical structures on 
chromosomes support the idea that crossing over causes 
recombination. 
Crossing over occurs at the four-strand stage in prophase 
I of meiosis, after completion of DNA replication. Two 
nonsister chromatids of homologous chromosomes 
exchange parts in two-strand single crossovers. Two, 
three, or all four chromatids can be involved in double 
crossovers. 

Recombination occurs within genes as well as between 
genes. Several biological properties of organisms af- 
fect recombination. In animals, the heterogametic sex 
experiences less recombination genome-wide than the 
homogametic sex. 


5.5 Linked Human Genes Are Mapped Using 
Lod Score Analysis 


Statistical approaches such as lod score analysis detect 
evidence of linkage in small families. 

Lod score analysis determines the likelihood of genetic 
linkage between genes at specified recombination values (θ 
values). A cumulative lod score of +3.0 or more is statistically 
significant evidence in favor of genetic linkage between 
two genes. Lod scores of -2.0 or less represent significant 
evidence against genetic linkage. 
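
A lod score compares the likelihood of the observed offspring at a chosen θ with the likelihood under independent assortment (θ = 0.5). The sketch below covers only the simplest case, a family in which the phase of the alleles is known, and the family data are hypothetical. The function names are illustrative.

```python
import math


def lod_score(n_meioses: int, n_recombinants: int, theta: float) -> float:
    """Lod score for a phase-known family.

    Z(theta) = log10[ theta**k * (1 - theta)**(n - k) / 0.5**n ]
    where n is the number of informative meioses and k the recombinants.
    """
    if not 0 <= theta <= 0.5:
        raise ValueError("theta must lie between 0 and 0.5")
    k = n_recombinants
    linked = theta ** k * (1 - theta) ** (n_meioses - k)
    unlinked = 0.5 ** n_meioses
    if linked == 0:
        return float("-inf")  # recombinants observed but theta = 0
    return math.log10(linked / unlinked)


def cumulative_lod(families, theta):
    """Lod scores from independent families are summed."""
    return sum(lod_score(n, k, theta) for n, k in families)


# Hypothetical families: (informative meioses, recombinants)
FAMILIES = [(10, 1), (8, 0), (6, 1)]

for theta in (0.01, 0.05, 0.10, 0.20, 0.30, 0.40):
    print(f"theta = {theta:.2f}  cumulative lod = {cumulative_lod(FAMILIES, theta):.2f}")
```

No single small family here reaches +3.0 at θ = 0.10, but the summed score does, which is why lod scores are accumulated across families.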


5.6 Recombination Affects Evolution 
and Genetic Diversity 


Recombination between homologs adds substantially 
to the genetic diversity produced through sexual 
reproduction. 

Homologous recombination helps break down linkage 
disequilibrium to randomize the alleles of linked 
genes. 
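
The breakdown of linkage disequilibrium can be illustrated with a standard population-genetic model that is not derived in this chapter: under random mating, disequilibrium D between two loci is expected to shrink by a factor of (1 - r) each generation. The starting value and recombination frequencies below are hypothetical.

```python
def disequilibrium_after(d0: float, r: float, generations: int) -> float:
    """Expected linkage disequilibrium after a number of generations.

    D_t = D_0 * (1 - r)**t, assuming random mating and no selection.
    """
    if not 0 <= r <= 0.5:
        raise ValueError("r must lie between 0 and 0.5")
    if generations < 0:
        raise ValueError("generations must not be negative")
    return d0 * (1 - r) ** generations


D0 = 0.25  # hypothetical starting disequilibrium
for r in (0.01, 0.10, 0.50):
    remaining = disequilibrium_after(D0, r, generations=10)
    print(f"r = {r:.2f}: D after 10 generations = {remaining:.4f}")
```

Tightly linked genes (small r) keep their allele associations for many generations, whereas unlinked genes lose them quickly.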


5.7 Genetic Linkage in Haploid Eukaryotes 
Is Identified by Tetrad Analysis 


In certain eukaryotic microorganisms, the products of indi- 
vidual meiotic cell divisions are contained within an ascus. 
Parental and recombinant gametes contained in an ascus can 
be analyzed to map genes. 


5.8 Mitotic Crossover Produces Distinctive 
Phenotypes 


Mitotic crossing over is a rare event that produces patches of 
tissue with unusual phenotype. 
Chapter Concepts 


For answers to selected even-numbered problems, see Appendix: Answers. 


1. For parts a, b, and c, draw a diagram illustrating the alleles 
on homologous chromosomes for the following genotypes, 
assuming in each case that the genes reside on the same 
chromosome in the order written. For parts d and e, give 
the information requested. 

a. AB/ab 
b. aBc/abC 
c. DFg/DFG 
d. the gametes produced by an organism with the genotype Rt/rT 
e. progeny of the cross Rt/rT × rt/rt 


2. In a diploid species of plant, the genes for plant height and 
fruit shape are syntenic and separated by 18 m.u. Allele D 
produces tall plants and is dominant to d for short plants, 
and allele R produces round fruit and is dominant to r for 
oval fruit. 

a. A plant with the genotype DR/dr produces gametes. 
Identify gamete genotypes, label parental and recombinant 
gametes, and give the frequency of each gamete genotype. 

b. Give the same information for a plant with the genotype 
Dr/dR. 


3. A pure-breeding tall plant producing oval fruit as described 
in Problem 2 is crossed to a pure-breeding short plant producing 
round fruit. 

a. The F1 are crossed to short plants producing oval 
fruit. What are the expected proportions of progeny 
phenotypes? 

b. If the F1 identified in part (a) are crossed to one another, 
what proportion of the F2 are expected to be short and 
produce round fruit? What proportion are expected to 
be tall and produce round fruit? 


4. Genes E and H are syntenic in an experimental organism 
with the genotype EH/eh. Assume that during each meiosis, 
one crossover occurs between these genes. No homologous 
chromosomes escape crossover, and none undergo 
double crossover. Are genes E and H genetically linked? 
Why or why not? What is the proportion of parental gametes 
produced by meiosis? 


5. In tomato plants, purple leaf color is controlled by a 
dominant allele A, and green leaf by a recessive allele a. 
At another locus, hairy leaf H is dominant to hairless leaf 
h. The genes for leaf color and leaf texture are separated 
by 16 m.u. on chromosome 5. On chromosome 4, a gene 
controlling leaf shape has two alleles: a dominant allele C 
that produces cut-leaf shape and a recessive allele c that 
produces potato-shaped leaf. 

a. The cross of a purple, hairy, cut plant heterozygous at 
each gene to a green, hairless, potato plant produces the 
following progeny: 

Phenotype                    Frequency, % 
Purple, hairy, cut                21 
Purple, hairy, potato             21 
Green, hairless, cut              21 
Green, hairless, potato           21 
Purple, hairless, cut              4 
Purple, hairless, potato           4 
Green, hairy, cut                  4 
Green, hairy, potato               4 
                                 100 

Give the genotypes of parental and progeny plants in 
this experiment. 

b. Fully explain the number and frequency of each phenotype 
class. 


6. In Drosophila, the map positions of genes are given in map 
units numbering from one end of a chromosome to the 
other. The X chromosome of Drosophila is 66 m.u. long. 
The X-linked gene for body color (with two alleles, y+ for 
gray body and y for yellow body) resides at one end of 
the chromosome at map position 0.0. A nearby locus for 
eye color, with alleles w+ for red eye and w for white eye, 
is located at map position 1.5. A third X-linked gene, controlling 
bristle form, with f+ for normal bristles and f for 
forked bristles, is located at map position 56.7. Each gene 
resides on the X chromosome, and at each locus the wild-type 
allele is dominant over the mutant allele. 

a. In a cross involving these three X-linked genes, do you 
expect any gene pair(s) to show genetic linkage? Explain 
your reasoning. 

b. Do you expect any of these gene pair(s) to assort independently? 
Explain your reasoning. 

c. A wild-type female fruit fly with the genotype y+ w+ f / 
y w f+ is crossed to a male fruit fly that has yellow body, 
white eye, and forked bristles. Predict the frequency of 
each progeny phenotype class produced by this mating. 

d. Explain how each of the predicted progeny classes is 
produced. 


7. Genes A, B, and C are linked on a chromosome and 
found in the order A-B-C. Genes A and B recombine with 
a frequency of 8%, and genes B and C recombine at a 
frequency of 24%. For the cross a+b+c/abc+ × abc/abc, 
predict the frequency of progeny genotypes. Assume 
interference is zero. 


8. Gene G recombines with gene T at a frequency of 7%, and 
gene G recombines with gene R at a frequency of 4%. 

a. Draw two possible genetic maps for these three genes, 
and identify the recombination frequencies predicted 
for each map. 

b. Assuming any desired genotype is available, propose 
a genetic cross whose result could be used to 
determine which of the proposed genetic maps is 
correct. 


9. Genes A, B, C, D, and E are linked on a chromosome 
and occur in the order given. The test cross Ae/aE × 
ae/ae indicates the genes recombine with a frequency 
of 28%. 

a. If 1000 progeny are produced by the test cross, determine 
the number of progeny in each outcome class. 

b. Previous genetic linkage crosses have determined that 
recombination frequencies for these genes are 6% for genes 
A and B, 4% for genes B and C, 10% for genes C and D, 
and 11% for genes D and E. The sum of these frequencies 
between genes A and E is 31%. Why does the recombination 
distance between these genes, determined by adding 
the intervals between adjacent linked genes, differ from 
the distance determined by the test cross? 


10. Syntenic genes can assort independently. Explain this 
observation. 


11. The recombination frequency between linked genes is less 
than 50%. Why is 50% recombination the maximum value? 


12. On the Drosophila X chromosome, the dominant allele y+ 
produces gray body color and the recessive allele y produces 
yellow body. This gene is linked to one controlling full eye 
shape by a dominant allele lz+ and lozenge eye shape with a 
recessive allele lz. These genes recombine with a frequency 
of approximately 28%. The Lz gene is linked to gene F controlling 
bristle form, where the dominant is long bristles 
and the recessive is forked bristles. The Lz and F genes 
recombine with a frequency of approximately 32%. 

a. Using any genotypes you choose, design two separate 
crosses, one to test recombination between genes Y and 
Lz and the second between genes Lz and F. Assume 
1000 progeny are produced by each cross, and give the 
number of progeny in each outcome category. (In setting 
up your crosses, remember that Drosophila males 
do not undergo recombination.) 

b. Can any cross reveal genetic linkage between gene Y 
and gene F? Why or why not? 

c. Why is “independent assortment” the genetic term that 
best describes the observations of a genetic cross between 
gene Y and gene F? 


Application and Integration 


For answers to selected even-numbered problems, see Appendix: Answers. 


13. Researchers cross a corn plant that is pure-breeding for 
the dominant traits colored aleurone (C1), full kernel (Sh), 
and waxy endosperm (Wx) to a pure-breeding plant with 
the recessive traits colorless aleurone (c1), shrunken kernel 
(sh), and starchy (wx). The resulting F1 plants were crossed 
to pure-breeding colorless, shrunken, starchy plants. 
Counting the kernels from about 30 ears of corn yields the 
following data. 

Kernel Phenotype                 Number 
Colored, shrunken, starchy          116 
Colored, full, starchy              601 
Colored, full, waxy                2538 
Colored, shrunken, waxy               4 
Colorless, shrunken, starchy       2708 
Colorless, full, starchy              2 
Colorless, full, waxy               113 
Colorless, shrunken, waxy           626 
                                   6708 

a. Why are these data consistent with genetic linkage 
among the three genes? 

b. Perform a chi-square test to determine if these data 
show significant deviation from the expected phenotype 
distribution. 

c. What is the order of these genes in corn? 

d. Calculate the recombination fraction between the gene 
pairs. 

e. What is the interference value for this data set? 


14. Nail-patella syndrome is an autosomal disorder affecting 
the shape of nails on fingers and toes as well as the structure 
of kneecaps. The pedigree below shows the transmission of 
nail-patella syndrome in a family along with ABO blood type. 

[Pedigree: transmission of nail-patella syndrome and ABO blood type in the family; figure not reproduced] 

a. Is nail-patella syndrome a dominant or a recessive 
condition? Explain your reasoning. 

b. Does this family give evidence of genetic linkage between 
nail-patella syndrome and ABO blood group? 
Why or why not? 

c. Using N and n to represent alleles at the nail-patella locus 
and I^A, I^B, and i to represent ABO alleles, write the 
genotypes of I-1 and I-2 as well as their five children in 
generation II. 

d. Explain why III-6 has nail-patella syndrome and III-8 
does not. Give genotypes for these two individuals. 

e. Explain why III-11 has nail-patella syndrome and III-12 
does not. Give genotypes for these two individuals. 


15. Three dominant traits of corn seedlings, tunicate seed 
(T-), glossy appearance (G-), and liguled stem (L-), are 
studied along with their recessive counterparts, nontunicate 
(tt), nonglossy (gg), and liguleless (ll). A trihybrid plant 
with the three dominant traits is crossed to a nontunicate, 
nonglossy, liguleless plant. Kernels on ears of progeny 
plants are scored for the traits, with the following results: 

Phenotype                              Number 
Tunicate, glossy, liguled                 102 
Tunicate, glossy, liguleless              106 
Tunicate, nonglossy, liguled               18 
Tunicate, nonglossy, liguleless            20 
Nontunicate, glossy, liguled               22 
Nontunicate, glossy, liguleless            23 
Nontunicate, nonglossy, liguled            99 
Nontunicate, nonglossy, liguleless        110 
                                          500 


a. Is there evidence of genetic linkage among any of these 
gene pairs? If so, identify the evidence. 

b. Is there evidence of independent assortment among any 
of these gene pairs? If so, identify the evidence. 

c. Using the gene symbols given above, write the genotypes 
of F1 and F2 plants. 

d. If evidence of linkage is present, calculate the recombi- 
nation fraction(s) from the data presented. 

e. Could all three genes be carried on the same chromo- 
some? Discuss why or why not. 


16. In a diploid plant species, an F1 with the genotype Gg Ll Tt 
is test-crossed to a pure-breeding recessive plant with the 
genotype gg ll tt. The offspring genotypes are as follows: 

Genotype      Number 
Gg Ll Tt         621 
Gg Ll tt           3 
Gg ll Tt          64 
Gg ll tt         109 
gg Ll Tt         103 
gg Ll tt          67 
gg ll Tt           7 
gg ll tt         626 
                1600 


a. What is the order of these three linked genes? 

b. Calculate the recombination fractions between each 
pair of genes. 

c. Why is the recombination fraction for the outside pair 
of genes not equal to the sum of recombination frac- 
tions between the adjacent gene pairs? 

d. What is the interference value for this data set? 

e. Explain the meaning of this I value. 


17. The table given lists the arrangement of alleles of linked 
genes in dihybrid organisms, the recombination frequency 
between the genes, and specific gamete genotypes. Using the 
information provided, determine the expected frequency of 
gametes given. Assume one map unit equals 1% recombination 
and, when three genes are involved, interference is zero. 

     Dihybrid Genotype    Recombination Frequency    Gamete Genotype 
A.   DE/de                8%                         De 
B.   AD/ad                28%                        ad 
C.   DEF/def              D-E 8%, E-F 24%            def 
D.   BdE/bDe              B-D 18%, D-E 8%            Bde 


18. The Rh blood group in humans is determined by a gene 
on chromosome 1. A dominant allele produces Rh+ blood 
type, and a recessive allele generates Rh-. Elliptocytosis is 
an autosomal dominant disorder that produces abnormally 
shaped red blood cells that have a short life span resulting 
in hereditary anemia. A large family with elliptocytosis is 
tested for genetic linkage of Rh blood group and the disease. 
The lod score data below are obtained for the family. 

[Figure: lod scores plotted against θ value, with the θ axis running from 0.05 to 0.50; the plotted lod scores are not reproduced] 

a. From these data, can you conclude that Rh and 
elliptocytosis loci are genetically linked in this family? 
Why or why not? 

b. What is Zmax for this family? 

c. Over what range of θ do lod scores indicate significant 
evidence in favor of genetic linkage? 


19. Genetic linkage mapping for a large number of families 
identifies 4% recombination between the genes for Rh 
blood type and elliptocytosis. At the Rh locus, alleles R 
and r control Rh+ and Rh- blood types. Allele E producing 
elliptocytosis is dominant to the wild-type recessive 
allele e. Tom and Terri each have elliptocytosis, and 
each is Rh+. Tom’s mother has elliptocytosis and 
is Rh- while his father is healthy and has Rh+. 
Terri’s father is Rh+ and has elliptocytosis; Terri’s 
mother is Rh- and is healthy. 

a. What is the probability that the first child of Tom and 
Terri will be Rh- and have elliptocytosis? 

b. What is the probability that a child of Tom and Terri 
who is Rh+ will have elliptocytosis? 


20. Neurospora with the genotype a+ a form tetrads in the 
following frequencies: 


Tetrad Number 
aaaa 192 
aada 208 
adaa 23 
aa‘a‘a 27 
d‘aa‘a 29 
adaaa 21 

= 500 


a. What is the distance between the gene and the 
centromere? 

b. Diagram the meiosis producing the tetrad class a a a+ a+. 

c. Diagram the meiosis producing the tetrad class a+ a a a+. 


21. Gene R and gene T are genetically linked. Answer the 
following questions concerning a dihybrid organism with the 
genotype Rt/rT: 

a. If r = 0.20, give the expected frequencies of gametes 
produced by the dihybrid. 

b. Determine the gamete frequencies if a two-strand dou- 
ble crossover occurs between the genes. 

c. Determine the genotypes of gametes produced by 
a three-strand double crossover in this dihybrid 
organism. 

d. Determine the genotypes of gametes produced by a 
four-strand double crossover in this dihybrid. 


22. T. H. Morgan’s data on eye color and wing form, shown in 
Figures 5.3 and 5.5, reveal genetic linkage between the two 
genes. Test this genetic linkage data with chi-square analy- 
sis, and show that the results are significantly different 
from the expectation under the assumption of independent 
assortment. 


23. A wild-type trihybrid soybean plant is crossed to a 
pure-breeding soybean plant with the recessive phenotypes pale 
leaf (l), oval seed (r), and short height (t). The results of the 
three-point test cross are shown below. Traits not listed 
are wild type. 

Phenotype              Number 
Pale                      648 
Pale, oval                 64 
Pale, short                10 
Pale, oval, short         102 
Oval                        6 
Oval, short               618 
Short                      84 
Wild type                  98 
                         1630 

a. What are the alleles on each homologous chromosome of 
the parental wild-type trihybrid soybean plant? Place the 
alleles in their correct gene order. Use L, R, and T to represent 
dominant alleles and l, r, and t for recessive alleles. 

b. Calculate the recombination fraction between the 
adjacent genes. 

c. Calculate the interference value for these data. 


24. The boss in your laboratory has just heard of a proposal 
by another laboratory that genes for eye color and the 
length of body bristles may be linked in Drosophila. 
Your lab has numerous pure-breeding stocks of 
Drosophila that could be used to verify or refute genetic 
linkage. In Drosophila, red eyes (c+) are dominant 
to brown eyes (c), and long bristles (d+) are dominant 
to short bristles (d). Your lab boss asks you to design 
an experiment to test the genetic linkage of eye color 
and bristle-length genes, and to begin by crossing a 
pure-breeding line homozygous for red eyes and short 
bristles to a pure-breeding line that has brown eyes and 
long bristles. 

a. Give the genotypes of the pure-breeding parental flies, 
and the genotype(s) and phenotype(s) of the F1 progeny 
they produce. 

b. In your experimental design, what is the genotype and 
phenotype of the line you propose to cross to the F1 to 
obtain the most useful information about genetic linkage 
between the eye color and bristle-length genes? 
Explain why you make this choice. 

c. Assume the eye color and bristle-length genes are sepa- 
rated by 28 m.u. What are the approximate frequencies 
of phenotypes expected from the cross you proposed in 
part (b)? 

d. How would the results of the cross differ if the genes are 
not linked? 


25. In rabbits, chocolate-colored fur (w+) is dominant to white 
fur (w), straight fur (c+) is dominant to curly fur (c), and 
long ear (s+) is dominant to short ear (s). The cross of a 
trihybrid rabbit with straight, chocolate-colored fur and 
long ears to a rabbit that has white, curly fur and short ears 
produces the following results: 

Phenotype                       Number 
White, short, straight              13 
Chocolate, long, straight          165 
Chocolate, long, curly              13 
White, long, straight               82 
Chocolate, short, straight         436 
Chocolate, short, curly             79 
White, short, curly                162 
White, long, curly                 450 
                                  1400 


a. Determine the order of the genes on the chromo- 
some, and identify the alleles that are present on 
each of the homologous chromosomes in the trihy- 
brid rabbits. 

b. Calculate the recombination frequencies between each 
of the adjacent pairs of genes. 

c. Determine the interference value for this cross. 


26. The following progeny are obtained from a test cross of 
a trihybrid wild-type plant to a plant with the recessive 
phenotypes compound leaves (c), intercalary leaflets (i), 
and green fruits (g). (Traits not listed are wild type.) The 
test-cross progeny are as follows: 

Phenotype                                                Number 
Compound leaves                                             324 
Compound leaves, intercalary leaflets                        32 
Compound leaves, green fruits                                 5 
Compound leaves, intercalary leaflets, green fruits          51 
Intercalary leaflets                                          3 
Intercalary leaflets, green fruits                          309 
Green fruits                                                 42 
Wild type                                                    49 
                                                            815 


a. Determine the order of the three genes, and construct 
a genetic map that identifies the correct order and the 
alleles carried on each chromosome in the trihybrid pa- 
rental plant. 

b. Calculate the frequency of recombination between the 
adjacent genes in the map. 

c. How many double-crossover progeny are expected 
among the test-cross progeny? Calculate the interfer- 
ence for this cross. 


27. In tomatoes, the allele T for tall plant height is dominant 
to dwarf allele t, the P allele for smooth skin is dominant to 
the p allele for peach fuzz skin, and the allele R for round 
fruit is dominant to the recessive r allele for oblong fruit. 
The genes controlling these traits are linked on chromosome 1 
in the tomato genome, and the genes are arranged 
in the order and with the recombination frequencies 
shown. 

Gene                          T          P          R 
Recombination frequency           0.04       0.18 

a. A pure-breeding tall, peach fuzz, round plant is crossed 
to a pure-breeding plant that is dwarf, smooth, oblong. 
What are the gamete genotypes produced by each of 
these plants? 

b. What are the genotype and phenotype of the F1 progeny 
of this cross? 

c. What are the genotypes of gametes produced by 
the F1, and what is the predicted frequency of each 
gamete? 

d. The F1 are test-crossed to dwarf, peach fuzz, oblong 
plants, and 1000 test-cross progeny are 
produced. What are the phenotypes of test-cross 
progeny, and what number of progeny is expected in 
each class? 


28. Neurofibromatosis 1 (NF1) is an autosomal dominant 
disorder inherited on human chromosome 17. Part of the 
analysis mapping the NF1 gene to chromosome 17 came 
from genetic linkage studies testing segregation of NF1 and 
DNA genetic markers on various chromosomes. A DNA 
marker with two alleles, designated 1 and 2, is linked to 
NF1. The pedigree below shows segregation of NF1 
(darkened symbols) and gives genotypes for the DNA 
marker for each family member. 

[Pedigree: three-generation family showing NF1 status and DNA marker genotypes for each member; figure not reproduced] 

a. Determine the alleles for the NF1 gene and the DNA 
marker gene on each chromosome carried by the four 
family members in generation I and generation II. Use 
N for the dominant NF1 allele and n for the recessive 
allele and assume I-1 is heterozygous for the disease 
allele (Nn). 

b. Based on the phase of alleles on chromosomes in gener- 
ation II, is there any evidence of recombination among 
the eight offspring in generation III? Explain. 

c. What is the estimated recombination frequency be- 
tween the NF1 gene and the DNA marker? 


A 2006 genetic study of a large American family (Ikeda et 
al., 2006) identified genetic linkage between DNA markers 
on chromosome 11 and the gene producing the autosomal 
dominant neuromuscular disorder spinocerebellar ataxia 
type 5 (SCAS). The following lod score data are taken from 
the 2006 study: 


Theta (6) Value 


0.01 0.05 0.10 0.20 0.30 0.40 
SCA5 and DNA 
marker A 11.02 12.26 11.94 10.04 7.26 3.77 
SCA5 and DNA 
marker B 0.35 094 1.07 0.99 0.75 0.43 


a. Does either group of lod scores indicate statistically
significant odds in favor of genetic linkage? Explain
your answer.

b. What is the maximum value for each set of lod scores?

c. Based on the available information, is DNA marker
A linked to the gene producing SCA5? Explain your
answer.

d. Based on available information, is DNA marker B
linked to the gene for SCA5? Explain your answer.


30. A Drosophila experiment examining potential genetic
linkage of X-linked genes studies a recessive eye mutant
(echinus), a recessive wing-vein mutation (crossveinless),
and a recessive bristle mutation (scute). The wild-type
phenotypes are dominant. Trihybrid wild-type females (all
have the same genotype) are crossed to hemizygous males
displaying the three recessive phenotypes. Among the
20,765 progeny produced from these crosses are the
phenotypes and numbers listed in the table. Any phenotype
not given is wild type.


Phenotype                              Number
1. Echinus                               8576
2. Scute                                  977
3. Crossveinless                          716
4. Echinus, scute                         681
5. Scute, crossveinless                  8808
6. Scute, crossveinless, echinus            4
7. Echinus, crossveinless                1002
8. Wild type                                1
   Total                               20,765


a. Determine the gene order and identify the alleles on the
homologous X chromosomes in the trihybrid females.

b. Calculate the recombination frequencies between each
of the gene pairs.

c. Compare the recombination frequencies and speculate
about the source of any apparent discrepancies in the
recombination data.

d. Use chi-square analysis to demonstrate that the data
in this experiment are not the result of independent
assortment.


31. As part of their analysis of intragenic recombination,
Melvin Green and Kathleen Green studied lozenge-eyed 
females with the mutation /z* on one X chromosome and 
the mutation /z$ on the homologous X chromosome. The 
Iz8-bearing X chromosome also carried recessive muta- 
tions for cut wing (ct) and vermilion-colored eye (v). These 
females were mated to cut wing males that had vermilion- 
colored, lozenge-shaped eyes. The chromosomes of these 
flies are depicted in the following drawing. 


[Figure: X chromosomes carried by the lozenge-eyed females and by the cut wing, vermilion, lozenge males]


a. Diagram the recombination event within the lz gene
and draw the resulting recombinant X chromosomes,
illustrating the lz alleles and the flanking markers on
each chromosome.

b. What are the phenotypes of progeny male flies
carrying lz intragenic recombinants? (A double-mutant
X chromosome carrying both lz mutations produces a
compound lozenge eye that has a different appearance
than the eye produced by either single mutation.)


32. In experiments published in 1918 that sought to verify
and expand the genetic linkage and recombination theory
proposed by Morgan, Thomas Bregger studied potential
genetic linkage in corn (Zea mays) for genes controlling
kernel color (colored is dominant to colorless) and starch
content (starchy is dominant to waxy). Bregger performed
two crosses. In Cross 1, pure-breeding colored, starchy-kernel
plants (C1 Wx/C1 Wx) were crossed to plants pure-breeding
for colorless, waxy kernels (c1 wx/c1 wx). The F1
of this cross were test-crossed to colorless, waxy plants.
The test-cross progeny are as follows:


Phenotype              Number
Colored, waxy             310
Colored, starchy          858
Colorless, waxy           781
Colorless, starchy        311
Total                    2260


In Cross 2, plants pure-breeding for colored, waxy kernels
(C1 wx/C1 wx) and colorless, starchy kernels (c1 Wx/c1 Wx)
were mated, and their F1 were test-crossed to colorless,
waxy plants. The test-cross progeny are as follows:


Phenotype              Number
Colored, waxy             340
Colored, starchy          115
Colorless, waxy            92
Colorless, starchy        298
Total                     845


a. For each set of test-cross progeny, determine whether 
genetic linkage or independent assortment is more 
strongly supported by the data. Explain the rationale for 
your answer. 

b. Calculate the recombination frequency for each of the 
progeny groups. 

c. Are the results of these two experiments mutually com- 
patible with the hypothesis of genetic linkage? Explain 
why or why not. 

d. Merge the two sets of progeny data and determine the 
combined recombination frequency.


Genetic Analysis and Mapping in Bacteria and Bacteriophages


Bacteria transfer DNA to one another by multiple mechanisms, including 
the process of gene transfer called conjugation, shown here. The bacte- 
rial “donor” (center left) transfers DNA through a tube that connects it to a 
bacterial “recipient” (lower right). 


Here’s a disturbing little secret of human life: Your body
contains approximately 100 trillion cells, but only about
10 trillion of them are yours! The other 90% of the cells you
carry around are bacteria, fungi, and other forms of microscopic
life. Many of these biological hitchhikers perform
useful, even essential, functions. For example, you carry
hundreds of species of bacteria in your gut that collectively have
a mass of more than 3 pounds. Without these intestinal bacteria,
your digestion of carbohydrates would be impaired, and
your ability to manufacture essential nutrients such as vitamin
B12 and vitamin K would be disabled. The bacteria teeming in
your digestive tract also help keep potentially harmful bacteria
at bay by vigorously competing for available nutrients.
Similarly, the millions of bacteria that currently
reside on your skin (yes, even though you showered
recently!) help keep your skin healthy by competing
with infectious bacteria. Despite this normal and
healthy competition, harmful bacteria can gain access
to our bodies. Occasionally even our normally
helpful microbial passengers turn against us and
cause illness, infection, or, in extreme cases, death.

Given the biological, medical, and technological 
importance of bacteria and other microorganisms, it 
is no wonder they are studied intensively in modern 
genetics, using the bacterium Escherichia coli and 
yeast Saccharomyces cerevisiae as model genetic 
organisms. The relative ease of studying microor- 
ganisms fueled revolutionary change in genetics in 
the latter half of the 20th century. Much of the initial 
information in molecular genetics and many of the 
methods of genetic analysis pioneered in the study 
of bacteria have proven valuable in the study of 
more complex organisms. 

In this chapter, our focus is on investigating and 
understanding how genetic analysis is applied to 
the study of gene transfer and mapping in bacterial 
and bacteriophage genomes. We take a historical 
genetic approach in our discussion, focusing on the 
applications of genetic analysis that were used to 
map genes in bacterial genomes in the decades be- 
fore genome sequencing was developed. Genome 
sequences of thousands of bacterial species are now 
published, and their analysis verifies the accuracy 
and validity of the conclusions reached through use 
of the approaches we describe in this chapter. 

We begin by looking at three mechanisms by 
which DNA can be transferred from one bacterium 
to another. After showing how analysis of these pro- 
cesses helps microbial geneticists locate the positions 
of genes on the bacterial chromosome, the chapter 
turns to a discussion of bacteriophages, the viruses 
that infect bacterial cells. It describes experiments 
that led to a fine-structure map of a bacteriophage 
genome and provided an essential bridge between 
transmission genetics and modern molecular genetics.


6.1 Bacteria Transfer Genes by Conjugation


Bacteria propagate by binary fission, a process in which the 
bacterial chromosome replicates, and a copy is distributed 
to each of the progeny cells along with a share of the con- 
tents of the dividing cell. In a matter of hours, this form 
of clonal propagation can generate a “colony” containing 
thousands of genetically identical bacteria cells. The ability 
of bacteria to produce colonies of clones, however, does not 
mean that bacteria never recombine genetically. A series of 
studies in the 1940s and 1950s identified and described 
the three mechanisms of gene transfer and recombination 
between bacteria that are a focus of this chapter. 
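
As a rough illustration of how quickly binary fission builds a colony, the sketch below counts the cells descended from a single cell, assuming every cell divides once per generation and no cells die. The 20-minute generation time and the function names are assumptions chosen for the example; they are not values given in the text.

```python
import math

def cells_after(minutes, generation_time_min):
    """Cells descended from one cell after `minutes` of binary fission."""
    generations = minutes // generation_time_min  # count completed divisions only
    return 2 ** generations

def minutes_to_reach(target_cells, generation_time_min):
    """Minutes of doubling needed for one cell to yield at least `target_cells`."""
    generations = math.ceil(math.log2(target_cells))
    return generations * generation_time_min

GENERATION_TIME_MIN = 20  # assumed for illustration only

print(cells_after(200, GENERATION_TIME_MIN))        # cells after ten doublings
print(minutes_to_reach(1000, GENERATION_TIME_MIN))  # time to pass 1000 cells
```

Under these assumptions, ten doublings (a little over three hours) are enough to pass a thousand cells, which is consistent with colonies of thousands of clones appearing in a matter of hours.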

Bacteria are a highly diverse taxonomic group, and 
they are essential for genetic study. Among the features 
that make bacteria so useful to geneticists are the following: 


Genomic simplicity. Most bacterial genomes con- 
tain fewer genes and fewer base pairs in their haploid 
genomes than do other organisms. 


Uncomplicated genotypes. The haploid genomes 
of most bacteria allow all mutations to be observed 
directly, without interference from dominance 
interactions between alleles. 


Short generation times. Bacteria reproduce rapidly; 
their generation times can be measured in minutes. 


Large numbers of progeny. Enormous numbers of 
clonal progeny can be examined, increasing the likeli- 
hood that statistically rare events will be observed. 


Ease of propagation. Microbes may be grown either 
in liquid culture or on culture plates. The cultures are 
easy and inexpensive to maintain, and they require 
little laboratory space. 


Numerous heritable differences. Mutants are easily 
created, identified, isolated, and manipulated for 
examination. 


A central characteristic of interest in this chapter 
is the propensity of bacteria to transfer genetic mate- 
rial from one individual bacterium to another. Transfer 
occurs by three processes: conjugation, the transfer of 
replicated DNA from a donor bacterium to a recipient 
bacterium; transformation, the uptake of DNA from the 
environment by a recipient bacterium; and transduc- 
tion, the transfer of DNA from a donor bacterium to a 
recipient bacterium by way of a viral vector. Each of these 
mechanisms involves a one-way transfer of genetic ma- 
terial from a bacterial donor cell to a recipient cell. The 
transferred DNA is either an extrachromosomal plasmid 
or a portion of the donor bacterial chromosome. Often, 
the plasmids transferred into recipient cells bring new 
genes that change the growth behavior of recipient cells. 
Alternatively, plasmids may carry a second copy of genes
already on the bacterial chromosome. When bacterial
chromosome DNA from the donor cell is transferred to 
a recipient bacterium, the homologous parts of the donor 
and recipient DNA molecules can undergo recombination 
that leads to a change in the genotype of the recipient cell. 

Regardless of the nature of the DNA transferred from 
donor cells to recipient cells, a key to understanding the 
process is to remember that it is a one-way street: Genetic 
material moves from donor to recipient. 

Each of these processes is an example of lateral gene 
transfer, a nonreproductive process through which bacteria 
and archaea actively exchange genetic material. Lateral gene 
transfer also takes place between bacteria and eukaryotes. 
The impact of these events on genomes and on the evolu- 
tion of life are topics for later discussion in this chapter. 


Characteristics of Bacterial Genomes 


Bacterial genomes are usually composed of a single 
chromosome that carries primarily essential genes—those 
necessary for the species’ metabolic and growth activi- 
ties. The bacterial chromosome is usually a covalently 
closed, circular molecule of double-stranded DNA. In 
keeping with the small size of the genome—from a few 
hundred thousand to several million base pairs—the bac- 
terial chromosome, too, is usually quite small, likewise 
varying from a few hundred thousand to several million 
base pairs. 

In addition to the main bacterial chromosome, most
bacteria also carry multiple copies of plasmids, small
double-stranded circular DNA molecules containing
nonessential genes that are used infrequently or under
specialized conditions not ordinarily encountered
by the species (Figure 6.1). Plasmids vary widely in their
number of genes and their total number of base pairs,
but they are always considerably smaller than bacterial
chromosomes. Plasmids are described as extrachromosomal
DNA, meaning they are generally separate from the
bacterial chromosome, although we will encounter some
exceptions as the chapter proceeds.


Figure 6.1 Bacterial chromosome and plasmids. A ruptured
E. coli cell has released its chromosomal DNA along with multiple
plasmids (red).

Many different kinds of naturally occurring plas- 
mids are found in bacteria, and each contains several 
genes. One plasmid we are about to discuss, called 
an F (fertility) plasmid, contains genes that promote 
its own transfer from a donor bacterium to a recipi- 
ent. Another type of plasmid we discuss, known as an 
R (resistance) plasmid, carries antibiotic resistance 
genes that can be transferred from donors to recipients. 
Plasmids are easily modified in the laboratory to produce 
specific characteristics or to carry particular genes that 
are useful in a wide range of recombinant DNA applica- 
tions (see Chapters 16 and 17). For purposes of most of 
our discussion in this chapter, we will only consider an- 
tibiotic resistance genes that are carried on an R plasmid. 
Consequently, a strain that is resistant to an antibiotic 
carries an R plasmid with the gene, and an antibiotic 
susceptible strain does not carry an R plasmid. This ap- 
proach simplifies our discussion and understanding of 
experimental results, but in reality, numerous bacterial 
strains carry antibiotic resistance genes on the bacterial 
chromosome. The transfer of both plasmid-borne and 
chromosome-borne antibiotic resistance genes among 
bacterial strains is a major contributing factor to the rapid 
spread of antibiotic resistant strains of infectious bacteria. 

Plasmids generally replicate autonomously. Conse- 
quently, up to several dozen copies of a plasmid can be 
found in a single bacterial cell. Such plasmids are identified 
as “high-copy-number” plasmids. Alternatively, low-copy- 
number plasmids are generally unable to replicate on their 
own because their replication is tied to that of the bacterial 
chromosome. These plasmids are present in 1 or 2 copies 
per bacterial cell. As you will soon see, low-copy-number
plasmids play a pivotal role in conjugation and in the
analysis of bacterial gene transfer and gene mapping.

A key to identifying the genotypes of bacterial strains 
is to assess their growth on media having different con- 
stituents. This is a procedure that is easy to master 
by understanding a few principles of microbial growth. 
Research Technique 6.1 introduces you to the interpreta- 
tion of microbial-growth results and the identification of 
microbial genotype. 


Conjugation Identified 


Bacterial DNA transfer was first identified by Joshua 
Lederberg and Edward Tatum in 1946. They used two 
triple-auxotrophic strains of E. coli that had different 
nutritional requirements for growth (see Experimental 
Insight 4.1, pages 125—126, for a review of prototrophy 
and auxotrophy). The researchers first established three


Research Technique 6.1


Genotyping Using Microbial Growth 


The results of experiments on microbes described in this chap- 
ter have shaped our understanding of how genes work, in- 
cluding how they are organized and how they are expressed. 
A basic set of common laboratory techniques and analyses as- 
sessing growth or failure to grow in liquid or semisolid media 
made up of different components can be used to determine 
the genetic makeup of microorganisms. Proper interpretation 
of the genotype of a microbe based on its pattern of growth 
on different media is an essential skill of genetic analysis that is 
easy to master once you understand a few key concepts. 


ANABOLIC AND CATABOLIC PATHWAYS Compounds that 
influence the growth of microbes on growth media fall into 
two broad categories. In the first are compounds synthesized 
by prototrophic (wild-type) microbes in biosynthetic pathways 
that are often described as anabolic pathways. In anabolic path- 
ways, energy is used to synthesize complex compounds from 
simpler ones through sequential reaction steps. Figure 4.17 and 
the accompanying discussion of the anabolic pathway that 
synthesizes the amino acid methionine (pages 121—123) pro- 
vide an example. In contrast, catabolic pathways are pathways 
through which energy is produced by the breakdown of complex 
compounds into simpler ones. Catabolic pathways also fol- 
low sequential steps. Our discussion of phenylketonuria (PKU) 
(pages 121—123) highlights the catabolic pathway that breaks 
down the amino acid phenylalanine. Similarly, compounds such
as disaccharide sugars like lactose and other carbohydrates
are broken down in catabolic pathways.


VISUALIZING MICROBIAL GROWTH When microbial 
growth occurs on a semisolid growth plate in a petri dish, indi- 
vidual colonies may appear on the plate. Each colony is actually 
hundreds of thousands to millions of individual microbes that 
are all descendant from a single microbial cell among those 
originally spread on the plate in a very dilute solution. Depend- 
ing on microbe genotypes and the composition of the growth 
medium, it is possible that more than one microbial genotype 
is growing on a particular plate, but what is certain is that the 
cells in each colony are genetically identical. In a liquid growth 
medium, microbial growth produces cloudiness—the result of 
there being so many living cells in the growth vessel that the 
passage of light through the medium is impeded by the cells. 
There are no colonies in liquid media. 

Identifying the genotype of a microbe often requires as- 
sessing the growth of a particular colony on different growth 
media. This is accomplished by replica plating. One method 
of replica plating is to simply touch a colony growing on one 
growth medium with a sterile toothpick or a similar instru- 
ment to gather some cells of the colony and then touch a spot 
on a different growth plate. Systematic use of a grid pattern 
on the new plate and care in the recording of growth results 
permit comparison of growth results on different plates so 
as to identify colony genotypes. An alternative replica plat- 
ing method involves transferring all the colonies growing on 
one plate to a new growth plate all at once. A round wooden 


or plastic block slightly smaller in diameter than a petri dish 
and covered with a piece of sterilized velvet is used for this. 
The velvet-covered block is gently pressed onto the colonies 
of one plate to pick up some cells from each colony and then 
is used to stamp one or more fresh growth-medium plates. 
Growth results can be compared between plates, and geno- 
types of colonies can be identified because all the colonies 
are in the same relative positions on both the original and the 
new plate. 


ALLELIC IDENTIFICATION Distinguishing between com- 
pounds produced by anabolic pathways and those broken 
down in catabolic pathways is a critical aspect of interpreting 
microbial growth and identifying microbial genotype that 
requires knowledge of growth media and their constituents. 
As defined in Experimental Insight 4.1, a minimal medium 
contains glucose as the carbon source, since glycolysis is the 
fundamental energy-producing reaction in many organisms, 
including humans and many microbes. The minimal medium 
also contains nitrogen, some inorganic salts, and water. In or- 
der to grow on minimal medium, a microbe must synthesize 
every compound it needs for metabolism, DNA replication, 
transcription, and translation. The compounds required to 
carry out these essential functions are the products of ana- 
bolic pathways. Only prototrophs (wild-types) can synthesize 
all the products required for growth on a minimal medium. 
The ability to synthesize an essential compound by completion
of an anabolic pathway is indicated in genetic notation
by a “+” (plus) symbol and identifies a wild-type allele; thus,
a microbe capable of biosynthesizing the amino acid methionine
is identified as met⁺ (spoken “met plus”). In contrast, the
“−” (minus) symbol indicates the organism is an auxotroph
(mutant) that is unable to synthesize a particular compound
due to mutation. The control prototroph shown in Figure 4.19
(p. 127) is met⁺, whereas the four other strains are each met⁻.
Auxotrophs can also grow on supplemented minimal medium,
which is a minimal medium supplemented with just the specific
compound or compounds an auxotroph is unable to produce
on its own.

In the case of catabolic pathways, allelic symbols identify the
ability of a strain to complete a catabolic pathway with a superscript
“+” and the inability to complete a catabolic pathway with
the “−” symbol. For example, microbes that are able to grow on a
medium that contains the milk sugar lactose instead of glucose
are lac⁺. The ability to grow on lactose requires production of the
enzymes that break down lactose into simpler compounds. In
contrast, microbes that are unable to grow on lactose-containing
media are lac⁻. These strains are unable to produce one or more
of the enzymes required for lactose metabolism.

The accompanying figure guides you through the identi- 
fication of prototrophs and auxotrophs among 10 microbial 
colonies for the amino acids alanine (ala) and proline (pro) and 
for the ability of the colonies to break down lactose. Genotype 
identification is accomplished by comparing growth on plates 
of media containing different constituents. The accompany- 
ing table summarizes the genotype of each colony and the 
reasoning used to identify the genotype. 




(a) Replica plate from complete medium to minimal medium.
Compare complete and minimal medium plates.
Conclusion: colonies 1, 4, 5, 7, 9, and 10 are prototrophs, and
colonies 2, 3, 6, and 8 are auxotrophs.

(b) Replica plates on supplemented minimal media.
Minimal plus alanine (ala): Compare to minimal medium plate.
Conclusion: colony 3 is ala⁻.
Minimal plus proline (pro): Compare to minimal medium plate.
Conclusion: colony 6 is pro⁻.
Minimal plus alanine and proline: Compare to minimal medium plate.
Conclusion: colony 2 is ala⁻ pro⁻.
Comparing the results of the three supplemented minimal media to
minimal medium identifies colony 8 as an auxotroph with an unknown
genotype.

(c) Replica plates from complete medium to lactose media.
Lactose medium: Compare to minimal medium plate.
Conclusion: colonies 1, 5, 7, and 9 are lac⁺, and colonies 4 and 10
are lac⁻. Auxotrophic colonies 2, 3, 6, and 8 do not grow without
supplementation.
Lactose plus alanine and proline: Compare to minimal medium plus
alanine and proline plate. Colony 2 is ala⁻ pro⁻ lac⁺; colony 3 is
ala⁻ lac⁻; colony 6 is pro⁻ lac⁺.
Comparing the results of the lactose-containing media to previous
results identifies the prototrophic colonies 4 and 10 to be lac⁻.


Colony           Genotype          Explanation
1, 5, 7, and 9   ala⁺ pro⁺ lac⁺    These are prototrophs. Grow on minimal medium and
                                   on lactose medium.
2                ala⁻ pro⁻ lac⁺    Auxotroph. Does not grow on minimal medium. Grows
                                   on minimal medium supplemented with both alanine
                                   and proline. Also grows on lactose medium
                                   supplemented with alanine and proline.
3                ala⁻ pro⁺ lac⁻    Auxotroph. Does not grow on minimal medium. Grows
                                   on minimal medium supplemented with alanine. Does
                                   not grow on lactose medium supplemented with
                                   alanine and proline.
4 and 10         ala⁺ pro⁺ lac⁻    Prototroph. Grows on minimal medium. Does not grow
                                   on lactose medium.
6                ala⁺ pro⁻ lac⁺    Auxotroph. Does not grow on minimal medium. Grows
                                   on minimal medium plus proline and grows on lactose
                                   medium plus alanine and proline.
8                Unknown genotype  Auxotroph. Does not grow on minimal medium.
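
The reasoning summarized in the table can be sketched as a small lookup procedure. The growth sets below are transcribed from the figure and table, with the assumption that prototrophs grow on every supplemented minimal plate; the dictionary layout, medium labels, and function name are illustrative choices, not part of the technique itself.

```python
# Colonies that grow on each medium (transcribed from the figure and table above;
# prototrophs are assumed to grow on every supplemented minimal plate).
GROWTH = {
    "minimal":         {1, 4, 5, 7, 9, 10},
    "minimal+ala":     {1, 3, 4, 5, 7, 9, 10},
    "minimal+pro":     {1, 4, 5, 6, 7, 9, 10},
    "minimal+ala+pro": {1, 2, 3, 4, 5, 6, 7, 9, 10},
    "lactose+ala+pro": {1, 2, 5, 6, 7, 9},
}

def infer_genotype(colony, growth=GROWTH):
    """Infer ala/pro/lac alleles for one colony from its growth pattern."""
    def grows(medium):
        return colony in growth[medium]

    # Step 1: the least-supplemented medium that permits growth
    # identifies which amino acids the colony cannot synthesize.
    if grows("minimal"):
        ala, pro = "+", "+"
    elif grows("minimal+ala"):
        ala, pro = "-", "+"
    elif grows("minimal+pro"):
        ala, pro = "+", "-"
    elif grows("minimal+ala+pro"):
        ala, pro = "-", "-"
    else:
        return "unknown"  # requires something other than alanine or proline

    # Step 2: with both amino acids supplied, growth on lactose
    # depends only on the ability to break down lactose.
    lac = "+" if grows("lactose+ala+pro") else "-"
    return f"ala{ala} pro{pro} lac{lac}"

for colony in range(1, 11):
    print(colony, infer_genotype(colony))
```

The order of the checks matters: testing the least-supplemented medium first keeps a prototroph from being misread as an auxotroph simply because it also grows on supplemented plates.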


Culture 1: Y-24 (met⁺ bio⁻ leu⁺ cys⁻ phe⁻ thr⁺ thi⁺). Grow in complete
medium, then transfer to minimal medium. Result: no growth; all cells
are auxotrophic.
Culture 2: Y-10 (met⁺ bio⁺ leu⁻ cys⁺ phe⁺ thr⁻ thi⁻). Grow in complete
medium, then transfer to minimal medium. Result: no growth; all cells
are auxotrophic.
Culture 3: Y-24 and Y-10. Grow in complete medium, then transfer to
minimal medium. Result: colony growth; prototrophic cells
(met⁺ bio⁺ leu⁺ cys⁺ phe⁺ thr⁺ thi⁺) grow.


Figure 6.2 Lederberg and Tatum’s detection of
recombination between auxotrophic E. coli cells.
Auxotrophic bacterial strains (1) Y-24 and (2) Y-10 each
contain multiple mutations and grow on complete medium,
but not on minimal medium. (3) Mixing the strains leads to
the formation of prototrophic bacteria that grow on minimal
medium.


separate bacterial cultures growing, initially, in a complete
medium (Figure 6.2). In culture 1, they grew an auxotrophic
strain called Y-24, which has the genotype bio⁻ leu⁺
cys⁻ phe⁻ thr⁺ thi⁺. Because of its genotype, the Y-24 strain
requires addition of the vitamin biotin (bio) and the amino
acids cysteine (cys) and phenylalanine (phe) to a minimal
medium for growth. In culture 2, they placed an auxotrophic
strain called Y-10, which has the genotype bio⁺
leu⁻ cys⁺ phe⁺ thr⁻ thi⁻. The Y-10 strain requires addition
of the vitamin thiamine (thi) and the amino acids leucine
(leu) and threonine (thr) for growth. Culture 3 contained
an equal mixture of both Y-10 and Y-24.

Each culture was allowed to grow. Then approximately
10⁸ cells from each culture were plated onto dishes
of minimal medium, where a prototrophic (wild-type)
genotype is required for growth. Lederberg and Tatum
saw no growth on Plates 1 and 2, which contained cells
transferred from culture 1 and culture 2, respectively.
These results were consistent with the nutritional requirements
of Y-24 and Y-10, and indicated that all the
cells transferred to those plates were auxotrophs. Plate 3,
however, developed about 100 growing colonies! These
colonies grew from bacterial cells that had somehow
acquired the prototrophic genotype (met⁺ bio⁺ leu⁺ cys⁺
phe⁺ thr⁺ thi⁺).

Lederberg and Tatum were certain that this 
outcome did not result from the reversion (reverse 
mutation) of auxotrophs to prototrophs (reversion is 
mutation that produces a wild-type allele from a mutant 
allele). First, the odds of that many genes reverting at 
once are prohibitively small. Second, plates 1 and 2 served 
as “negative control” plates. If reversion were respon- 
sible, these plates would show colony growth. Instead 
of reversion, the researchers claimed there had been a 
transfer of genetic information. More specifically, they 
proposed that one auxotrophic strain was transferring 
some of its prototrophic alleles to the other auxotrophic 
strain when the two strains were mixed, and that the 
second strain was replacing its auxotrophic alleles by 
incorporating the prototrophic information from the 
first strain. 
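
The first argument against reversion can be made concrete with a little arithmetic. The sketch below assumes that each gene reverts independently, so the probability that all of them revert in the same cell is the product of the per-gene probabilities. The per-gene reversion probability (10⁻⁷) and the number of cells plated (10⁸) are illustrative assumptions, not measurements from the experiment.

```python
def expected_revertants(cells_plated, reversion_prob_per_gene, genes_to_revert):
    """Expected number of plated cells in which every required gene reverts."""
    # Independent events: multiply the per-gene probabilities.
    prob_all_revert = reversion_prob_per_gene ** genes_to_revert
    return cells_plated * prob_all_revert

# Assumed, illustrative values (not taken from the experiment):
CELLS_PLATED = 1e8
REVERSION_PROB = 1e-7
GENES_TO_REVERT = 3  # each strain is auxotrophic for three compounds

expected = expected_revertants(CELLS_PLATED, REVERSION_PROB, GENES_TO_REVERT)
print(f"Expected triple revertants per plate: {expected:.1e}")
```

Under these assumptions the expected number of triple revertants is on the order of 10⁻¹³ per plate, many orders of magnitude below the roughly 100 colonies seen on Plate 3.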

Lederberg and Tatum hypothesized that physical 
contact between bacteria was necessary for gene trans- 
fer, but their original experiment did not provide direct 
evidence that this might be so. Four years later, Bernard 
Davis replicated the work and showed the necessity of 
contact between bacterial cells for gene transfer to take 
place. For his experiment, Davis constructed a U-tube 
with a fine glass filter separating one arm from the other 
(Figure 6.3). The filter was a glass disk with very small 
pores that allowed passage of small molecules such as 
nutrients but not bacterial cells. A cotton ball plugging 
one end of the U-tube and a rubber stopper connected to 
an air line at the other allowed Davis to move the material 
in the tube by alternating suction and pressure. The tube 
contained a culture of E. coli strain Y-10 on one side of the 
glass disk and a culture of strain 58-161, auxotrophic for 
methionine synthesis (met⁻), on the other side of the disk,
and the glass disk prevented direct contact between the 
two bacterial strains. 

Based on Lederberg and Tatum’s experiments, Davis 
hypothesized that direct contact between the auxotrophic 
strains was needed to produce prototrophic bacteria. After 
alternating suction and pressure for several hours, Davis 
plated bacterial samples from each side of the U-tube onto 
minimal medium and found no growth from either side 
of the U-tube. This lack of growth was an indication that 
cells on either side of the disk retained their auxotrophy. 
Davis concluded that physical contact between bacterial 
cells is required for gene transfer to take place. 

Microscopic studies have confirmed the physical
union between bacteria hypothesized by Lederberg and
Tatum and supported by Davis. This process of gene
transfer is called conjugation. One of the participating
bacteria, known as a donor cell, transfers some of its
genetic information to the other cell, known as a recipient
cell. The genetic information is conveyed by way of
a hollow tube known as a conjugation pilus or conjugation
tube that physically connects donor and recipient.
Conjugation is pictured in the chapter-opening photo
on page 186. In the photo, the conjugation pilus is the
thread-like structure in the middle connecting the donor
and recipient bacterial cells.


Figure 6.3 Davis’s U-tube experiment, showing that
genetic recombination requires cell-to-cell contact.
Auxotrophic bacterial strains Y-10 and 58-161 are unable to
grow on minimal medium, but produce some prototrophs that
grow on minimal medium when they make contact following
mixing. Prototrophs are not produced when the auxotrophs are
placed in a U-tube, indicating that direct contact is required to
generate prototrophic bacteria.


Transfer of the F Factor 


In 1953, William Hayes discovered that the bacteria in- 
teracting in Lederberg and Tatum’s and in Davis’s experi- 
ments did not contribute equally to the genetic outcome, 
as they do in a genetic cross between eukaryotes. Instead, 
the process was unequal, leading Hayes to conclude that 
a one-way transfer of genetic information takes place be- 
tween donors and recipients. 

Hayes further proposed that the ability to act as a
donor was hereditary and was determined by a “fertility
factor” (F factor) that was transferable from donors
to recipients. Donors are designated as F⁺ (F⁺ cells) to
indicate their possession of an F factor, and recipients
are identified as F⁻ (F⁻ cells) and lack the F factor.
An F⁻ cell is also known as a recipient cell. In the years
after Hayes proposed the existence of the F factor,
microbiologists identified the F factor as the F plasmid
(fertility plasmid).

Microbiologists today know that conjugation is controlled
by genes carried on the F plasmid. As a consequence,
only donor cells initiate conjugation. Recipient
cells (F⁻ cells) are unable to initiate conjugation.
Conjugation occurs between a donor cell and a recipient, 
but not between two donor cells. F factor genes direct the 
construction of hair-like pili (the plural of pilus) that have 
sensory functions. One pilus becomes specialized to serve 
as the conjugation pilus that connects donor and recipi- 
ent, forming the conduit across which DNA from the do- 
nor cell is transferred (see the chapter-opening photo). 
Ultimately, three kinds of cells are seen in conjugation: a 
donor cell that contains an F plasmid and donates genetic 
information, a recipient cell that receives DNA from a 
donor cell but does not contain a functional F factor, and 
the exconjugant cell that is produced by conjugation. 
An exconjugant cell is essentially a recipient cell that has 
had its genetic content modified by receiving DNA from 
a donor cell. 

The F factor is some 100 kb in length, and about 
35% of its sequence is devoted to about 40 genes that 
control conjugation (Figure 6.4). The F plasmid genes 
that play a role in E. coli conjugation are given four-letter 
designations consisting of the prefix tra or trb followed 
by a capital letter. Much of the remainder of the F fac- 
tor consists of four insertion sequence (IS) elements: 
one copy of IS2, two copies of IS3, and one copy of the 
very large IS1000. Insertion sequence (IS) elements are 
mobile segments of bacterial DNA that are capable of 
transposing themselves throughout the bacterial genome 
and have an important functional role in bacterial gene
transfer (see Section 13.6).

Figure 6.4 F plasmid structure. (a) Several genes important
in F factor transfer are shown along with the origin of transfer 
(oriT) and several insertion sequence (IS) locations. (b) The 
38-bp sequence of oriT, including the cleavage site. 


Conjugation between an F⁺ donor and an F⁻ recipient
transfers a copy of the F factor and produces exconjugants
that are F⁺ donors, as illustrated in Figure 6.5.
Conjugation begins with contact between the F⁺ and the
F⁻ cell, initiated by the formation of a conjugation pilus.
Conjugation pili are composed of pilin protein, produced
by the traA gene on the F factor (see Figure 6.4). Circular
DNA elements like the F factor that can replicate
independently of the bacterial chromosome or, as we discuss
in the following section, can integrate into the bacterial
chromosome and replicate as part of the chromosome,
are also termed episomes.

Shortly after contact is established by the conjugation 
pilus, gene expression from the F factor produces a pro- 
tein complex called the relaxosome. This protein complex 
binds to a specialized F factor sequence called the origin 
of transfer (oriT). At oriT, the relaxosome catalyzes 
cleavage of one phosphodiester bond on one DNA strand, 
called the T strand, to signify that this is the strand
transferred to the recipient cell. DNA cleavage at oriT
defines a 3’ end and a 5’ end on the T strand and initiates
some unwinding of the DNA duplex in the vicinity of oriT.


Figure 6.5 Conjugation of F⁺ and F⁻ cells. Rolling circle
replication transfers a single strand of the F factor,
beginning at oriT, from a donor cell to a recipient cell,
where it is replicated to convert the recipient cell (F⁻)
to an F⁺ donor.

1. The donor cell (F⁺) assembles a conjugation
pilus to contact the recipient cell (F⁻).

2. The relaxosome complex binds the F factor
at oriT and cleaves the T strand of the DNA.

3. The relaxosome partially degrades, leaving
relaxase bound at the 5’ end of the T strand.
The relaxase-T strand complex binds to a
coupling factor to prepare for export. Rolling
circle DNA replication begins in the donor.

4. The exporter moves the relaxase-T strand
complex into the recipient cell. Rolling circle
replication in the donor spools the T strand
to the recipient, where it is a template for
DNA replication.

5. The completion of replication in both cells
leaves the donor (F⁺) unchanged and
converts the recipient cell to an F⁺ donor
state.

T strand unwinding releases most of the compo- 
nents of the relaxosome, but one protein, called relaxase, 
the product of the traI gene, binds to the free 5’ end of
the T strand DNA to form a nucleoprotein complex. The 
nucleoprotein complex at the 5’ end of the T strand pro- 
vides a critical recognition signal for a protein called the 
coupling protein, the product of the traD gene, which 
takes a position near the entry of the conjugation pilus. 
The nucleoprotein complex binds briefly to the coupling 
protein and then affiliates with several proteins of the 
exporter complex that move the nucleoprotein complex 
and the T strand across the conjugation pilus and into the 
recipient cell. 

T strand transfer across the conjugation pilus is ac- 
companied by a specialized process of DNA replication, 
known as rolling circle replication, inside the donor 
cell. In this specialized unidirectional replication process, 
one strand of DNA is spooled off across the conjugation 
pilus while, within the donor, the remaining DNA strand 
serves as the template for unidirectional synthesis of a 
replacement DNA strand. In the recipient, the spooled-off 
DNA strand also acts as a template for DNA synthesis. 
We discuss the molecular details of DNA replication in 
Chapter 7. 

Rolling circle replication begins at oriT, where the 
single-stranded break in DNA exposed the 3’ hydroxyl 
end of the T strand. At this exposed 3’ hydroxyl end, 
DNA polymerase adds new nucleotides, utilizing the 
complementary, intact (unbroken) DNA strand as a tem- 
plate. The new DNA replication taking place during roll- 
ing circle replication eventually displaces the 5’ end of the 
T strand, freeing it to be transferred across the conjuga- 
tion pilus into the recipient cell. 

Completion of rolling circle replication in the donor
cell restores the donor’s double-stranded F factor,
leaving that cell’s F⁺ donor state intact. Meanwhile, inside
the recipient cell, the imported T strand acts as a template
directing the synthesis of a complementary DNA strand.
At the conclusion of this process, the two ends of oriT
join to circularize the molecule, completing the creation
of an F factor in the recipient. With the presence of an F
factor, the formerly F⁻ recipient cell is converted to an F⁺
donor cell.

Table 6.1 identifies two pivotal outcomes of F⁺ × F⁻
conjugation. First, complete transfer of the F factor
converts the F⁻ recipient cell to an F⁺ donor cell. Second,
no donor bacterial chromosomal genes are transferred
during this conjugation process. Only the F factor DNA
is transferred to an F⁻ recipient cell by an F⁺ donor cell.
You will recall that Lederberg and Tatum provided clear
evidence of chromosomal gene transfer from one bacterial
strain to another, and Davis showed that conjugation
was required for the transfer to occur. However, F⁺ × F⁻
conjugation is not responsible for the observations of
Lederberg and Tatum; the logical conclusion is that
there must be some other type of conjugation, involving
different kinds of bacterial donor cells, to transfer
bacterial chromosomal genes from a donor cell to a
recipient cell.


Table 6.1 Outcomes of Bacterial Conjugation

Conjugation    Exconjugant Converted    Donor Bacterial Genes
               to Donor State?          Transferred to Exconjugant?
F⁺ × F⁻        Yes, F⁻ → F⁺             No
Hfr × F⁻       No                       Yes
F′ × F⁻        Yes, F⁻ → F′             Yes


Formation of an Hfr Chromosome 


Contact between the donor and the recipient bacteria is
required for gene transfer, but the Lederberg and Tatum
results cannot be explained by conjugation involving an
F⁺ donor because in F⁺ × F⁻ conjugation, only genes on
the F plasmid are transferred.

An experiment in 1953 by Luigi Luca Cavalli-Sforza 
provided critical new insight when it was found that 
a previously unknown form of donor bacteria was re- 
sponsible for the gene transfers observed by Lederberg, 
Tatum, and Davis. Working with mutagenized donor 
E. coli, Cavalli-Sforza identified donor strains that trans- 
ferred donor bacterial genes to recipient bacteria at an 
extraordinarily high rate. Cavalli-Sforza labeled these 
bacterial strains high-frequency recombination, or Hfr, 
strains to indicate the high rate at which Hfr donor 
genes recombined with the chromosome of F⁻ recipients.
Cavalli-Sforza also determined that conjugation
involving Hfr donors and F⁻ recipients virtually never
converted the recipients to F⁺ or Hfr donors.

Microscopic examination of Cavalli-Sforza’s 
Hfr strain revealed an important difference in the 
configuration of the F factor. Instead of being an extra- 
chromosomal plasmid, the F factor in Hfr strains is in- 
tegrated into the bacterial chromosome, forming an Hfr 
chromosome (Figure 6.6). The formation of Hfr chromosomes
is rare: Only about 1 in every 100,000 F⁺ cells
converts to an Hfr cell. The integration event takes place 
at IS elements that are shared by F plasmids and bacterial 
chromosomes. 

There are multiple IS elements shared by plasmids
and bacterial chromosomes; thus, many different Hfr
chromosomes can potentially form. Once an Hfr chromosome
forms, it is stable and does not change to an
alternative Hfr form. Two attributes of the F factors in
Hfr chromosomes distinguish one Hfr from another.
First, the location of F factor integration varies between
Hfr strains: It can occur at any of the IS sites present
on the bacterial chromosome. Second, the integrated F
factor can have one of two different orientations at each
integration location. The integration of an F factor to
form a new Hfr chromosome occurs just once, establishing
an Hfr strain with a site of F factor insertion and an
orientation of the F factor that are fixed characteristics
of all bacteria of the resulting Hfr lineage. Both location
and orientation of the F factor are important to consider
in mapping bacterial genes in Hfr chromosomes, as we
discuss in Section 6.2.


Figure 6.6 Hfr chromosomes. Hfr cells carry an Hfr chromosome
that is created when an F factor integrates into an
insertion sequence (IS) in the bacterial chromosome.
Diagram labels: F⁺ cell with bacterial chromosome, F factor,
oriT, and IS element; recombination of bacterial chromosome
and F factor at an IS element; Hfr cell.


Hfr Gene Transfer 


Hfr bacteria transfer genetic material to recipient cells
by the same rolling circle replication process seen in
F⁺ × F⁻ conjugation. As in F⁺ × F⁻ conjugation, the
relaxosome binds to oriT and cuts the T strand to initiate
unwinding and transfer of the T strand to the recipient.
A portion of the integrated F factor is transferred
first, followed by the bacterial chromosome and finally
by the remainder of the integrated F factor. In theory
the entire Hfr chromosome could be transferred during
Hfr × F⁻ conjugation, but in practice this essentially
never happens. The normal movement of bacteria will break
the conjugation pilus long before Hfr transfer is completed.
Thus, only a portion of the F factor sequence is
transferred from the donor to the recipient, along with a
portion of the donor bacterial chromosome containing
genes located near the IS site of insertion. In conjugation
experiments, the duration of conjugation is variable:
some conjugation events are very short, others quite
long, and others of intermediate duration.

The segment of T strand DNA that is successfully
transferred into the recipient cell is used as template
DNA to generate a double-stranded linear fragment. At
whatever point the conjugation pilus ruptures, conjugation
is interrupted, and T strand transfer and replication
cease. Figure 6.7 illustrates conjugation between an
Hfr with the genotype thr⁺ leu⁻ strˢ and an F⁻ with the
genotype thr⁻ leu⁺ strᴿ (the function of strᴿ and strˢ is
explained momentarily). Within the recipient cell, the
donor DNA is a linear double-stranded DNA fragment
containing a portion of the F factor and a segment of
donor bacterial DNA that was adjacent to oriT. Without
the complete oriT sequence, the linear DNA cannot
circularize; and since only a portion of the F factor is
transferred, Hfr donors cannot convert F⁻ recipient cells
to a donor state (see Table 6.1). However, before the linear
segment of donated DNA undergoes enzymatic
degradation in the recipient cell, it can undergo homologous
recombination with the recipient chromosome. The
new exconjugant cell, formerly the recipient cell, may
thus acquire one or more genes from the donor bacterial
chromosome.

Conjugation experiments mix one strain of donor
bacteria in a culture vessel with a different strain of
recipient bacteria. Exconjugants produced within the vessel
acquire donor genes, so their genotypes are distinct from
those of either the donor strain or the recipient strain.
They are identified by their growth on a selective growth
medium, a medium containing compounds that permit only
exconjugants with specific genotypes to grow and that also
prevent the growth of donor cells and recipient cells.

In experiments of this kind, antibiotic sensitivity
and resistance are used as tools to control growth of
bacteria. In the recipient cells, resistance to the antibiotic
streptomycin (strᴿ) comes from a gene carried on an
extrachromosomal R plasmid (see Figure 6.7). The donor
cell is streptomycin sensitive (strˢ), but this is due to the
absence of an R plasmid, not to the presence of an allele
for streptomycin sensitivity. Streptomycin resistance is
therefore a genotypic attribute of recipient and exconjugant
cells but not of donor cells, and the presence of
streptomycin in the selective growth medium will kill
donor cells so they do not grow and potentially confuse
the analysis.

Figure 6.7 Hfr conjugation and exconjugant detection. An Hfr
chromosome fragment transferred during interrupted mating
from an Hfr donor cell to an F⁻ recipient cell can undergo
homologous recombination with the recipient chromosome.
Exconjugants are detected on selective growth media, such as
the minimal medium shown here.
Diagram labels: Hfr donor thr⁺ leu⁻ strˢ and F⁻ recipient
thr⁻ leu⁺ strᴿ; mix in conjugation culture; conjugation and
partial T strand transfer due to interrupted mating;
homologous recombination (crossover sites, F factor segment,
bacterial chromosome); enzymatic degradation of the
chromosomal fragment; one kind of exconjugant cell,
thr⁺ leu⁺ strᴿ; only thr⁺ leu⁺ strᴿ exconjugants grow.


As an example, consider again a conjugation experiment
involving an Hfr strain that is susceptible to
streptomycin (strˢ) and carries the alleles thr⁺ and
leu⁻ (for biosynthesis of the amino acid threonine and
the inability to synthesize leucine). Imagine that the
F⁻ strain is unable to synthesize threonine (thr⁻) but
capable of leucine synthesis (leu⁺) and resistant to
streptomycin (strᴿ). The selective medium necessary to grow
and isolate exconjugants in this case is a minimal medium
plate with added streptomycin. The streptomycin
in the selective medium kills strˢ donor cells, and the
absence of threonine prevents growth of nonrecombinant
recipient cells. All growing cells on the selection
plate are thr⁺ leu⁺ strᴿ, a genotype that could occur only
in exconjugants.
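
The selection logic in this example can be sketched in a few
lines of Python. This is only an illustration: the dictionary
layout and the function name grows_on_minimal_plus_strep are
assumptions introduced here, and the growth rule is reduced to
the three markers discussed in the text.

```python
# Illustrative sketch only: genotypes are modeled as dicts, and growth on
# minimal medium plus streptomycin is reduced to the three markers in the text.

def grows_on_minimal_plus_strep(genotype):
    """Return True if a cell can form a colony on minimal medium with streptomycin.

    Minimal medium supplies no threonine or leucine, so the cell must make both;
    streptomycin kills any cell that is not resistant.
    """
    makes_threonine = genotype["thr"] == "+"
    makes_leucine = genotype["leu"] == "+"
    resists_streptomycin = genotype["str"] == "R"
    return makes_threonine and makes_leucine and resists_streptomycin


hfr_donor = {"thr": "+", "leu": "-", "str": "S"}
f_minus_recipient = {"thr": "-", "leu": "+", "str": "R"}
exconjugant = {"thr": "+", "leu": "+", "str": "R"}

for name, cell in [("Hfr donor", hfr_donor),
                   ("F- recipient", f_minus_recipient),
                   ("exconjugant", exconjugant)]:
    verdict = "grows" if grows_on_minimal_plus_strep(cell) else "does not grow"
    print(name, verdict)
```

Under these assumptions only the exconjugant passes all three
checks, which is why colonies on the selection plate can be
attributed to recombination.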

In Figure 6.7, a segment of donor DNA containing
thr⁺ leu⁻ is shown aligning with its homologous counterpart
in the recipient bacterial chromosome, containing
thr⁻ leu⁺. Homologous recombination can replace a
segment of the recipient chromosome with a homologous
segment of DNA from the donor chromosome. In
the case shown here, two crossovers transfer thr⁺ from
the donor DNA into the recipient chromosome, so that
exconjugants have the genotype thr⁺ leu⁺ strᴿ. This
recombination is produced by the activity of a group of
recombination proteins and enzymes in bacteria that
operate in the RecBCD pathway. We discuss this pathway,
and its counterpart used during meiotic recombination
in eukaryotes, in Sections 12.6 and 12.7.
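
As a rough sketch of the allele replacement just described, a
double crossover can be modeled as copying the donor alleles for
whichever genes lie between the two crossover sites. The function
name double_crossover and the dictionary representation are
hypothetical conveniences, not part of the original discussion.

```python
# Illustrative sketch only: a double crossover is modeled as copying the donor
# alleles for the genes lying between the two crossover sites.

def double_crossover(recipient, donor_fragment, genes_between_crossovers):
    """Return the exconjugant genotype produced by a double crossover."""
    exconjugant = dict(recipient)  # copy, so the recipient genotype is unchanged
    for gene in genes_between_crossovers:
        exconjugant[gene] = donor_fragment[gene]
    return exconjugant


recipient = {"thr": "-", "leu": "+", "str": "R"}
# Streptomycin resistance comes from the recipient's R plasmid, so the
# transferred donor fragment carries no str marker.
donor_fragment = {"thr": "+", "leu": "-"}

flanking_thr = double_crossover(recipient, donor_fragment, ["thr"])
flanking_both = double_crossover(recipient, donor_fragment, ["thr", "leu"])
print(flanking_thr)
print(flanking_both)
```

Crossovers that flank only thr give the thr⁺ leu⁺ strᴿ genotype
selected on minimal medium with streptomycin; crossovers that
flank both genes would also bring in leu⁻, and that exconjugant
would not grow without added leucine.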

With or without homologous recombination to 
form an exconjugant, the ultimate fate of linear DNA 
in bacterial cells is enzymatic degradation through the
action of nuclease enzymes. If nucleases reach the 
donated DNA before it can pair and recombine with 
the recipient chromosome, exconjugant formation is 
blocked. If recombination does take place, an excon- 
jugant chromosome forms, and the segment of the 
recipient chromosome that was spliced out during re- 
combination is degraded along with the remainder of 
the donated DNA. 

For our purposes, conjugation between an Hfr donor
cell and an F⁻ recipient cell has two key outcomes. First,
the transfer of one or more donor alleles into the recipient
chromosome by homologous recombination forms
an exconjugant chromosome. Second, the F factor is not
transferred in full during conjugation, and therefore the
F⁻ recipient cell is not converted to a donor state (see
Table 6.1).


6.2 Interrupted Mating Analysis 
Produces Time-of-Entry Maps 


We have noted that Hfr chromosomes are too long to 
be fully transferred from a donor cell to a recipient cell. 
As a consequence, interrupted mating, the cessation 
of conjugation caused by breakage of the conjugation 
tube, takes place during naturally occurring conjuga- 
tion. Interrupted matings stop conjugation before the 
Hfr chromosome can be completely transferred from the 
donor to the recipient. Several decades ago, researchers
realized that if experimental conjugation was tested
for gene transfer at timed intervals, it would be possible
to map the order of donor genes, and to determine the
distances between genes. This experimental strategy is
called time-of-entry mapping.

Each Hfr strain used in time-of-entry mapping ex- 
periments will transfer genes in a specific order that is 
a characteristic of the strain. The order of gene transfer 
and the time of the first appearance of recombinants for 
each gene are functions of the gene’s proximity to the 
origin of transfer (oriT). As a result, genes that are clos- 
est to the 5’ end of the T strand cross the conjugation 
pilus shortly after conjugation begins, while genes that 
are more distant from the 5’ end of the T strand will 
cross the conjugation pilus later in time. Genes closest to 
oriT are also more frequently transferred than are genes 
that are more distant from oriT. The result is that genes 
that are closest to oriT recombine into exconjugant 
chromosomes at earlier times and in greater numbers 
than genes that are distant from oriT. The number of 
minutes between the beginning of conjugation and the 
appearance of a particular recombinant is identified as 
the “time of entry” of the gene of interest. This measure, 
reported as minutes of conjugation, can be used to de- 
termine the order of genes on the Hfr chromosome in a 
time-of-entry map. 


Time-of-Entry Mapping Experiments 


In 1956, Élie Wollman, François Jacob, and William
Hayes used conjugation data from the F⁻ strain P678
and the Hfr strain HfrH to demonstrate the utility of
interrupted mating for time-of-entry mapping. In this
experiment, P678 is strᴿ, resistant to the antibiotic
streptomycin, and HfrH is strˢ, streptomycin sensitive. The
donor and recipient genotypes for six genes studied
are given in Table 6.2. Two of these genes had known
locations: the genes for threonine and leucine synthesis
(thr and leu), which are closer to the origin of transfer in
HfrH than any of the other genes tested. The goal of this
experiment was to map the positions of azi, tonA, lac,
and galB relative to thr and leu and to determine the
distance between genes in minutes of conjugation.


Table 6.2 Genotypes of E. coli Strains F⁻ P678 and HfrH

HfrH                                       F⁻ P678
thr⁺ (prototrophic for threonine)          thr⁻ (auxotrophic for threonine)
leu⁺ (prototrophic for leucine)            leu⁻ (auxotrophic for leucine)
aziᴿ (resistant to sodium azide)           aziˢ (susceptible to sodium azide)
tonAᴿ (resistant to phage T1 infection)    tonAˢ (sensitive to phage T1 infection)
lac⁺ (able to utilize lactose)             lac⁻ (unable to utilize lactose)
galB⁺ (able to utilize galactose)          galB⁻ (unable to utilize galactose)


Figure 6.8 Time-of-entry mapping. (a) Recombinants are
identified by screening exconjugants for donor allele
acquisition at regular intervals and plotting their time of
entry into the exconjugant chromosome. (b) Donor alleles
leu⁺ and thr⁺ appear in exconjugants within 4 minutes of
conjugation initiation. Other donor alleles follow according
to their order on the chromosome. (c) The Hfr chromosome
time-of-entry map is assembled from the recombinant data.

The experiment begins with mixing the donor and
recipient bacterial strains to initiate conjugation. Every few
minutes, a small sample of the culture is removed and
agitated to break any conjugation pili, interrupt the
mating, and stop the process of DNA transfer. The sample
bacteria are plated on growth plates containing different
supplemental compounds in the medium to determine
if exconjugants have formed by recombination between
the recipient chromosome and homologous donated
DNA. The first recombinant alleles in exconjugants are,
as expected, thr⁺ and leu⁺. The researchers select for
these exconjugants by plating cells on a medium that
lacks leucine and threonine but contains streptomycin
and therefore will permit the growth of only leu⁺ thr⁺
strᴿ exconjugants. The order of the other four genes is
determined using these leu⁺ thr⁺ strᴿ exconjugants.

Samples from the conjugation mixture are taken
every few minutes and plated on the selective medium
that identifies those with the leu⁺ thr⁺ strᴿ genotype.
Exconjugants with this genotype are then placed on a
second plate to determine which other donor alleles have
undergone recombination.

Figure 6.8a shows the results of this experiment, 
which are interpreted in Figure 6.8b: Exconjugants carry- 
ing the donor azi allele appear 8 minutes after conjuga- 
tion begins, tonA recombinants appear at 10 minutes, lac 
recombinants appear at 16 minutes, and galB recombi- 
nants are the last to appear, at 25 minutes. The order of 
these four genes and the distances in minutes between 
them are combined to produce the time-of-entry genetic 
map for HfrH (Figure 6.8c). 
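
The step from times of entry to a map can be sketched in Python
as follows. The four times are those reported above for HfrH;
the function name time_of_entry_map and the data layout are
illustrative assumptions, and thr and leu are left out because
the text gives only an upper bound of 4 minutes for them.

```python
# Times of first appearance (minutes of conjugation) for the four donor
# alleles, as reported in the text for HfrH. The dict is deliberately unsorted.
time_of_entry = {"lac": 16, "azi": 8, "galB": 25, "tonA": 10}


def time_of_entry_map(times):
    """Order genes by time of entry and compute minutes between neighbors."""
    ordered = sorted(times.items(), key=lambda item: item[1])
    intervals = [
        (gene_a, gene_b, t_b - t_a)
        for (gene_a, t_a), (gene_b, t_b) in zip(ordered, ordered[1:])
    ]
    gene_order = [gene for gene, _ in ordered]
    return gene_order, intervals


gene_order, intervals = time_of_entry_map(time_of_entry)
print(" - ".join(gene_order))
for gene_a, gene_b, minutes in intervals:
    print(f"{gene_a} to {gene_b}: {minutes} min")
```

Sorting by time of entry gives the gene order, and the
differences between consecutive times give the map distances
in minutes of conjugation.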

Time-of-entry mapping is an effective approach for 
mapping genes near the 5’ end of the T strand. However, 
the genetic mapping information obtainable from a single 
Hfr strain is limited. First, because the conjugation pilus is 
broken and mating is interrupted, the likelihood of gene 
transfer drops off quickly with distance from oriT. Second, 
an Hfr strain can transfer genes in just one direction. 

To obtain experimental information about gene order
and distances between genes on the bacterial chromosome
of a given species, multiple Hfr strains with different
sites of episome insertion and different orientations of
the episome are examined. Each IS element on the
bacterial chromosome constitutes a different location of F
factor integration, and each integration location transfers
a different gene first. The donor chromosome shown in
Figure 6.9a illustrates six genes and six IS elements. Each IS
element is a potential site for F factor integration, and the
first gene to transfer will be different for each integration
site. In addition, at each IS element, the episome can be
oriented in either of two directions (Figure 6.9b and c).
Thus, F factor orientation is a second factor determining
the order of gene transfer for an Hfr strain. Once F factor
insertion location and orientation occur, they are fixed
characteristics of the Hfr strain that do not change. This
gives each Hfr strain a consistent and determinable order
of gene transfer.


Figure 6.9 F factors integrate at IS sites in one of two
orientations. (a) A model bacterial chromosome with six
insertion sequences (IS1 to IS6) and six nearby marker genes.
(b) One F factor orientation into IS1 transfers the leu gene
first. (c) The alternative F factor orientation at IS1 transfers
the val gene first. Relaxase attaches to the free end of oriT
at the beginning of transfer.

(a)
Episome integration    First gene
at IS element          to transfer
IS1                    leu or val
IS2                    thr or leu
IS3                    gal or thr
IS4                    phe or gal
IS5                    cys or phe
IS6                    val or cys

(b) Orientation 1. Gene order shown: val cys phe gal thr leu,
with leu the first gene and val the last gene to reach the
recipient. Labels: integrated F factor; relaxosome binding and
T strand cleavage; relaxase attached to 5’ end of T strand;
DNA replication; to recipient. In orientation 1, oriT and the
T strand have ends labeled I and II.

(c) Orientation 2. Gene order shown: val cys phe gal thr leu,
with val the first gene and leu the last gene to reach the
recipient. Labels: integrated F factor; relaxosome binding and
T strand cleavage; relaxase attached to 5’ end of T strand;
to recipient.

Figure 6.9b illustrates F factor integration and gene
transfer from IS1 in orientation 1. In this orientation,
the gene transfer order will be leu-thr-gal-phe-cys-val.
In the figure, the four ends of the double-stranded
episomes are labeled I, II, III, and IV; the 5’-to-3’ polarity
of strands is also indicated. Recall that relaxosome
binding to oriT leads to cleavage of the T strand, which
in Figure 6.9b is illustrated with ends I and II. The 5’
end of the T strand (with relaxase attached) moves
across the conjugation pilus with leu as the first gene
following the episome fragment. The T strand acts as
a template strand for DNA replication in the recipient
cell, and the 3’ end of the T strand (highlighted in
red) is the start point for rolling circle replication of the
plasmid in the donor cell. Figure 6.9c shows the same
simplified bacterial chromosome with insertion at IS1
in orientation 2. As with orientation 1, the T strand
carries oriT and has ends labeled I and II. Orientation 2 is
the opposite of orientation 1, and it transfers genes in
the opposite order. When the T strand is cleaved and
its 5’ end moves across the conjugation pilus, the first
marker gene to transfer will be val, followed by
cys-phe-gal-thr-leu. Once again, the T strand transfers 5’
end first into the recipient cell and the strand is a
replication template strand. The 3’ end of the T strand in
the donor cell (highlighted in red) is the start point for
rolling circle replication. Genetic Analysis 6.1 guides you
through time-of-entry mapping for an Hfr conjugation
experiment.
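
The dependence of transfer order on insertion site and
orientation can be sketched in Python for the model chromosome
of Figure 6.9. The list GENES, the function transfer_order, and
the indexing rule are assumptions made for illustration. In
particular, calling the two directions "orientation 1" and
"orientation 2" at sites other than IS1 is a convention chosen
here, since the text defines the labels only for IS1.

```python
# Assumed model of the circular chromosome in Figure 6.9a: six marker genes,
# with IS_k lying just before GENES[k - 1] (so IS1 sits between val and leu).
GENES = ["leu", "thr", "gal", "phe", "cys", "val"]


def transfer_order(is_site, orientation):
    """Return the order in which marker genes enter the recipient.

    is_site: 1 to 6, the IS element where the F factor integrated.
    orientation: 1 reads the circle forward from the site, 2 reads it backward.
    """
    n = len(GENES)
    if not 1 <= is_site <= n:
        raise ValueError("is_site must be between 1 and 6")
    if orientation == 1:
        start, step = is_site - 1, 1
    elif orientation == 2:
        start, step = is_site - 2, -1
    else:
        raise ValueError("orientation must be 1 or 2")
    return [GENES[(start + step * i) % n] for i in range(n)]


print("-".join(transfer_order(1, 1)))
print("-".join(transfer_order(1, 2)))
for site in range(1, 7):
    first_genes = (transfer_order(site, 1)[0], transfer_order(site, 2)[0])
    print(f"IS{site}: {first_genes[0]} or {first_genes[1]}")
```

With this model, the two orders at IS1 are the ones given in
the text, and the first gene transferred at each site matches
the pairs listed in Figure 6.9a.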


Consolidation of Hfr Maps 


In Hfr maps, an arrowhead is used to indicate the orienta- 
tion of the integrated F factor. You can think of the arrow- 
head as indicating the tip of a DNA strand that is the first 
part to enter and move across the conjugation pilus. The 
first gene to follow the arrowhead into the recipient is 
closest to oriT and crosses the conjugation pilus first and 
most frequently among all donor genes. As a result, it is
the first gene to recombine and the gene that recombines
at the highest frequency.

Using this method, more than 4300 genes were 
mapped in the E. coli genome before genomic sequencing
became a reality. The time-of-entry map of the chromosome
of the model genetic organism E. coli is shown with
selected genes in Figure 6.10a. The chromosome is measured
as 100 minutes in length, the
approximate length of time it would take to transfer 
the entire chromosome from a donor to a recipient. 
With the advent of genomic sequencing, however, it 
became possible to identify every nucleotide base pair, 
and every gene, in a genome. The accuracy and valid- 
ity of Hfr mapping can be demonstrated by comparing 
a small segment of E. coli genomic sequence with the 
corresponding segment of the E. coli time-of-entry 
map. Figure 6.10b compares a segment of the E. coli 
time-of-entry map with the corresponding segment 
of the chromosome produced by genomic sequencing. 
The comparison spans a little less than 3 minutes of 
conjugation time, more than 2 million base pairs of 
DNA, and dozens of genes, a few of which are shown. 
It reveals exact correlation of gene placement and 
gene order. 

Let’s practice consolidating time-of-entry maps 
into a larger map of a circular chromosome using the 
following data on gene transfer from four different Hfr 
strains. For each strain, the genes are listed in order of 
transfer. The first gene transferred is at the top and the 
last gene transferred is at the bottom, and the minutes 
of conjugation are given in parentheses for each gene. 
The genes mentioned in the following discussion are 
presented in color. 


Hfr Strain

Hfr1         Hfr2         Hfr3         Hfr4

serR (2)     nadB (8)     tyrT (4)     serR (4)
leuY (10)    proL (17)    fumC (12)    pheR (12)
asnB (15)    fumC (29)    proL (24)    cysE (25)
serC (20)    tyrT (37)    nadB (33)    leuU (37)
tyrT (27)    serC (44)    leuU (46)    nadB (50)
fumC (35)    asnB (49)    cysE (58)    proL (59)


The data set from each Hfr strain is used to generate 
a partial map showing gene order, the distance in min- 
utes between genes, and the orientation of the integrated 
F factor. The individual Hfr maps are then consolidated 
to show each F factor integration site, its orientation, and 
the gene order and distances in minutes. We anticipate 
that the minutes of conjugation between a given pair of 
genes will be the same in each Hfr strain transferring 
the gene pair. For example, Hfr strains 1, 2, and 3 each 
transfer the gene pair tyrT-fumC, and in each strain the 
genes are 8 minutes apart, no matter the orientation of 
the episome. 
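
The expectation that a gene pair is separated by the same number
of minutes in every strain can be checked with a short Python
sketch. The data are the four transfer lists above; the names
HFR_DATA, adjacent_intervals, and consolidate are hypothetical
and introduced only for this illustration.

```python
# Transfer data from the table above: (gene, minutes of conjugation),
# listed in order of transfer for each Hfr strain.
HFR_DATA = {
    "Hfr1": [("serR", 2), ("leuY", 10), ("asnB", 15),
             ("serC", 20), ("tyrT", 27), ("fumC", 35)],
    "Hfr2": [("nadB", 8), ("proL", 17), ("fumC", 29),
             ("tyrT", 37), ("serC", 44), ("asnB", 49)],
    "Hfr3": [("tyrT", 4), ("fumC", 12), ("proL", 24),
             ("nadB", 33), ("leuU", 46), ("cysE", 58)],
    "Hfr4": [("serR", 4), ("pheR", 12), ("cysE", 25),
             ("leuU", 37), ("nadB", 50), ("proL", 59)],
}


def adjacent_intervals(entries):
    """Map each adjacent gene pair (order-independent) to the minutes between them."""
    return {
        frozenset((gene_a, gene_b)): t_b - t_a
        for (gene_a, t_a), (gene_b, t_b) in zip(entries, entries[1:])
    }


def consolidate(data):
    """Merge intervals from all strains, raising if two strains disagree."""
    merged = {}
    for strain, entries in data.items():
        for pair, minutes in adjacent_intervals(entries).items():
            if merged.setdefault(pair, minutes) != minutes:
                raise ValueError(f"{strain} disagrees on {sorted(pair)}")
    return merged


merged = consolidate(HFR_DATA)
for pair, minutes in merged.items():
    print("-".join(sorted(pair)), minutes)
```

Using an unordered pair as the key is what makes the check
independent of episome orientation: tyrT-fumC in Hfr1 and
fumC-tyrT in Hfr2 land on the same entry.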
Figure 6.10 Consolidated Hfr map of E. coli. (a) The 100-minute
genetic map of E. coli, constructed from data collected from
Hfr strains. Genes of bacterial operons (see Section 14.2) are
boxed. The origin of replication (oriC) is seen at 84 minutes.
(b) Comparison of a segment of an Hfr time-of-entry map with
a genomic sequence map. A 2.5-minute segment (minutes 42.5 to
45) of the E. coli time-of-entry map is shown in comparison to
a segment of approximately 500,000 base pairs of the E. coli
genome derived from E. coli genomic sequencing. Selected genes
between 42.5 minutes and 45 minutes on the time-of-entry map
(upper) are aligned with their positions in the genome sequence
map (lower) to illustrate the compatibility of the two
mapping approaches.


[Figure: partial time-of-entry maps for Hfr1, Hfr2, Hfr3, and
Hfr4, each showing the origin of transfer, the order of genes
transferred, and the minutes between adjacent genes.]

Continuation of the overlap process leads eventually
to closure of the circle and completion of the chromosome
map. In the above table, for example, notice that
Hfr1 and Hfr4 share serR as the gene nearest the site
of insertion. This is the connection that allows us to
close the circular map. To begin construction of the
circular map, we will assume that Hfr1 transfers genes
in a clockwise direction; in other words, serR is first and
fumC is last.


GENETIC ANALYSIS 


PROBLEM An interrupted mating experiment is carried out in
E. coli to map genes for biosynthesis of the amino acids threonine
(thr), leucine (leu), glutamic acid (glu), and alanine (ala). An Hfr
strain that is his+ thr+ leu+ glu+ ala+ str^S transfers his very early and
is sensitive to the antibiotic streptomycin. It is mated to an F− strain
with the genotype his− thr− leu− glu− ala− str^R. A time-of-entry
profile for thr, leu, glu, and ala is shown at right.

[Figure: time-of-entry profile. Vertical axis: markers among his+ str^R
exconjugants (%), 0 to 100. Horizontal axis: conjugation time (min), 10 to 70.
Curves are shown for glu+, thr+, ala+, and leu+.]

BREAK IT DOWN: A time-of-entry map gives the order of genes on the donor
chromosome based on their successive appearance in exconjugants. The gene
closest to the origin of transfer appears first and is followed, in order,
by additional genes (p. 198).

a. Exconjugants that are his+ and str^R are initially selected for
additional experimental analysis. What compounds must be present or absent
in growth plates to allow exconjugants containing these selected markers
to grow?

BREAK IT DOWN: These initial exconjugants must be able to biosynthesize
histidine and must be resistant to streptomycin. Genotypes for the other
genes are not tested in initial screening, but they are tested in the
time-of-entry experiment (p. 198).

b. Use the data provided to deduce the order of genes transferred in this
Hfr strain and to identify the distances in minutes. Identify the order of
genes on the donor chromosome and indicate the approximate location of the
his gene.


Solution Strategies and Solution Steps

Evaluate

Strategy 1. Determine the topic this problem addresses and the nature of
the required answer.
Step 1. The problem concerns conjugation between an Hfr donor and an F−
recipient. Answer (a) requires identification of growth medium constituents
for a his+, str^R exconjugant; answer (b) requires a map of the donor genes
based on their time of entry.

Strategy 2. Identify the critical information given in the problem.
Step 2. Donor and recipient genotypes are given. A time-of-entry profile
identifies the minutes of conjugation needed to transfer each donor
gene to the recipient.

Deduce

Strategy 3. Determine the significance of the very early transfer of his+
in the context of developing a time-of-entry map.
Step 3. Very early transfer of his+ indicates the gene is close to oriT and
will be the first gene to cross the conjugation tube.

TIP: Genes that are closer to oriT have earlier and more frequent
opportunities to transfer to the recipient and to appear as recombinants
in exconjugants than do genes that are distant from oriT.

Solve

Answer a
Strategy 4. Identify the compounds needed to allow growth of exconjugants
with the selected markers his+ and str^R, irrespective of the genotypes for
the other genes.
Step 4. The growth plate used to select these markers would contain
streptomycin and the amino acids threonine, leucine, glutamic acid, and
alanine. The plate would lack histidine, thus requiring the growing strain
to be his+.

TIP: To select exconjugants that are his+ and str^R, growth plates must
provide conditions in which only the exconjugants that are resistant to
streptomycin and able to synthesize histidine can grow.

Answer b
Strategy 5. Construct a time-of-entry map based on the conjugation data.
Step 5. Given that his transfers first, and that gene order and distances
are identified by the time at which recombinants appear in exconjugants,
the Hfr map for this strain is as follows:

Map (minutes from the origin of transfer): glu 8, thr 16, ala 29, leu 42.
The his gene lies between the origin of transfer (0 minutes) and glu (8 minutes).

For more practice, see Problems 17, 18, and 28. Visit the Study Area to access study tools. MasteringGenetics™ 
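
The map construction in Answer b can be sketched in Python. The entry times for glu, thr, ala, and leu are the values on the solution map above. The function name `time_of_entry_map` is a hypothetical helper introduced for illustration, and the sketch assumes each gene's time of entry has already been read from the profile.

```python
def time_of_entry_map(entry_times):
    """Order genes by time of entry and compute distances in minutes.

    entry_times maps gene name -> first minute at which recombinants
    for that gene appear among exconjugants.
    Returns (gene_order, intervals), where intervals is a list of
    (earlier_gene, later_gene, minutes_between) tuples.
    """
    ordered = sorted(entry_times.items(), key=lambda item: item[1])
    gene_order = [gene for gene, _ in ordered]
    intervals = [
        (first[0], second[0], second[1] - first[1])
        for first, second in zip(ordered, ordered[1:])
    ]
    return gene_order, intervals


# Times of entry taken from the solution map (minutes from the origin).
entry_times = {"thr": 16, "leu": 42, "glu": 8, "ala": 29}

gene_order, intervals = time_of_entry_map(entry_times)
print("Order from origin of transfer:", " - ".join(gene_order))
for earlier, later, minutes in intervals:
    print(f"{earlier} to {later}: {minutes} min")
```

Because his is the selected marker and transfers before any of the four mapped genes, it is placed ahead of glu rather than being computed from the profile.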
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Once completed, the consolidated Hfr map identifies
gene order, the cumulative number of minutes, the site of
each F factor integration, and orientation:

[Figure: circular consolidated Hfr map, scaled from 0 to 100 minutes, with genes including pheR, asnB, serC, tyrT, nadB, and proL marked.]


While conjugation mapping is an accurate way to
determine gene order and to approximate the distance
between genes, it cannot resolve closely linked genes,
whose times of entry may differ by only a few seconds.
Mapping methods based on two other mechanisms of DNA
transfer between bacteria, transformation and transduction,
were developed to allow more detailed determination of the
order of closely linked genes. Section 6.4 discusses gene
mapping by transformation, and Section 6.5 describes gene
mapping by transduction. First, however, we describe the final
type of donor configuration for the F factor.
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6.3 Conjugation with F’ Strains 
Produces Partial Diploids 


Table 6.1 lists a third configuration of the F factor in
donor bacteria, that of the so-called F’ (“F prime”)
donor, which contains a functional but altered F factor
derived from imperfect excision of the F factor out of
the Hfr chromosome. The integration event that creates
an Hfr chromosome depends on interactions between
matching IS elements of the F factor and of the bacterial
chromosome, and when this process is reversed, the
F factor can once again become an extrachromosomal
F+ factor. Occasionally, however, the excision event is
imprecise, and the excised F factor (in this case called
an F’ factor) contains all of its own DNA plus a segment
of bacterial chromosomal DNA from the region
adjacent to the integration site (Figure 6.11a). An F’ factor
can carry a variable length of bacterial DNA. Donor
cells carrying an F’ factor are called F’ cells.

Like the other forms of conjugation described
above, conjugation between an F’ donor and an F− recipient
follows the by-now-familiar process of relaxosome
complex binding to oriT, cleavage of the T strand,
and movement of the T strand across the conjugation
pilus with its 5’ end leading the way. Cells with small
F’ factors are more likely to transfer the entire F’ factor
than are cells whose F’ factors carry large bacterial
chromosome inclusions. Consequently, small inclusions
are usually transferred in their entirety.

If the entire F’ factor is transferred, both
parts of oriT are transferred, allowing the F’ factor to 
circularize in the recipient cell. At the completion of 
F’ factor transfer in such cases, the recipient cell, now 
containing a complete F’ factor, is converted to an F’ 
donor (see Table 6.1). It has acquired copies of all the 
donor chromosomal genes carried on the F’ factor. 
Because the newly received chromosomal genes are ho- 
mologs of genes already present on the recipient bacte- 
rial chromosome, the resulting exconjugants are partial 
diploids. The diploid portion of the genome is limited 
to the genes present in two copies, one on the excon- 
jugant chromosome and the second on the F’ factor. 
No homologous recombination is necessary to produce 
these partially diploid genotypes, and partial diploidy is 
retained as a characteristic of these exconjugants and 
their descendants. 

Figure 6.11b illustrates the creation of a partial diploid
exconjugant carrying two alleles of the lac gene. The
lac+ allele on the F’ factor enables the cell to use lactose
for growth, whereas the mutant lac− allele on the exconjugant
chromosome is unable to function in lactose utilization.
In this partial diploid, the lac+ allele is dominant
over the lac− allele. Partial diploids of this type have been
used in genetic studies to examine the mode of action of
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[Figure 6.11 artwork. (a) An Hfr chromosome with the integrated F factor (including oriT) adjacent to the bacterial lac gene. Normal excision yields the bacterial chromosome and an F+ plasmid. In aberrant excision, a segment of the bacterial DNA loops out during excision, yielding an F’ plasmid that contains the donor lac in addition to a full set of F factor genes. (b) An F’ cell that grows on a lactose medium conjugates with an F− cell that is unable to grow on a lactose medium. After transfer is complete, the exconjugant is a lac+/lac− partial diploid and has acquired the ability to grow on a lactose medium. Because F’ plasmid transfer was complete, the exconjugant can act as an F’ donor.]

Figure 6.11 F factor excision from Hfr integration. (a) Normal
excision (left) restores an Hfr to an F+, whereas aberrant excision
(right) forms an F’ plasmid in an F’ donor cell. (b) F’ × F− conjugation
produces an exconjugant that is a partial diploid lac+/lac−.
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genes in bacteria and to dissect the regulation of coor- 
dinated gene action in bacterial metabolism and growth 
(see Section 14.3). 

Genetic Analysis 6.2 guides you through an analysis of 
donor and recipient bacterial strains and the identifica- 
tion of donor types through the analysis of three conjuga- 
tion experiments. 


Plasmids and Conjugation in Archaea 


Research on archaeal species is still in its infancy in
comparison to the many decades of research on
bacteria. Despite this short research history, a number of
significant observations have been made with regard to
archaeal plasmids and conjugation among archaeal cells.

Like bacteria, archaea are single-celled haploid organisms.
All of the genes that are essential for the normal
metabolic and physiologic activities of the cell are carried
on the archaeal chromosome. Research on archaeal plasmids,
which began in the early 1990s and is ongoing, has identified
dozens of different plasmids among archaeal species.
While much more study is needed, the information
available at present indicates that most archaeal plasmids 
replicate by rolling circle replication. The data further 
identify numerous instances of plasmid-driven conjuga- 
tion between archaeal donor and recipient cells. The 
genetic composition of archaeal conjugative plasmids has 
not been well characterized, nor is there enough infor- 
mation to be able to describe the details of the archaeal 
conjugation apparatus. To date there is evidence of some 
similarities to bacterial conjugation, but there is also evi- 
dence that some aspects of archaeal conjugation may be 
substantially different from bacterial conjugation. 

In the following chapters, we compare and contrast selected
molecular processes and structures in archaea with
their counterparts in bacteria and eukaryotes. As appears
to be the case for conjugation, archaea share
some attributes with bacteria, but we will see that they
also commonly share features with eukaryotes.


6.4 Bacterial Transformation Produces 
Genetic Recombination 


Transformation occurs when a recipient cell takes up 
a fragment of donor cell DNA from the surrounding 
growth medium. The DNA fragment passes through the 
wall and membrane of the recipient cell and is incorpo- 
rated into the recipient cell chromosome by homologous 
recombination. Transformation is a naturally occurring 
mechanism that can be used to produce accurate maps 
of bacterial genes, including those that are closely linked 
and not readily mapped by conjugation experiments. The 
recipient cell taking up transforming DNA is identified 
as competent, meaning able to internalize exogenous 
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PROBLEM In E. coli, the abilities to utilize the sugar lactose, synthesize the amino acid methionine,
and resist the antibiotic streptomycin are conferred by alleles lac+ and met+ and by an R plasmid carrying
str^R. Bacteria without the R plasmid are susceptible to streptomycin (str^S), and mutant alleles lac− and
met− produce bacteria that are unable to grow on media containing lactose and require methionine
supplementation for growth. E. coli strains are identified as donors or recipients in the first table, which
also contains information on their ability to grow under various conditions. The second table contains
growth information for the exconjugants of mating between donor and recipient strains. In each
table, “+” indicates growth and “−” indicates no growth. “Min” signifies a minimal medium, and
supplemented minimal medium plates are indicated by, for example, “Min+met” (minimal medium
plus methionine). “Lac” indicates a plate containing only lactose as the sugar.

a. Use the growth information in the first table to determine the genotype of each strain at
the lac and met genes and for resistance or susceptibility to streptomycin.

b. Use the growth information in the second table to determine the genotypes of
exconjugants produced by each mating.

c. Compare the genotypes and mating behavior of donors, recipient, and exconjugants
to determine whether each donor is F+, Hfr, or F’. Explain your rationale for
each donor identification.

Strain Growth

Strain  Type       Min  Lac  Min+met  Min+met+str  Lac+met+str
A       Donor      +    +    +        −            −
B       Donor      +    +    +        −            −
C       Donor      +    +    +        −            −
D       Recipient  −    −    +        +            −

BREAK IT DOWN: Anabolic and catabolic pathways and the determination of genotypes for alleles
in these pathways are described in Research Technique 6.1, pp. 189-190.

Exconjugant Growth

Mating  Min+str  Min+met+str  Lac+str  Lac+met+str  Are the Exconjugants Donors?
A × D   +        +            −        −            Yes
B × D   −        +            −        −            Yes
C × D   −        +            −        +            No

BREAK IT DOWN: Table 6.1, p. 194, summarizes the potential conversion of,
and bacterial gene transfer to, exconjugants by donors.


Solution Strategies and Solution Steps

Evaluate

Strategy 1. Identify the topic this problem addresses and the nature of the
required answer.
Step 1. This is a conjugation problem in which genotypes of donors and a recipient are
determined by growth characteristics. Donor types (F+, Hfr, F’) are identified
by growth characteristics of exconjugants. The answers require identifying
genotypes for lac, met, and str for each donor, recipient, and exconjugant.

Strategy 2. Identify the critical information given in the problem.
Step 2. The two tables identify growth characteristics. The first table contains growth
information on three donors (A, B, and C) and a recipient (D). The second table
contains growth information on the exconjugants of mating between each
donor and the recipient.

Deduce

Strategy 3. Compare the growth characteristics of donors and the recipient in the first
table, and deduce which genotypes are likely the same.
Step 3. The growth characteristics of the three donor strains (A, B, and C) are identical
on each kind of medium. These three strains have the same genotype. The
recipient, strain D, has a different set of growth characteristics and therefore a
different genotype.

Strategy 4. Examine the exconjugants in the second table and determine which have
been converted from recipients to donors.
Step 4. Donor A and donor B transfer a complete F sequence to the recipient and convert
the exconjugant to a donor. Donor C does not transfer the complete F sequence,
so the C × D exconjugant is not converted to a donor.

TIP: When an exconjugant has been converted to a donor state, we know it has
received a complete copy of the F factor.

Solve

Answer a
Strategy 5. Determine the genotypes of the donor and recipient strains from growth
information in the first table.
Step 5. The genotype shared by donor strains A, B, and C is met+ lac+ str^S. The minimal
medium contains glucose. Growth of donor strains in this medium indicates
their prototrophy for methionine (met+). Growth in the lactose-containing medium
indicates they are lac+. The inability of donors to grow in media containing
streptomycin indicates they are str^S.

The recipient genotype is met− lac− str^R. It is unable to grow on the minimal
(glucose-containing) medium, but it can grow on glucose plus methionine, indicating
it is met−. It also grows on the minimal medium plus methionine and streptomycin,
indicating that it is str^R. Lactose utilization is tested on the medium containing
lactose plus methionine and streptomycin. Here it fails to grow, indicating it is lac−.
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Answer b
Strategy 6. Determine the genotypes of exconjugants from growth information in the
second table.
Step 6. Using analysis similar to that employed above, we conclude that the exconjugant
genotypes are
A × D met+ lac− str^R, conversion to donor
B × D met− lac− str^R, conversion to donor
C × D met− lac+ str^R, no conversion

TIP: Compare the genotypes of exconjugants to the recipient genotype to determine
if one or more donor alleles have been transferred during conjugation. Use
Table 6.1 for help in categorizing each donor.

Answer c
Strategy 7. Identify each donor by donor type and explain the rationale for each
identification.
Step 7. A × D exconjugants have acquired met+ and have undergone conversion to a
donor state. F’ donors can transfer an allele and convert the recipient, so we
conclude that strain A is an F’ donor. Exconjugants of the B × D mating retain
the recipient genotype, but they are converted to a donor state. F+ donors
produce this result, so strain B is an F+ donor. The C × D conjugation produces
exconjugants that have acquired lac+ but have not undergone conversion. This
is a characteristic of Hfr donors, so we conclude that strain C is Hfr.


For more practice, see Problems 19 and 23. 


Visit the Study Area to access study tools. 
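
The reasoning in Answers a through c can be expressed as a small Python sketch. It assumes a simplified growth rule: a strain grows only if it can make or is given methionine, can use the sugar supplied, and tolerates any streptomycin present. The names `MEDIA`, `grows`, `consistent_genotypes`, and `classify_donor` are hypothetical helpers for illustration, and genotypes are encoded as (met+, lac+, str^R) booleans.

```python
from itertools import product

# Medium name -> (sugar, methionine supplied?, streptomycin present?)
MEDIA = {
    "Min": ("glucose", False, False),
    "Lac": ("lactose", False, False),
    "Min+met": ("glucose", True, False),
    "Min+str": ("glucose", False, True),
    "Min+met+str": ("glucose", True, True),
    "Lac+str": ("lactose", False, True),
    "Lac+met+str": ("lactose", True, True),
}


def grows(genotype, medium):
    """Predict growth for a (met_plus, lac_plus, str_resistant) genotype."""
    met_plus, lac_plus, str_resistant = genotype
    sugar, has_met, has_str = MEDIA[medium]
    return (
        (met_plus or has_met)
        and (lac_plus or sugar == "glucose")
        and (str_resistant or not has_str)
    )


def consistent_genotypes(observed):
    """Return every genotype whose predicted growth matches the observations."""
    return [
        g for g in product([True, False], repeat=3)
        if all(grows(g, medium) == result for medium, result in observed.items())
    ]


def classify_donor(recipient, exconjugant, converted_to_donor):
    """Apply the Table 6.1 logic used in Answer c."""
    allele_transferred = exconjugant != recipient
    if converted_to_donor:
        return "F'" if allele_transferred else "F+"
    return "Hfr" if allele_transferred else "undetermined"


# Growth data from the two tables in the problem (True = growth).
recipient = consistent_genotypes(
    {"Min": False, "Lac": False, "Min+met": True,
     "Min+met+str": True, "Lac+met+str": False})[0]
matings = {
    "A": ({"Min+str": True, "Min+met+str": True,
           "Lac+str": False, "Lac+met+str": False}, True),
    "B": ({"Min+str": False, "Min+met+str": True,
           "Lac+str": False, "Lac+met+str": False}, True),
    "C": ({"Min+str": False, "Min+met+str": True,
           "Lac+str": False, "Lac+met+str": True}, False),
}
donor_types = {
    donor: classify_donor(recipient, consistent_genotypes(growth)[0], converted)
    for donor, (growth, converted) in matings.items()
}
print(donor_types)
```

Enumerating all eight genotypes and keeping those consistent with the growth data mirrors the elimination reasoning in the solution; for each strain here exactly one genotype survives.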
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(donor) DNA. Transformation is also used as a laboratory 
technique by molecular biologists seeking to introduce 
DNA into microbial cells, plant cells, and animal cells 
as part of the process of creating recombinant DNA or 
transgenic organisms (see Sections 16.2 and 16.4). 


Steps in Transformation 


Transformation is a four-step process, as illustrated in 
Figure 6.12. It is preceded by the lysis, or breakage, of a 
donor cell and the release of fragmented DNA from the 
donor chromosome. The transforming DNA is double 
stranded and can be taken up by a recipient bacterial cell. 

The passage of double-stranded transforming DNA
across the recipient cell wall and cell membrane is accompanied
by degradation of one of the strands (step 1
of Figure 6.12). The remaining strand of transforming
DNA aligns with, or “invades,” a complementary region
of the recipient chromosome (step 2). The alignment triggers
the action of several enzymes that excise one strand of the
recipient chromosome and replace it with the transforming
strand. This recombination event forms heteroduplex
DNA: one strand is derived from the recipient cell, and
the approximately complementary transforming strand is
derived from the bacterial donor (step 3). After the subsequent
DNA replication and cell-division cycle (step 4), one daughter
cell is a transformed cell, also called the transformant. It
contains a chromosome carrying the transforming strand
and its newly synthesized complementary strand. The
other daughter cell retains the recipient chromosome and
is not genetically altered.


Mapping by Transformation 


Transforming DNA is usually shorter than about
100,000 bp (100 kb) in length. For a bacterial species
like E. coli, which has a genome of 4 × 10^6 bp of DNA
and approximately 5000 genes, the transforming DNA
may have 1, 2, or as many as 50 genes. Even at maximum
lengths, transforming DNA from the donor cell represents
only 1 to 2% of the total genome of the recipient
cell. Consequently, transformation is useful for mapping 
genes that are closely linked. To be mapped by transfor- 
mation, two or more genes must be transferred into the 
recipient on the same fragment of transforming DNA. 
Thus, genetic analysis focuses on cotransformation, the 
simultaneous transformation of two or more genes. For 
cotransformation to occur, the crossover events must 
incorporate closely linked genes on a single fragment of 
transforming DNA. 


6.5 Bacterial Transduction Is Mediated 
by Bacteriophages 


Transduction is the transfer of genetic material from a do- 
nor bacterial cell and the integration of that material into a 
recipient bacterial cell by way of a bacteriophage acting as 
a vector. To accomplish this transfer, a bacteriophage must 
infect the donor cell, and a few of the progeny phages must 
errantly package a fragment of the donor bacterial chromo- 
some rather than a complete copy of the phage chromosome. 
Following lysis of the original bacterial host cell, phages 
carrying the mispackaged bacterial DNA attach to a new 
host cell (the recipient cell) and inject the donor chromo- 
some fragment. Inside the recipient, homologous recom- 
bination can take place between the donated fragment and 
the recipient chromosome. In this section, we review the
life cycles of bacteriophages (phages, for short) that infect
E. coli. We then consider cotransduction mapping, a powerful
technique for mapping bacterial genomes, and the role of
generalized transduction in this process. We conclude the
section with a discussion of specialized transduction.


[Figure 6.12 artwork. Labeled elements: double-stranded donor DNA, receptor site, DNA-binding complex at receptor, recipient cell wall, cytoplasmic membrane, recipient chromosome, degraded nucleotides, transforming strand, heteroduplex DNA, transformant, nontransformant.
Step 1: Donor DNA binds at the receptor site. One strand is degraded as it enters the recipient cell.
Step 2: The transforming strand pairs with the homologous region of the recipient chromosome.
Step 3: The transforming strand displaces a recipient strand, forming heteroduplex DNA (a−/a+). The excess strand degrades.
Step 4: DNA replication and cell division produce one transformant and one nontransformant.]

Figure 6.12 Transformation of a competent bacterium (a−)
by donor DNA (a+).
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Bacteriophage Life Cycles 


Bacteriophage particles are generally less than 1% the size of 
the bacterial cells they attack. Their outer structure is a pro- 
tein coat composed of an icosahedral head, a hollow protein 
sheath, and in some phages, a set of appendages called tail 
fibers (Figure 6.13). The phage’s head houses its rudimen- 
tary genome, composed of a single chromosome ranging in 
size from about 5000 to 100,000 base pairs. The replication 
of phage DNA, the transcription of phage genes, and the 
translation that produces phage proteins are dependent on 
numerous proteins and enzymes found in the host bacterial 
cells, which the phages must infect in order to reproduce. 

Bacteriophages employ a variety of mechanisms to attack
bacteria. All of the mechanisms make use of bacterial
proteins that evolved in the bacteria for purposes other than
as a means of phage entry. For example, λ phage uses the
maltose-binding protein of E. coli as a site of attachment.
Maltose-binding protein studs the surface of E. coli cells,
which use it to sense the presence of the sugar maltose in
the growth medium. Thus, when studying the infection of
E. coli by λ phage, microbiologists add maltose to the growth
medium as a means of enhancing the phage infection rate.

Bacteriophages attach to host cells, commencing a
six-step process called the lytic cycle that leads to
the lysis of the host cell. Lysis releases up to
200 progeny phage particles. The steps composing the
lytic cycle are depicted in Figure 6.14.


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Figure 6.13 T4 bacteriophage and λ phage structures.
Bacteriophages consist of a proteinaceous head filled with DNA,
a sheath, and, in some phages, tail fibers.
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[Figure 6.14 artwork: after infection, in which λ phage injects its DNA through the hollow tail and the phage chromosome circularizes, the lytic cycle (left) and the lysogenic cycle (right) diverge. In the lysogenic cycle, prophage DNA is copied each time the cell divides, and multiple divisions and many generations may occur in this state before excision returns the phage to the lytic cycle.]

Figure 6.14 The lytic and lysogenic life cycles of a temperate bacteriophage. The lytic cycle progresses 
directly from infection through phage reproduction to lysis. The lysogenic cycle features the integration of 
the phage into the host chromosome where it resides until excision and resumption of the lytic cycle. 


1. Attachment of the phage particle to the host cell.

2. Injection of the phage chromosome into the host
cell. Injection is quickly followed by circularization of
the phage chromosome, to protect it from enzymatic
degradation.

3. Replication of phage DNA, using numerous
host proteins and enzymes. A copy of the phage chromosome
is required for each of the eventual progeny
phage particles, which generally number between 50
and 200.

4. Transcription and translation of phage genes, using
numerous host proteins, enzymes, and ribosomes.
Heads, sheaths, and tail fibers for all progeny particles
must be synthesized and assembled.

5. Packaging of phage chromosomes into phage heads.
This step is commonly accompanied by fragmentation
of the host chromosome. Occasional mispackaging of
a fragment of the host chromosome into a phage head
can follow chromosome fragmentation.

6. Lysis of the host cell, resulting in the death of the
host and the release of progeny phage particles.


Certain bacteriophages, classified as temperate
phages (λ phage is the best-known example),
are capable of a temporary, alternative life cycle that leads
to the integration of the phage chromosome into the 
bacterial host chromosome. The integration process is 
termed lysogeny. Environmental and growth conditions 
are largely what initiate a lysogenic cycle. Lysogeny can 
persist for many bacterial replication and division cycles, 
but it eventually comes to an end, and the lytic cycle 
resumes. (We discuss the details and genetic regulation 
of this alternation between life cycles in Section 14.6.) 
Five steps characterizing the lysogenic cycle are shown in 
Figure 6.14. 


1. Attachment of the phage particle to the host cell.

2. Injection of the phage chromosome into the
host cell, followed by phage-chromosome
circularization.

3. Integration of the phage chromosome into the
host chromosome. This process is site specific,
meaning that it occurs at a specific DNA sequence
found in both the phage and bacterial chromosomes.
Once integrated into the host chromosome, the phage
DNA is termed the prophage. The prophage remains
stably integrated at the same location for multiple
cycles of bacterial chromosome replication and cell
division.

4. Excision of the prophage. In response to an environmental
signal, such as a high dose of ultraviolet
irradiation, the prophage reverses its integration and
is excised intact. This event is usually an exact reversal
of the site-specific integration, but rare mistakes in
prophage excision lead to a specific kind of abnormal
phage that may contain host genetic material.

5. Resumption of the lytic cycle, beginning with
phage-chromosome replication.


Generalized Transduction 


In the decades since the 1952 discovery and description of
generalized transduction by Norman Zinder
and Joshua Lederberg, numerous kinds of generalized
transducing phages have been identified. Generalized
transducing phages are formed when a random piece 
of donor bacterial DNA of the appropriate length is mis- 
takenly packed into the phage head instead of a similarly 
sized length of phage DNA. This occasional error in DNA 
packaging occurs because the packing mechanism that 
inserts DNA into the phage head discriminates DNA 
by its length (in base pairs) rather than by sequence. 
Generalized transducing phages can carry any segment 
of donor DNA, since the process of mistaken packaging 
is random. 

The phage P1 is a well-studied bacteriophage that 
infects E. coli and is a prolific producer of generalized 
transducing phages. This phage was initially chosen for 
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intensive study of its transduction ability because it has a 
large genome of nearly 100,000 bp (100 kb). To produce 
progeny generalized transducing phages, P1 must capture 
segments of donor bacterial DNA that are almost exactly 
100 kb, a length that is nearly 2% of the E. coli chromo- 
some. Analysis of P1 infections tells us that about 1 in 
50 progeny of a P1 infection are generalized transducing 
phages. 

Figure 6.15 illustrates generalized transduction in 
seven steps (combining attachment and injection into a 
single first step). The outcome of transduction is the pro- 
duction of a transductant, a bacterium that has acquired 
one or more donor genes through transduction: 


1. A normal P1 phage attaches to a donor bacterial cell
and injects its chromosome into the cell.

2. Replication of the phage chromosome is followed
by transcription and translation to produce phage
proteins. Fragmentation of the bacterial chromosome
precedes the packaging of phage chromosomes into
phage heads.

3. Assembly of progeny phage, including packing of
phage heads, is largely normal, but a few progeny
phages receive a random fragment of the donor
bacterial chromosome that is approximately the same
length as the phage chromosome. These abnormal
progeny phages are generalized transducing phages.

4. Host-cell lysis releases normal and generalized
transducing phages.

5. Generalized transducing phages attach to new recipient
cells and inject the fragment of donor DNA.

6. In each recipient cell, homologous recombination
occurs between the fragment of donor DNA and the
recipient chromosome. Pairs of crossover events are
required to splice the donor fragment into the recipient
chromosome and excise a homologous segment of
the chromosome. The excised chromosome fragment
is degraded by enzymes.

7. A stable transductant strain results.


Cotransduction 


The donor cell in the transduction experiment shown in
Figure 6.16 has the genotype met+ his+, and the recipient
is met− his−. The bacterial culture in which this experiment
takes place will contain millions of bacteria, most of
which are not transduced. In addition, many cells may be
transduced with donor alleles that are not tested for in the
experiment. The transductants detected in this particular
experiment are those in which one or both of the
met+ and his+ alleles are transduced.

Transductants having either the genotype met+
his− or the genotype met− his+ offer evidence that each
allele can be individually transduced. In addition, a certain
number of transductants will undergo simultaneous


[Figure 6.15 artwork. Labeled elements: phage P1, P1 DNA, donor bacterium (met+, his+), bacterial chromosome, fragments of bacterial chromosome, P1 phage, transducing P1 phage, recipient bacterium (met−, his−), transductant bacterium (met+, his−).
Step 1: P1 phage infects a met+, his+ donor cell.
Step 2: Phage chromosome is replicated, and phage proteins are expressed. The donor chromosome fragments.
Step 3: Progeny phage assembly yields normal phage carrying the phage chromosome and transducing phages carrying a fragment of the donor chromosome.
Step 4: Lysis releases normal and transducing progeny phages.
Step 5: A met+ transducing phage infects a met− recipient cell and injects the donor DNA fragment.
Step 6: Homologous recombination at two crossover points exchanges segments between the donor fragment and the recipient chromosome.
Step 7: The transductant is met+, his−. Excised DNA containing met− is degraded.]

Figure 6.15 Transduction by P1 phage. Transducing phages
are generated by the mistaken packaging of a fragment of the
donor bacterium’s DNA into a phage head (step 3). Transductant
bacteria are produced by homologous recombination between
the introduced fragment of donor DNA and the recipient
bacterial chromosome (steps 6 and 7).
transduction of both genes to produce met⁺ his⁺ transductants.
These cells have undergone cotransduction
of both donor alleles. The frequency of cotransduction,
called cotransduction frequency, depends on how close
the two genes are to one another on the donor chromosome.
The closer the genes are, the higher the probability
of cotransduction (thus, the higher the cotransduction
frequency), and the farther apart the genes are, the lower
the cotransduction probability. If, for example, an experimenter
carried out the transduction cross in Figure 6.15
and identified 200 transductants for met⁺, the experimenter
could determine the frequency of cotransduction
by then identifying how many of those met⁺ transductants
were also transduced (i.e., were cotransduced) for his⁺. If
the analysis determined that 28 of the 200 met⁺ transductants
were also transduced for his⁺, the cotransduction
frequency for those genes is 14% (28/200).
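
The arithmetic in this example can be sketched in a few lines of Python. The function name and its arguments are illustrative choices, not part of the text; the counts are the ones given above (28 his⁺ cotransductants among 200 met⁺ transductants).

```python
def cotransduction_frequency(cotransduced: int, selected_total: int) -> float:
    """Fraction of selected-marker transductants that also carry
    the unselected donor allele."""
    if selected_total <= 0:
        raise ValueError("selected_total must be positive")
    if not 0 <= cotransduced <= selected_total:
        raise ValueError("cotransduced must lie between 0 and selected_total")
    return cotransduced / selected_total


# Counts from the met/his example in the text.
freq = cotransduction_frequency(28, 200)
print(f"{freq:.0%}")
```

The denominator is the number of transductants for the selected marker, not the number of cells plated, which is why untransduced cells never enter the calculation.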

To succeed in finding cotransductants in an experi- 
ment, researchers may have to genotype large numbers 
of colonies. To reduce the number of colonies that must 
be genotyped in such experiments, a two-step strategy 
is used that first identifies cells transduced with one do- 
nor allele and then screens those transductants for the 
acquisition of additional donor alleles. The first step em- 
ploys a selected marker screen to identify transductants 
for one of the donor alleles of interest. Transductants 
for the selected marker are then screened a second time, 
for a second donor allele, in an unselected marker 
screen. The goal is to determine the percentage of trans- 
ductants for the selected marker that are also transduced 
for the unselected marker, while reducing unnecessary 
colony genotyping. 


Cotransduction Mapping 


Genetic map construction in bacteria uses cotrans- 
duction frequencies to determine the relative order 
of three or more genes. In cotransduction mapping, 
the frequency of cotransduction is greater for genes 
that are close together and is lower for genes that are 
farther apart. The reason is that any two genes on the 
donor chromosome have two chances to be separated 
by a chromosomal event. The first separation chance 
comes when the donor chromosome fragments. Genes 
that are close together are more likely to be on the same 
donor chromosome fragment than genes that are far 
apart. The second chance for separation comes during 
homologous recombination. Once again, genes that are 
close together on the donor fragment are less likely to 
be separated by a crossover event than genes that are far 
apart on the fragment. 

Let’s look at two studies that test the order of the 
same four genes in E. coli. Figure 6.16 provides cotrans- 
duction data for experiments performed in 1959 by 
Charles Yanofsky on genes that are part of the tryptophan 
operon, a cluster of genes involved in the synthesis of 
the amino acid tryptophan that share a single promoter. 


(a) Cotransduction frequencies

Donor        Recipient    Selected  Unselected  Percent cotransduction of
genotype     genotype     marker    marker      unselected marker with cys⁺
cys⁺ trpE⁺   cys⁻ trpE⁻   cys⁺      trpE⁺       63
cys⁺ trpC⁺   cys⁻ trpC⁻   cys⁺      trpC⁺       53
cys⁺ trpB⁺   cys⁻ trpB⁻   cys⁺      trpB⁺       47
cys⁺ trpA⁺   cys⁻ trpA⁻   cys⁺      trpA⁺       46

(b) trp operon map

cys    trpE  trpC  trpB  trpA

Figure 6.16 Yanofsky’s cotransduction frequency analysis
and mapping of trp operon genes in E. coli.
(a) Cotransduction frequencies of cys⁺ and a gene of the trp
operon are determined in separate selected marker-unselected
marker experiments. (b) Yanofsky’s proposed map of the trp
operon.


(We discuss this operon in detail in Section 14.4). For the 
current discussion, you only need to know that genes in 
an operon are transcribed under the control of a single 
promoter and are much closer to one another than genes 
that have their own promoters. 

Yanofsky used the selected—unselected marker ap- 
proach to determine cotransduction frequencies for 
each of four genes in the tryptophan operon (trpA, 
trpB, trpC, and trpE) and a gene outside the operon, 
cys. Yanofsky performed four crosses, each with a donor
strain that was cys⁺ and prototrophic for one trp gene.
His recipient strains were each cys⁻ and auxotrophic
for the trp gene being tested. At the time he began his
experiments, Yanofsky knew that cys lies outside the
tryptophan operon, and he constructed his experiments
to measure the cotransduction frequency between cys
and the trp gene of interest. In each experiment, cys⁺
was the selected marker used to identify informative
transductants. The unselected marker was the trp allele
from the donor. Yanofsky acquired data to determine
the cotransduction frequency of cys⁺ and the unselected
trp marker.

In his first experiment, he determined that among
cys⁺ transductants, 63% are cotransduced for trpE⁺. In
his second experiment, he found 53% cotransduction
between cys⁺ and trpC⁺. Yanofsky concluded that trpE is
closer to cys than is trpC based on the higher cotransduc- 
tion frequencies for cys and trpE than for cys and trpC. 
Cotransduction frequencies for cys and trpB and for cys 
and trpA are not sufficiently different to determine gene 
order, but based on cotransduction frequencies, trpA 
and trpB are each more distant from cys than are trpE 
and trpC. Yanofsky proposed a genetic map of the tryp- 
tophan operon with the order cys-trpE-trpC-trpB-trpA. 
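
The ranking logic in this paragraph can be sketched as follows. The percentages come from Figure 6.16a; the function names and the 2-percentage-point threshold for calling two frequencies indistinguishable are assumptions made only for illustration and are not values used by Yanofsky.

```python
# Percent cotransduction of each trp gene with the selected cys+ marker
# (Figure 6.16a).
COTRANSDUCTION_WITH_CYS = {"trpE": 63, "trpC": 53, "trpB": 47, "trpA": 46}


def order_by_proximity(freqs):
    """Genes sorted from highest to lowest cotransduction frequency,
    i.e., from nearest to farthest from the selected marker."""
    return sorted(freqs, key=freqs.get, reverse=True)


def ambiguous_neighbors(freqs, min_gap):
    """Adjacent pairs in the ranking whose frequencies differ by less than
    min_gap (a hypothetical threshold) and so cannot be ordered reliably."""
    ranked = order_by_proximity(freqs)
    return [(a, b) for a, b in zip(ranked, ranked[1:])
            if freqs[a] - freqs[b] < min_gap]


print(order_by_proximity(COTRANSDUCTION_WITH_CYS))
print(ambiguous_neighbors(COTRANSDUCTION_WITH_CYS, min_gap=2))
```

With these data the ranking reproduces the proposed order, and the only pair flagged as too close to call is trpB and trpA, matching the caveat in the text.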

The second study was conducted to test the order of 
these genes and either corroborate or refute Yanofsky’s 
Table 6.3 Test of Yanofsky’s Proposed trp Operon Gene Order

Transductant Class   Transductant Genotype   Number
1                    cys⁺ trpC⁻ trpB⁻        139
2                    cys⁺ trpC⁻ trpB⁺         18
3                    cys⁺ trpC⁺ trpB⁺        141
4                    cys⁺ trpC⁺ trpB⁻          4
TOTAL                                        302


proposed gene map. In this study the donor bacterial
genotype is cys⁺ trpC⁻ trpB⁻ and the recipient genotype
is cys⁻ trpC⁺ trpB⁺. Transductants are selected
for cys⁺ transduction, and the transductants are then
screened to determine their genotypes for trpC and trpB.
The genotypes of 302 cys⁺ transductants are shown in
Table 6.3. Cotransductants for the donor cys and trpC
alleles have the genotype cys⁺ trpC⁻ and are found in
Class 1, which has 139 cotransductants, and Class 2,
which has 18. The cys-trpC cotransduction frequency is
therefore (139 + 18)/302 = 0.52, or 52%. Similarly, cotransduction
of cys and trpB is identified by the genotype cys⁺
trpB⁻. Transductant Classes 1 and 4 have this cotransductant
genotype, and the cotransduction frequency is
(139 + 4)/302 = 0.47, or 47%.
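
As a check on these two calculations, the class counts from Table 6.3 can be tallied in Python. This is a minimal sketch; the data layout and function name are illustrative choices, and the genotypes are written with "+" and "-" for the superscripts.

```python
# Table 6.3: genotype of each cys+ transductant class and its count.
CLASSES = {
    1: ({"cys": "+", "trpC": "-", "trpB": "-"}, 139),
    2: ({"cys": "+", "trpC": "-", "trpB": "+"}, 18),
    3: ({"cys": "+", "trpC": "+", "trpB": "+"}, 141),
    4: ({"cys": "+", "trpC": "+", "trpB": "-"}, 4),
}
# Donor alleles: cys+ trpC- trpB-.
DONOR = {"cys": "+", "trpC": "-", "trpB": "-"}


def cotransduction_with_cys(gene):
    """Fraction of cys+ transductants that also carry the donor allele of gene."""
    total = sum(count for _, count in CLASSES.values())
    hits = sum(count for genotype, count in CLASSES.values()
               if genotype[gene] == DONOR[gene])
    return hits / total


print(round(cotransduction_with_cys("trpC"), 2))  # Classes 1 and 2
print(round(cotransduction_with_cys("trpB"), 2))  # Classes 1 and 4
```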

To test Yanofsky’s proposed trp operon map, the 
crossover events required to produce each cotransduc- 
tant class are identified. Figure 6.17 illustrates the locations 
of four crossover points used in different combinations 
for each cotransductant class. Transductants acquiring 
cys⁺ must undergo crossover at point 1 plus at least one
additional point. The precise location of crossover point 
1 can vary over a large expanse of the chromosome to 
the left of cys. The second crossover point must occur 
to the right of cys in any of three locations: at location 
2, within a relatively large distance between cys, which 
is outside the operon, and trpC within the operon; 
at point 3, a very small space in the operon between 
trpC and trpB; or at point 4, a large region to the right 
of trpB. Three different double-crossover combinations 
generate transductant Classes 1, 2, and 3, and transduc- 
tant Class 4 is produced by a quadruple recombination 
requiring crossover at all four points. The quadruple 
crossover is expected to be the least frequent of the com- 
binations producing cotransductants. This study verifies 
Yanofsky’s proposed trp operon map for two reasons. 
First, cotransduction frequencies for cys-trpC and for
cys-trpB are almost identical in the two studies (53%
versus 52% for cys-trpC, and 46% versus 47% for cys-trpB),
placing trpC closer to cys than trpB in both. Second, the quadruple
recombination event is expected to occur less frequently
than any of the double crossover events, and Class 4, the
class that requires a quadruple crossover under the proposed
order, is indeed the rarest, with only 4 of 302 transductants.
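
The crossover argument can also be checked mechanically. The sketch below counts, for a candidate gene order, how many crossovers each Table 6.3 class requires, assuming the chromosome on both sides of the mapped genes is recipient DNA. The function names are illustrative, and the alternative order cys-trpB-trpC is included only as a comparison.

```python
# Table 6.3 classes: True marks a donor allele, False a recipient allele.
CLASSES = {
    1: ({"cys": True, "trpC": True, "trpB": True}, 139),
    2: ({"cys": True, "trpC": True, "trpB": False}, 18),
    3: ({"cys": True, "trpC": False, "trpB": False}, 141),
    4: ({"cys": True, "trpC": False, "trpB": True}, 4),
}


def crossovers_required(order, from_donor):
    """Count donor/recipient switches along the chromosome for one genotype.
    The flanking DNA on both ends is recipient, so the count is always even."""
    path = [False] + [from_donor[gene] for gene in order] + [False]
    return sum(a != b for a, b in zip(path, path[1:]))


def quadruple_classes(order):
    """Classes that need four crossovers under the given gene order."""
    return [cls for cls, (origin, _) in CLASSES.items()
            if crossovers_required(order, origin) == 4]


rarest = min(CLASSES, key=lambda cls: CLASSES[cls][1])
for order in (["cys", "trpC", "trpB"], ["cys", "trpB", "trpC"]):
    quads = quadruple_classes(order)
    print(order, "quadruple class:", quads, "is rarest:", rarest in quads)
```

Only the order cys-trpC-trpB makes the rarest observed class (Class 4) the quadruple-crossover class; under the alternative order the quadruple class would be Class 2, which is more than four times as frequent as Class 4.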
Figure 6.17 A test of Yanofsky’s proposed trp operon gene
map. The approximate locations of possible crossovers are
numbered 1 through 4. For each cotransductant genotype, the
required crossover sites are identified.


Specialized Transduction 


As described above, temperate bacteriophages, such as 
lambda (λ) phage, have the ability to lysogenize their
host by integrating into the host chromosome to create 
a prophage. The site of integration is a DNA sequence 
called the att site (for “attachment”) that is identical in 
the bacterial chromosome and the phage chromosome. 
The 15-bp sequence is called attP in lambda phage 
(the P stands for phage) and attB (B for bacteria) in its 
host E. coli bacterium (Figure 6.18). A specialized phage 
enzyme recognizes the att sites and makes a staggered 
cut there. The complementary single-stranded ends of 
cleaved att DNA reanneal as the prophage integrates, 
to create an att sequence at each end of the integrated 
prophage. Sequences P and P’ flanking attP and B 
and B’ flanking attB are added to allow you to more 


Figure 6.18 Bacteriophage λ site-specific integration
and excision. Integration occurs at identical attB and attP
DNA sequences on the bacterial chromosome and in phage
DNA, respectively. Excision of the prophage exactly reverses
integration.


easily follow the integration and excision processes of 
the prophage. 

Because the attB and attP sequences are identical, the 
excision of a prophage is almost always the exact reversal 
of prophage integration. Occasionally, however, excision 
is inaccurate: It removes only a portion of the prophage 
and, along with it, a portion of the adjacent bacterial 
chromosome. Aberrant excision of a prophage forms a 
specialized transducing phage (Figure 6.19). In E. coli, 
attB is located between the genes galK and bioA; thus, 
aberrant prophage excision occurring in one direction 
will capture the bacterial gal⁺ gene to form the λdgal⁺
specialized transducing phage (d is for defective), and in
the other direction will capture the bacterial bioA gene, to
form the λdbio⁺ specialized transducing phage.
Figure 6.19 Patterns of λ prophage induction. The λ phage
integrates into the host bacterial chromosome to form the
prophage by site-specific recombination between the attP and
attB sites (upper). Normal prophage induction precisely reverses
integration and restores attB and attP sequences (middle).
Aberrant induction (lower) produces specialized transducing
phage λdbio⁺ or λdgal⁺, depending on the direction of aberrant
induction.


Both kinds of specialized transducing phages are 
defective for certain attributes of phage growth and 
behavior. The λdgal⁺ phage is missing several essential
genes, so while it can infect host cells, it cannot complete
either the lytic or lysogenic cycle. In contrast, λdbio⁺
phages are not missing any essential genes, but they lack
genes necessary for lysogeny. Thus, λdbio⁺ phages are
exclusively lytic.

Genetic Analysis 6.3 guides you through an analysis of 
a transduction to determine gene order in a donor strain. 


6.6 Bacteriophage Chromosomes Are 
Mapped by Fine-Structure Analysis 


Before DNA was identified as the hereditary material, 
many biologists regarded genes as indivisible units of he- 
redity that could not be subdivided by recombination. This 
idea derives from Mendel’s original description of “par- 
ticulate inheritance” of traits. Before knowing the molecu- 
lar structure of DNA, biologists had difficulty describing 
how recombination within a gene could occur. Geneticists 
knew that different mutations could affect a single gene, 
and had data from the 1949 study of intragenic recombina- 
tion of the Drosophila lozenge eye mutation by Melvin and 
Kathleen Green showing that different mutations can oc- 
cupy unique locations within a gene (see Figure 5.12). But 
what remained lacking was a refined understanding of the 
internal structure, or fine structure, of genes. 

Beginning in the early 1950s, Seymour Benzer helped 
define how biologists view the structure of genes with 
a series of experiments that revealed the existence of a 
genetic fine structure, a phrase referring to the com- 
position of genes at the level of their molecular building 
blocks. Benzer demonstrated that the building blocks of 
genes were responsible for both mutation and recom- 
bination. The publication of his principal conclusions 
coincided with the identification of the molecular struc- 
ture of DNA. When the functional subunits of DNA were 
revealed to be nucleotides, it was impossible to miss the 
connection between them and Benzer’s fine structure. 

Benzer focused on two questions. First, was the gene 
the fundamental unit of mutation, or could components 
of genes be mutated? Second, was recombination a pro- 
cess occurring only between genes, or did recombination 
also occur between the components of genes? Benzer 
studied these questions using the rII region of the T4
bacteriophage. Genes in the rII region determine whether
and how the phage will lyse its E. coli host. Lysis is exam- 
ined using a bacterial lawn, a solid coating of bacteria on 
the surface of a growth medium. If the growing bacteria 
are exposed to a bacteriophage, infected cells lyse and 


GENETIC ANALYSIS 6.3


PROBLEM In E. coli, thr⁺ and leu⁺ are prototrophic alleles that control synthesis of the amino acids
threonine and leucine. The auxotrophic alleles are defective in their ability to synthesize these amino
acids. Bacteria carrying the aziʳ allele are resistant to the effects of the compound azide that inhibits
protein transport, and those carrying aziˢ are susceptible to the inhibitory effects of azide. E. coli
with the genotype thr⁺ leu⁺ aziʳ are infected with the P1 phage. Progeny phages are collected and used
to infect bacteria with the genotype thr⁻ leu⁻ aziˢ, and the cells are then placed on media selective for
one or two of the donor markers in a transduction experiment. The table below identifies the selected
markers and gives the frequency of cotransduction of unselected markers for each experiment. From the
information provided, determine the order of the three genes on the donor chromosome.

Experiment   Selected Marker(s)   Unselected Marker(s)
1            leu⁺                 aziʳ = 50%, thr⁺ = 4%
2            thr⁺                 aziʳ = 0%, leu⁺ = 4%
3            leu⁺ and thr⁺        aziʳ = 2%

BREAK IT DOWN: Carefully note the genotypes of the donor and recipient strains
and remember that transductant genotypes are the former recipient genotypes that have
acquired one or more donor genes (p. 211).


Solution Strategies and Solution Steps


Evaluate

Strategy 1. Identify the topic this problem addresses and the nature of the
required answer.
Step 1. This is a cotransduction problem in which cotransduction frequencies are to be
used to determine the order of three genes in the donor.

Strategy 2. Identify the critical information given in the problem.
Step 2. The results of three transduction experiments are given. Each experiment has a
different gene as the selected marker.


Deduce

Strategy 3. Be aware of the advantage of using the selected-unselected marker
experimental approach.
Step 3. Selecting for transduction of one of the genes of interest and then evaluating
transductants for the other gene(s) reduces the number of plates that must be
evaluated and simplifies the experimental analysis.

Strategy 4. Interpret the results of each experiment.
TIP: Cotransduction frequencies are highest for genes that are closest together on
the bacterial chromosome.
Step 4. Experiment 1 indicates close proximity of leu and azi, and a greater distance
between leu and thr. Experiment 2 suggests the same more distant relationship
between thr and leu, but also shows no cotransduction between thr and azi.
Experiment 3 informs us that cotransduction of all three donor alleles occurs,
though at a low frequency. We can interpret this to mean that the segment of
chromosome containing these genes is small enough to form a single fragment
for transduction.


Solve

Strategy 5. Combine your observations to identify the order of these three genes.
TIP: Crossovers occur in pairs during the homologous recombination that accompanies
transduction. When three genes are involved, a quadruple crossover is less frequent
than any of the double crossovers.
Step 5. Putting the results of these experiments together, we can identify cotransduction
of thr and azi (shown at 0% in experiment 2) as the quadruple-crossover
cotransductant. All other events are a result of double crossover. The quadruple
crossover event is expected to be least frequent among the cotransductants. On
this basis, leu can be identified as the middle gene of the three tested. The gene
map is shown below, and the four crossover intervals are identified.

[Gene map: donor and recipient chromosomes in the order azi-leu-thr, with crossover
intervals numbered 1 through 4.]

The crossover events accounting for each cotransduction detected in the experiments
are shown below.

Cotransduction              Crossovers
aziʳ and leu⁺               1 and 3
leu⁺ and thr⁺               2 and 4
aziʳ, leu⁺, and thr⁺        1 and 4


For more practice, see Problems 9, 20, and 24.
Figure 6.20 Plaque formation by rII wild types and
mutants. On a bacterial lawn of E. coli B strain, small, circular
wild-type plaques are formed by T4 phages with a wild-type
rII region. Large, irregular mutant plaques are formed by T4
phages with rII mutations.


progeny phages are released. Progeny phages infect new 
host cells, and as the infection—lysis—infection cycle con- 
tinues, a bacteria-free spot called a plaque—a hole in the 
bacterial lawn—appears on the growth medium. 

Benzer showed that two genes, rIIA and rIIB, control
the ability of T4 phages to lyse E. coli host cells. Those
T4 phages carrying wild-type copies of rIIA and rIIB lyse
multiple strains of E. coli, leading to the production of
small plaques (Figure 6.20). On the other hand, phages
with mutation of either rIIA or rIIB form large, irregularly
shaped plaques on E. coli strain B, but they are unable to
form any plaques on E. coli K12 (λ).

Benzer used several different mutagens to produce 
almost 20,000 rII mutants that he studied in three ways.
First, he used genetic complementation analysis, which
showed that there are two genes in the rII region. Second,
he mapped different mutations of rIIA and different 
mutations of rIIB, thus showing that intragenic recom- 
bination was possible and could be used to establish the 
locations of different mutations in each gene. Finally, 
Benzer developed deletion mapping to refine the genetic 
map. The following discussions examine each of these 
achievements individually. 


Genetic Complementation Analysis 


To identify the number of genes in the rII region, Benzer
performed genetic complementation analysis, coinfecting
K12 (λ) bacteria with different pairs of rII mutants. When
two rII mutants exhibiting genetic complementation coinfect
K12 (λ) bacteria, plaques form on the bacterial lawn,
indicating that wild-type lysis has been restored. This result
identifies the mutants as mutations of different genes.
Coinfections by rII mutants that did not lead to plaque
formation on K12 (λ) represented a failure to complement,
and these pairs were identified as mutations of the same
gene. These mutants of a single gene are alleles of one
another. Benzer identified two genetic complementation
groups, which he designated A and B, and these led him to
identify two genes in the rII region: rIIA and rIIB.
Subsequent analysis revealed that each gene pro- 
duces a protein and that both proteins are required for 
lysis. Figure 6.21a illustrates genetic complementation for 


Figure 6.21 Genetic complementation analysis for rII
lysis. (a) Genetic complementation of two lysis-defective
phage mutants occurs when the mutants carry mutations of
different genes. Genetic complementation is revealed by the
formation of many wild-type plaques on K12 (λ) bacteria. (b) No
complementation occurs in lysis-defective mutants that carry
mutations of the same gene.

Panel notes:
(a) Complementation of mutations in different genes. During
simultaneous infection, complementation occurs because functional
forms of both A and B proteins are present.
(b) No complementation of mutations in the same gene. During
simultaneous infection, no complementation occurs, and no plaques
form on the E. coli K12 (λ) lawn, because no functional A proteins
are present (two rIIA mutants) or no functional B proteins are
present (two rIIB mutants).
one pair of rII mutants. One mutant produces functional
A protein and the other produces functional B protein, 
thus providing all the protein components necessary to 
carry out lysis. Genetic complementation produces a 
large number of plaques in infected bacterial lawns, but 
the individual progeny phages released following lysis re- 
main mutant. Figure 6.21b illustrates a failure of mutants 
to complement. In this example, both mutants carry a 
mutation of rIIB. 
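
The logic of sorting mutants into complementation groups can be sketched as below. The mutant names (m1 to m4) and their pairwise results are hypothetical toy data, not Benzer's, and the sketch assumes clean data in which failure to complement is consistent within a group.

```python
def complementation_groups(mutants, complements):
    """Group mutants so that members of a group fail to complement one another.

    complements maps frozenset({a, b}) to True when coinfection with a and b
    yields plaques on K12 (lambda), i.e., the mutations are in different genes.
    """
    groups = []
    for mutant in mutants:
        for group in groups:
            # Failure to complement the group's first member places the
            # mutant in the same gene.
            if not complements[frozenset((mutant, group[0]))]:
                group.append(mutant)
                break
        else:
            groups.append([mutant])
    return groups


# Hypothetical pairwise coinfection results.
results = {
    frozenset(("m1", "m2")): True,
    frozenset(("m1", "m3")): False,
    frozenset(("m1", "m4")): True,
    frozenset(("m2", "m3")): True,
    frozenset(("m2", "m4")): False,
    frozenset(("m3", "m4")): True,
}
print(complementation_groups(["m1", "m2", "m3", "m4"], results))
```

Two groups emerge from this toy data set, which is the same kind of evidence that led to the identification of two genes, rIIA and rIIB.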


Intragenic Recombination Analysis 


On rare occasions, Benzer observed that two lysis mu- 
tants that fail to complement (i.e., mutants of the same 
gene) nonetheless produce a few plaques on K12 (λ). He
proposed that these plaques were produced by wild-type 
phage that resulted from rare intragenic recombination 
between two mutants whose chromosomes carry muta- 
tions in different locations in a single gene (Figure 6.22). 
One of the resulting recombinant chromosomes carries 
a double mutation, and the other is wild type. Wild-type 
chromosomes are found in progeny phages that carry out
wild-type lysis.

Based on a determination of the number of cells in an 
experimental flask and counting the number of K12 (λ)
plaques subsequently produced, Benzer was able to calculate
the intragenic recombination frequency within an rII
gene for a given pair of mutations. Reasoning that reciprocal
recombination was more likely to occur between two
mutations that are distant within a gene, and less likely 
Figure 6.22 Simultaneous coinfection of a host cell by two
noncomplementing rIIA mutants. No complementation (left)
is the common and expected outcome. Rarely, however, intra- 
genic recombination (right) produces wild-type and double- 
mutant progeny phage. 


between mutations that are closer within a gene, Benzer 
was able to convert the observed number of plaques into 
a frequency of recombination with which he mapped rII
mutations. The detected recombination frequencies were
very small, but because of the large number of observa- 
tions he made, Benzer was able to conclude that if no wild- 
type recombinants were obtained, the mutations occurred 
in the same nucleotide. 
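
A rough sketch of how plaque counts could be converted into a recombination frequency follows. It rests on two assumptions that the text does not state: total progeny are estimated from plaques on the permissive E. coli B strain, and the wild-type count is doubled because the reciprocal double-mutant recombinants cannot form plaques on K12 (λ) and so go uncounted. The function name and the numbers are hypothetical.

```python
def intragenic_recombination_frequency(wild_type_per_ml: float,
                                       total_progeny_per_ml: float) -> float:
    """Estimate recombination frequency between two rII mutations.

    wild_type_per_ml: wild-type recombinant titer (plaques on K12 (lambda)).
    total_progeny_per_ml: total progeny titer (assumed here to come from
    plaques on E. coli B, where mutant and wild-type phages both grow).
    """
    if total_progeny_per_ml <= 0:
        raise ValueError("total_progeny_per_ml must be positive")
    # Double the wild-type count to include the undetected double mutants.
    return 2 * wild_type_per_ml / total_progeny_per_ml


# Hypothetical titers, already corrected to the same volume and dilution.
print(intragenic_recombination_frequency(50, 1_000_000))
```

Because only wild-type recombinants grow on K12 (λ), even very rare events stand out against a large progeny population, which is what made frequencies this small measurable.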


Deletion-Mapping Analysis 


Benzer’s mutagenesis of rII generated two types of mutants:
revertible mutants, which could undergo spon-
taneous reversion back to wild type, and nonrevertible 
mutants, which never reverted. Revertible mutations are 
caused by DNA base-sequence substitutions (point muta- 
tions), which can be changed back to wild-type sequence 
by reversion. On the other hand, nonrevertible mutations 
are partial deletion mutations, in which part of the gene 
sequence is lost. A deleted DNA sequence cannot be re- 
stored by reversion. 

Using a technique called deletion mapping, Benzer 
took advantage of this difference between revertible and 
nonrevertible mutants to map the position of individual 
rII mutations. Deletion mapping relies on the production
of wild-type phage by intragenic recombination between 
a revertible mutant and nonrevertible mutant. When one 
mutant is revertible and the other is nonrevertible, the 
ability to form wild-type intragenic recombinants depends 
on the locations of the mutations. Figure 6.23a illustrates 
restoration of wild type through intragenic recombination
between a point mutation and a deletion mutation whose 
locations do not overlap. In contrast, Figure 6.23b shows 
that if the locations of the point mutation and the deletion 
mutation overlap one another, the production of wild- 
type intragenic recombinants is impossible. Wild-type 
recombinants are not formed in this case, because the 
deletion mutant cannot provide the wild-type sequence to 
replace the mutated sequence in the point mutant. 

In research published between 1955 and 1962, Benzer 
conducted deletion mapping of almost 20,000 rII mu- 
tants. He infected bacteria with phage carrying individual 
revertible mutations (point mutations), paired one at a 
time with phage carrying different nonrevertible muta- 
tions (deletion mutations). 

In 1961, Benzer published a fine-structure map 
containing 1612 point mutations of rIIA and rIIB 
(Figure 6.24). Two features of this map are of interest. 
First, the mutations are scattered throughout rIIA and 
rIIB, suggesting the genes are composed of subunits
that are individually mutable. Second, the distribution 
of the mutations is nonrandom. More than 100 point 
mutations aggregate in region A6c, and region B4 is 
the site of more than 500 independent point mutations. 
These sites are mutational hotspots that can be brought 
about by several circumstances (see Section 12.1). 


Figure 6.23 Deletion mapping
(a) Nonoverlapping mutations, wild-type recombination

Figure 6.24 A genetic map showing the location of
revertible (point) mutants of the rII region. This mutational
map assembled by Benzer places more than 1600 mutants
in the rII region and identifies hotspots where mutations are
particularly common.
Several of Benzer’s deletions are shown, and his mapping
strategy is outlined, in Figure 6.25. Thirty-two deletion
mutants in two groups called Series I and Series II
are shown in Figure 6.25a. In Figure 6.25b, an rIIA point
mutant is tested for its ability to form wild-type recombi- 
nants with the seven Series I deletion mutants and a subset 
of three Series II deletion mutants. Series I mutants are 
used first, to determine which of the six segments of rIIA 
(A1 to A6) contains the point mutant. The point mutant 
in this example forms wild-type recombinants with dele- 
tion mutant 638 but not with any of the six other mutants 
tested. The only rIIA region present in 638 that is absent
in the other mutants is segment A6, leading to the conclu- 
sion that the point mutation occurs in the A6 segment of 
A revertible point mutation is mapped to region rIIA6 by its ability to form wild-type recombinants with Series I nonrevertible
mutants that contain this region. The map location of the revertible mutant is more precisely mapped using Series II mutants
that show it forms wild-type recombinants with Series II mutants containing region rIIA6a2.


Figure 6.25 Deletion mapping in the rII region. (a) Seven Series I partial-deletion mutants of the rII region
and 25 Series II partial-deletion mutants subdivide the rII region into 47 segments. (b) Deletion-mapping
analysis of an rIIA point (revertible) mutant to region rIIA6a2 by its ability to form wild-type recombinants (+)
and its inability to form wild-type recombinants (-) with partial-deletion mutants of Series I and Series II.
rIIA. The A6 region is subdivided into four segments (A6a
to A6d). The three partial-deletion mutants of Series II are
then selected for the final step in the mapping. In the Series
II analysis, we see that the point mutant does not form
wild-type recombinants with PB230 and P18 but is able to
do so with 164. The smallest interval that is missing from
PB230 and P18 but present in 164 is the a2 region of rIIA6.
This point mutation therefore maps to rIIA6a2.
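
The elimination logic of deletion mapping can be sketched with sets. The segment names (s1 to s4) and deletion extents (d1 to d3) below are hypothetical toy data, not Benzer's actual deletions; only the reasoning mirrors the text: a point mutation must lie inside every deletion with which it fails to give wild-type recombinants and outside every deletion with which it succeeds.

```python
def map_point_mutation(segments, deletions, recombines):
    """Return the segments that could contain the point mutation.

    segments: ordered list of segment names.
    deletions: deletion name -> set of segments that the deletion removes.
    recombines: deletion name -> True if wild-type recombinants were observed.
    """
    candidates = set(segments)
    for name, removed in deletions.items():
        if recombines[name]:
            # Wild-type recombinants formed: the mutation is outside the deletion.
            candidates -= removed
        else:
            # No wild-type recombinants: the mutation is inside the deletion.
            candidates &= removed
    return [segment for segment in segments if segment in candidates]


# Hypothetical map and test results.
segments = ["s1", "s2", "s3", "s4"]
deletions = {
    "d1": {"s1", "s2", "s3"},
    "d2": {"s2", "s3", "s4"},
    "d3": {"s3", "s4"},
}
recombines = {"d1": False, "d2": False, "d3": True}
print(map_point_mutation(segments, deletions, recombines))
```

Each test halves or trims the candidate set, which is why a small panel of nested deletions can place a mutation without crossing it against every other point mutant.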


6.7 Lateral Gene Transfer Alters 
Genomes 


The genetic maps created by analysis of data from conjuga- 
tion, transduction, and transformation experiments were 
extraordinarily important for understanding the content 
and organization of bacterial genomes. Contemporaneous 
with the identification of DNA structure (the early 1950s) 
and with descriptions of the molecular basis of DNA rep- 
lication, transcription, and translation (the late 1950s and 
early 1960s), these genetic maps served as the foundation 
for DNA-sequence-based maps of bacterial and archaeal
genomes that have been produced by the thousands since 
the late 1990s. The earlier genetic maps gave a precise 
outline of the order and relative positions of most genes 
in commonly investigated genomes such as that of E. coli, 
and they made it possible to jump-start the process of 
identifying the functions of genes in bacterial and ar- 
chaeal genomes, a process known as annotation. Chapter 
18 contains a detailed discussion of genome sequencing 
strategies, genome structures, evolutionary genomics, and 
genome annotation. Here we provide a brief overview of 
lateral gene transfer that has contributed substantially to 
the content of many genomes. 


Lateral Gene Transfer and Genome Evolution 


Lateral gene transfer (LGT), also known as horizontal 
gene transfer (HGT), is the transfer of genetic mate- 
rial between individual bacteria or archaea and other 
organisms. The participating organisms are sometimes 
members of the same species, but they can also be 
members of different species or even distinct taxonomic 
groups. Common examples of LGT are the three bacterial 
transfer processes discussed in this chapter: conjugation, 
transformation, and transduction. Each of these pro- 
cesses occurs readily in and between species. Extensive 
studies of LGT across a wide range of bacterial and ar- 
chaeal species find that on average more than 12% of the 
genes in a genome are the result of LGT. The range in the 
amount acquired by LGT is quite wide, from a high of 
more than 25% in the genome of the archaeal organism 
Methanosarcina acetivorans to less than 2% of the ge- 
nome in the bacterium Mycoplasma genitalium. E. coli is 
relatively high on the LGT percentage-transfer list, with 
about 17% of the genome transferred by LGT. Studies 
of LGT detect a substantial bias in the biological func- 
tion of laterally transferred genes. Genes whose protein 
products are expressed at the cell surface, genes encoding 
DNA-binding proteins, and genes whose products have 
pathogenicity-related functions are much more likely to 
undergo LGT. 

LGT between bacteria is prevalent, but in addition, 
there has long been evidence of limited LGT between bac- 
teria and eukaryotes. Prior to the availability of genome se- 
quence information, LGT between bacteria and eukaryotes 
was thought to be limited to the transfer of a very small 
number of genes. From an evolutionary perspective, the 
most prominent example of bacteria-eukaryote LGT is 
the presence of mitochondria in plant and animal cells and 
the presence of chloroplasts in plant cells. Mitochondria 
and chloroplasts are essential organelles in eukaryotic cells. 
More than a billion years ago, ancient bacteria invaded ancient 
eukaryotic cells and, through a process of coevolution on the 
part of both cells, established the endosymbiotic relationships 
that gave rise to mitochondria and chloroplasts. 
Both organelles carry their own chromosomes that contain 
unique genetic information. Mitochondrial gene products 
work with nuclear gene products to produce adenosine 
triphosphate (ATP) in animal cells, and chloroplast gene 
products are responsible for photosynthesis in plant cells. 
The inheritance of mitochondrial and chloroplast genes 
differs from that of nuclear genes because the organelles 
are cytoplasmic, not nuclear. We discuss the details of cy- 
toplasmic heredity and the evolution of mitochondria and 
chloroplasts in Chapter 19. 

A second well-known example of bacteria-eukaryote 
LGT is the transfer of DNA from the bacterium 
Agrobacterium tumefaciens to plants. Agrobacterium 
transfers about 10,000 to 30,000 base pairs of DNA from 
its much larger tumor-inducing (Ti) plasmid to plant 
cells. In plants, this DNA causes crown gall disease, a 
type of cancerous tumor. The natural propensity of Ti 
plasmid to transfer into plant cells is utilized in the re- 
search laboratory in the production of transgenic plants, 
as we discuss in Chapter 17. 

In 2007, genome sequencing information demonstrated 
extensive LGT between the bacterium Wolbachia and a 
large number of insects. The data indicate that roughly one- 
third of all arthropod genomes contain Wolbachia DNA 
transferred by LGT. Researchers speculate that LGT be- 
tween bacteria and animals may be much more common 
than previously thought. Only some of the transferred genes 
appear to actually enter the germ line where they can be 
transmitted during reproduction. There is, however, recent 
speculation that DNA transferred by LGT from bacteria 
could become inserted into the genomes of somatic cells, 
where it could induce mutations. If such insertional muta- 
genesis were to occur, it could possibly cause abnormalities, 
including the development of cancer. More information will 
emerge about this topic in the near future. 

Identifying Lateral Gene Transfer in Genomes 


LGT is identified by the presence of DNA-sequence fea- 
tures that make certain portions of a genome distinct 
from the rest of the genome. These distinctive genome 
regions are called genomic islands because they oc- 
cur within a confined portion of the genome. Genomic 
islands typically are large segments that span 10 to 200 kb 
and often include multiple genes that may have related 
functions. Two common ways to identify a genomic is- 
land acquired by LGT are (1) by determining that a group 
of genes are much more similar to genes of a distantly 
related species than to those of a closely related species 
and (2) by detecting a region of genome that has a ratio 
of G-C base pairs to A-T base pairs that is substantially 
higher or lower than the average in the rest of the genome. 
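
The second approach, scanning for atypical base composition, can be sketched as a sliding-window calculation. The sketch below is only an illustration under stated assumptions: it uses the G-C fraction of each window (a simplification of the G-C to A-T ratio described above), and the function names, window size, and threshold are hypothetical choices rather than values from any published method. Real analyses also take codon usage and sequence similarity into account.

```python
def gc_fraction(seq):
    """Fraction of bases in seq that are G or C (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(1 for base in seq if base in "GC") / len(seq)


def flag_atypical_windows(genome, window, threshold):
    """Return (start, end, gc) for non-overlapping windows whose G-C
    fraction differs from the genome-wide fraction by more than threshold."""
    genome_gc = gc_fraction(genome)
    flagged = []
    for start in range(0, len(genome) - window + 1, window):
        window_gc = gc_fraction(genome[start:start + window])
        if abs(window_gc - genome_gc) > threshold:
            flagged.append((start, start + window, round(window_gc, 3)))
    return flagged


# Toy sequence: an A-T-rich "genome" with a G-C-rich insert at positions 100-119.
toy_genome = "AT" * 50 + "GC" * 10 + "AT" * 50
print(flag_atypical_windows(toy_genome, window=20, threshold=0.3))
# [(100, 120, 1.0)]
```

A flagged window is only a candidate genomic island; comparison with the genes of closely and distantly related species is still needed to support acquisition by LGT.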

Recent evidence points to a significant role for LGT 
in the evolution of genomes. Moreover, in two particular 
ways, some LGT-driven events are of profound medical 
importance to humans. First, LGT has allowed many 
organisms to adapt rapidly to changing environmental 
conditions by acquiring the ability to resist one or more 
antibiotic compounds. The capacity to resist the effects 
of antibiotics can allow drug-resistant bacteria to pro- 
liferate in the presence of the antibiotics. LGT within 
and between bacterial species is a common route for the 
rapid dissemination of antibiotic resistance. 

Medical practitioners today routinely encounter pa- 
tients with infections produced by bacterial strains that 
are resistant to one or more of the commonly used 
antibiotics. The U.S. Centers for Disease Control and 
Prevention (CDC) issued a report in late 2013 highlight- 
ing the seriousness of antibiotic resistance as a prevalent 
medical problem. The report stated that each year in the 
United States more than 2 million people are infected 
with antibiotic-resistant bacteria and that nearly 25,000 
people die from these infections each year. 

Antibiotic resistance is readily transferred among bac- 
teria by LGT, and the presence of resistance genes is 
increased by the extensive use, and misuse, of antibiotics. 
The 2013 CDC report attributes a substantial portion of 
the increase in antibiotic-resistant strains to the pervasive 
use of antibiotics in animal agriculture where they are 
often used to promote growth in animals with no signs of 
infection. These circumstances and the impact of this phe- 
nomenon on the practice of medicine are the subject of the 
Case Study in this chapter. 

The second medically relevant consequence of LGT 
in bacteria is the acquisition of pathogenicity islands, 
a subtype of genomic island. Pathogenicity islands contain 
multiple genes whose protein products promote the ability 
of the bacteria to invade the body of a host, as well as genes 
that produce toxic compounds. 

The common, and usually friendly, intestinal bacte- 
rium E. coli exists in a number of different strains, some of 
which are pathogenic. The most common strains of E. coli 
are commensal bacteria that inhabit our intestinal tract 
and provide benefits without doing harm. Some strains, 
however, have acquired pathogenicity islands and cause 
illnesses such as diarrhea and meningitis. The recently 
identified pathogenic strain of E. coli O157:H7 contains 
a pathogenicity island acquired by transduction. E. coli 
O157:H7 is found in some contaminated beef and on 
some fresh produce, including lettuce. Thorough rinsing 
can, but does not always, remove the pathogen from let- 
tuce, and undercooking contaminated beef does not raise 
its temperature high enough to kill pathogens that may 
be present. The pathogenicity island in E. coli O157:H7 
contains genes that promote the adhesion of the pathogen 
to intestinal cells and a toxin gene whose product acts similarly 
to, although not as dramatically as, the Vibrio cholerae toxin. 
Infection with E. coli O157:H7 produces diarrhea that 
can be severe in immune-compromised individuals or in 
infants and the elderly. The island also contains a gene 
producing a toxin that blocks translation in cells. This 
toxin particularly affects kidney and intestinal cells and 
contributes to bloody diarrhea. 


The Evolution of Antibiotic Resistance and Change in Medical Practice 


Alexander Fleming got a little sloppy with his sterile 
technique one day in 1928 and made a mistake that has 
since saved millions of lives. Fleming was working with 
Staphylococcus, a common bacterial genus that can cause 
serious and potentially fatal “staph” infections when it enters 
the body through a cut or abrasion. On the fateful day, 
Fleming unknowingly contaminated his Staphylococcus cul- 
ture with a fungus. 

Normally, fungal cells reproduce in culture along with 
bacterial cells and are noticed when the culture is spread 
on plates. Fleming’s contaminating fungus was different, 
however, because when Fleming spread his contaminated 
culture on plates, only fungal colonies grew; there were no 
bacterial colonies! The fungus had killed the bacterial cells in 
the culture. Recognizing this as an important, if inadvertent, 
discovery, Fleming quickly identified the fungus as Penicillium 
and gave the compound that killed Staphylococcus the name 
penicillin. 

In the 1930s, Howard Florey showed that penicillin was 
an effective antibiotic against a broad spectrum of infectious 
bacteria. At the beginning of World War II, Florey directed a 
major “scale-up” project to put penicillin into mass produc- 
tion. Penicillin proved tremendously effective at preventing 
what otherwise might have been fatal bacterial infections. 

Today, although penicillin and other antibiotics continue 
to save lives, antibiotic-resistant strains of bacteria are 
increasingly the cause of difficult-to-treat infections and even 
death. This is quickly becoming an acute problem in modern med- 
icine. For example, at present more than 95% of Staphylococcus 
strains found in hospitals are resistant to penicillin, and some 
strains carry resistance alleles to multiple antibiotics. Examples 
include methicillin-resistant Staphylococcus aureus (MRSA) and 
other infectious organisms that have acquired resistance to 
multiple antibiotic compounds. Antibiotic resistance is a rapidly 
growing problem that has already changed practices in medical 
treatment of infectious disease. The future holds more changes, 
both in patient treatment and the broader use of antibiotics. 
What happened to bring about this shift? The answer has 
two parts. One component we have already mentioned—the 
evolution of antibiotic resistance and the acquisition of patho- 
genicity by bacteria through lateral gene transfer. Antibiotic re- 
sistance can be readily transferred within a species and between 
bacterial species by conjugation, transduction or transformation. 
The second factor is the use and misuse of antibiotics them- 
selves that establishes an environment in which resistant strains 
proliferate at the expense of sensitive strains. Exposing bacteria 
to antibiotics generally leads to killing antibiotic-sensitive bacteria 
and can allow the survival of antibiotic-resistant bacteria. Even 
when they are properly used, antibiotics can act as an agent of 
artificial selection that facilitates the survival of resistant strains 
at the expense of sensitive strains. When antibiotics are misused, 
such as when they are used pervasively in animal agriculture to 
increase growth even though no infection is present, when they are 
not taken by a patient for the prescribed period of time, or when 
they are used to treat non-bacterial infections, they eliminate great 
numbers of antibiotic-sensitive bacteria and promote the proliferation 
of resistant bacteria. 

Resistance and sensitivity to antibiotics are not absolute 
characteristics. A “resistant” strain is just that: resistant to an 
antibiotic but not necessarily impervious to it. It takes more 
antibiotic to kill a resistant strain than to kill a sensitive strain. 
With regard to treating an infected person or animal, however, 
the medical question is: At what dosage is the benefit of 
the antibiotic outweighed by the harm to the patient? 

At present, and increasingly in the future, physicians will 
have to be acutely aware of the events and behaviors that can 
lead to bacterial infection, be hypervigilant in spotting potential 
infections by resistant strains, and be prepared to quickly adapt 
medical treatments and protocols to manage resistant strains of 
bacteria. Future physicians must understand how and why antibiotic 
resistance has evolved if they are going to be successful in 
dealing with its ramifications for their patients. 


For activities, animations, and review quizzes, go to the Study Area. 


6.1 Bacteria Transfer Genes by Conjugation 


Bacteria transfer genetic material in a unidirectional 
process (donor cell to recipient cell) called conjugation. 
Experimental analysis determined that conjugation requires 
direct contact between donor and recipient. 

Conjugation is controlled by genes on a plasmid known as an 
F factor. Donor bacteria that carry an extrachromosomal F 
factor are F⁺ cells, and bacteria without an F factor are F⁻, or 
recipient, cells. 

F factor transfer begins with the binding of a relaxosome 
protein complex at the transfer origin (oriT) and cleavage 
of one strand of F factor DNA, the T strand. Rolling circle 
DNA replication transfers the F factor from the donor cell to 
the recipient cell across a conjugation pilus. 

Conjugation between an F⁺ donor and an F⁻ recipient transfers 
the F factor only. The F⁻ cell is converted to an F⁺ cell 
but receives no genetic material from the donor bacterial 
chromosome. 

F factor integration into the donor chromosome takes 
place by recombination at insertion sequences (IS) found 
in both the F factor and the donor chromosome. F factor 
integration creates an Hfr (high-frequency recombination) 
chromosome. 

Many different kinds of Hfr chromosomes can occur in a 
single bacterial species. Each Hfr has a particular orientation 
and site of integration. 

Conjugation between an Hfr donor and an F⁻ recipient 
transfers a portion of the F factor and a segment of donor 
DNA. The donor segment undergoes homologous recombination 
with the recipient chromosome. Exconjugants 
receive donor bacterial genes but are not converted to a 
donor state. 


6.2 Interrupted Mating Analysis Produces 
Time-of-Entry Maps 

Time-of-entry maps are created for each Hfr strain by 
interrupted mating studies that identify the order of entry 
of donor genes and determine the distance (in minutes) 
between transferred genes. 


Hfr maps for a given bacterium are consolidated to form a 
genetic map of the donor chromosome as a whole. 
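
Consolidating Hfr maps amounts to stitching overlapping gene orders into one circle. The sketch below is a minimal illustration, assuming that the transfer orders are mutually consistent and that together they cover every adjacent gene pair on the chromosome. The function name and the genes `a` through `f` are hypothetical. Strains with opposite F factor orientations are handled by recording gene adjacency without direction.

```python
def consolidate_hfr_maps(transfer_orders):
    """Combine Hfr gene-transfer orders into one circular gene order.

    transfer_orders -- list of gene lists, each in order of transfer
    Returns the genes in circular order, starting from the
    alphabetically first gene.
    """
    # Record which genes are adjacent, ignoring transfer direction.
    neighbors = {}
    for order in transfer_orders:
        for left, right in zip(order, order[1:]):
            neighbors.setdefault(left, set()).add(right)
            neighbors.setdefault(right, set()).add(left)

    # On a circular map, no gene can have more than two neighbors.
    for gene, adjacent in neighbors.items():
        if len(adjacent) > 2:
            raise ValueError(f"inconsistent data: {gene} has {len(adjacent)} neighbors")

    # Walk around the circle from an arbitrary starting gene.
    start = min(neighbors)
    circle = [start]
    previous, current = None, start
    while True:
        options = sorted(neighbors[current] - {previous})
        if not options or options[0] == start:
            break
        previous, current = current, options[0]
        circle.append(current)
    return circle


# Hypothetical transfer orders for three Hfr strains; the third strain
# has its F factor in the opposite orientation.
orders = [
    ["a", "b", "c", "d"],
    ["d", "e", "f", "a"],
    ["c", "b", "a", "f"],
]
print(consolidate_hfr_maps(orders))  # ['a', 'b', 'c', 'd', 'e', 'f']
```

The result is read as a circle, so the last gene listed is adjacent to the first. Distances in minutes would come from the time-of-entry data and are not modeled here.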


6.3 Conjugation with F’ Strains Produces 
Partial Diploids 


F′ donor strains are created when excision of an F factor 
from Hfr integration removes F factor DNA along with 
adjacent donor chromosome DNA. 

Conjugation between an F′ donor and an F⁻ recipient 
generates partial diploidy in exconjugants. 


6.4 Bacterial Transformation Produces 
Genetic Recombination 


Extracellular fragments of DNA released when a donor 
bacterial cell lyses can be absorbed across the cell mem- 
brane of a competent recipient cell as transforming DNA. 
Transforming DNA undergoes homologous recombination 
with the recipient chromosome to produce transformants 
that have acquired donor DNA. 

6.5 Bacterial Transduction Is Mediated 
by Bacteriophages 


Bacteriophage infection of a host bacterial cell can lead to 
lysis of the host cell. 

Temperate bacteriophages can undergo site-specific integration 
into the host chromosome by lysogeny. 

Generalized transducing phages are created when a phage 
particle mistakenly packages a segment of a bacterial chromosome 
during lysis of the host cell. 

Recipient cells undergo generalized transduction when donor 
DNA introduced by a generalized transducing phage recombines 
with the recipient chromosome. Any donor genes 
can be transduced during generalized transduction. 

Cotransduction mapping determines the order of genes on 
the donor chromosome. 

Specialized transducing phages are produced by the aberrant 
excision of a lysogenic prophage that removes a portion 
of the prophage and an adjacent segment of host DNA. 
Specialized transduction is limited to transduction of genes 
adjacent to the site of prophage integration. 


6.6 Bacteriophage Chromosomes Are Mapped 
by Fine-Structure Analysis 


Seymour Benzer used genetic complementation analysis to 
determine that two genes make up the rII region controlling 
T4 bacteriophage lysis of E. coli. 

Analysis of intragenic recombination, and deletion mapping 
of more than 1600 rIIA and rIIB mutants, led to the conclusion 
that DNA nucleotides are the fundamental unit of 
recombination. 


6.7 Lateral Gene Transfer Alters Genomes 


LGT is common within species and among diverse species. 

LGT usually involves multiple genes in genomic islands. 

Bacteria commonly acquire pathogenicity and antibiotic [...] 

[...] thought. 


1. For bacteria that are F⁺, Hfr, F′, and F⁻, perform or answer 
the following. 
a. Describe the state of the F factor. 
b. Which of these cells are donors? Which is the recipient? 
c. Which of these donors can convert exconjugants to a 
donor state? 
d. Which of these donors can transfer a donor gene to 
exconjugants? 
e. Describe the results of conjugation (i.e., changes in the 
recipient and the exconjugant) that allow detection of 
the state of the F factor in a donor strain. 
f. Describe a “partial diploid” and how it originates. 


2. The flow diagram shown below identifies possible relationships 
between bacterial strains in various F factor states. 
For each of the four links in the diagram, provide a description 
of the events involved in the transition. 


        1          2           4 
   F⁻ ----> F⁺ ----> Hfr ----> F′ 
               <---- 
                 3 

3. Conjugation between an Hfr cell and an F⁻ cell does not 
usually result in conversion of exconjugants to the donor 
state. Occasionally, however, the result of this conjugation 
is two Hfr cells. Explain how this occurs. 


4. Bacteria transfer genes by conjugation, transduction, and 
transformation. Compare and contrast these mechanisms. 
In your answer, identify which if any processes involve homologous 
recombination and which if any do not. 


5. Explain the importance of the following features in conjugating 
donor bacteria: 
a. the origin of transfer 
b. the conjugation pilus 
c. homologous recombination 
d. the relaxosome 
e. relaxase 
f. T strand DNA 
g. pilin protein 

6. Describe the difference between the bacteriophage lytic 
cycle and lysogenic cycle. 


7. Describe what is meant by the term site-specific recombination 
as used in identifying the processes that lead to the 
integration of temperate bacteriophages into host bacterial 
chromosomes during lysogeny or to the formation of specialized 
transducing phage. 


8. What is a prophage, and how is a prophage formed? 


9. How is the frequency of cotransduction related to the relative 
positions of genes on a bacterial chromosome? Draw a 
map of three genes and describe the expected relationship 
of cotransduction frequencies to the map. 


10. Describe the differences between genetic complementation 
and recombination as they relate to the detection of wild-type 
lysis by a mutant bacteriophage. 


11. Among the mechanisms of gene transfer in bacteria, which 
one is capable of transferring the largest chromosome segment 
from donor to recipient? Which process generally 
transfers the smallest donor segments to the recipient? 
Explain your reasoning for both answers. 


12. What is lateral gene transfer? How might it take place 
between two bacterial cells? 


13. Lateral gene transfer is thought to have played a major role 
in the evolution of bacterial genomes. Describe the impact 
of LGT on bacterial genome evolution. 


14. Seven deletion mutations (1 to 7 in the table below) 
are tested for their ability to form wild-type recombinants 
with five point mutations (a to e). The symbol “+” 
indicates that wild-type recombination occurs, and “−” 
indicates that wild types are not formed. Use the data 
to construct a genetic map of the order of point mutations, 
and indicate the segment deleted by each deletion 
mutation. 

[Table: point mutations a to e (rows) tested against deletion 
mutations 1 to 7 (columns); the +/− entries are not recoverable 
from the source.] 


15. An rII lysis mutation caused by a point mutation is tested 
against several deletion mutations shown in Figure 6.25 for 
its ability to form wild-type recombinants. The deletion 
mutants are divided into two groups, Series I and Series II. 
In the “result” column of the table below, “+” indicates the 
formation of wild-type recombinants and “−” indicates that 
wild types do not form. In the first part of your answer, 
use the Series I data exclusively to identify the segment 
of the rII region containing the lysis mutant tested. In the 
second part of your answer, use the Series II data to refine 
the point mutation location. Explain your rationale for 
mutation location assignments for both the Series I and the 
Series II data. 


Series I                      Series II 
Deletion Mutation   Result    Deletion Mutation   Result 
1272                          1364 
1241                          EM66 
J3                            386 
PT1                           168 
PB242                         1993 
A105                          1695 
638                           PT153 
                              1231 
                              C33 
                              250 

[The +/− entries of the Result columns are not legible in the source.] 


For answers to even-numbered problems, see Appendix: Answers. 


16. Suppose you have an rII lysis mutant that maps to segment 
A2h2. Use the Series I and Series II deletion mutants 
identified in the problem above, and fill out the “results” 
columns with the “+” and “−” designations expected for the 
A2h2 mutant. 


17. Five Hfr strains from the same bacterial species are analyzed 
for their ability to transfer genes to F⁻ recipient 
bacteria. The data shown below list the origin of transfer 
(oriT) for each strain and give the order of genes, with the 
first gene on the left and the last gene on the right. Use the 
data to construct a circular map of the bacterium. 


Hfr Strain   Genes Transferred 
Hfr 1        oriT  met  ala  lac  gal 
Hfr 2        oriT  met  leu  thr  azi 
Hfr 3        oriT  gal  pro  trp  azi 
Hfr 4        oriT  leu  met  ala  lac 
Hfr 5        oriT  trp  azi  thr  leu  met 

18. An interrupted mating study is carried out on Hfr strains 1, 
2, and 3 identified in the problem above. After conjugation 
is established, a small sample of the mixture is collected 
every minute for 20 minutes to determine the distance 
between genes on the chromosome. Results for each of the 
three Hfr strains are shown below. The total duration of 
conjugation (in minutes) is given for each transferred gene. 


Hfr strain 1      oriT  met  ala  lac  gal 
Duration (min)    0     2    8    13   17 
Hfr strain 2      oriT  met  leu  thr  azi 
Duration (min)    0     2    7    10   17 
Hfr strain 3      oriT  gal  pro  trp  azi 
Duration (min)    0     3    8    14   19 


a. For each Hfr strain, draw a time-of-entry profile like the 
one in Figure 6.8a. 

b. Using the chromosome map you prepared in answer to 
Problem 15, determine the distance in minutes between 
each gene on the map. 

c. Explain why azi is the last gene of strain 2 to transfer in 
the 20 minutes of conjugation time. How many minutes 
of conjugation time would be needed to allow the next 
gene on the map to transfer from Hfr strain 2? 

d. Write out the interrupted mating results you would ex- 
pect after 20 minutes of conjugation for Hfr strains 4 and 
5. Use the format shown at the beginning of this problem. 

e. In minutes, what is the total length of the chromosome 
in the donor species? 


19. An Hfr strain with the genotype cys⁺ leu⁺ met⁺ strˢ is 
mated with an F⁻ strain carrying the genotype cys⁻ leu⁻ 
met⁻ strᴿ. In an interrupted mating experiment, small 
samples of the conjugating bacteria are withdrawn every 
3 minutes for 30 minutes. The withdrawn cells are shaken 
vigorously to stop conjugation and then placed on three 
different selection media, composed as follows: 

Medium 1: Minimal medium plus leucine, methionine, and streptomycin 
Medium 2: Minimal medium plus cysteine, methionine, and streptomycin 
Medium 3: Minimal medium plus cysteine, leucine, and streptomycin 

a. What donor gene is the selected marker in each medium? 

b. List all possible bacterial genotypes growing on each 
medium. 

c. What is the purpose of adding streptomycin to each 
selection medium? 


The table below shows the number of colonies 
growing on each selection medium. The sampling time 
indicates how many minutes have passed since conjugation 
began. 


Sampling Time    Number of Colonies 
(minutes)        Plate 1    Plate 2    Plate 3 
3                0          0          0 
6                0          0          0 
9                0          62         0 
12               0          87         0 
15               51         124        0 
18               79         210        62 
21               109        250        85 
24               144        250        [illegible] 
27               152        250        122 
30               152        250        122 


d. Determine the order of donor genes cys, leu, and met 
from the interrupted mating data. 

e. Suppose a fourth selection medium containing leucine 
and streptomycin is prepared. At what sampling time 
do you expect the first-growing colonies to appear? 
Explain your reasoning. 


20. A triple-auxotrophic strain of E. coli having the genotype 
phe⁻ met⁻ ara⁻ is used as a recipient strain in a transduction 
experiment. The strain is unable to synthesize its own 
phenylalanine or methionine, and it carries a mutation that 
leaves it unable to utilize the sugar arabinose for growth. 
The recipient is crossed to a prototrophic strain with the 
genotype phe⁺ met⁺ ara⁺. The table below shows the selected 
marker and gives cotransduction frequencies for the 
unselected markers. 


                   Selected Colonies Containing 
                   the Unselected Marker (%) 
Selected Marker    phe⁺    met⁺    ara⁺ 
met⁺               4       −       7 
phe⁺               −       2       51 
met⁺, phe⁺         −       −       79 
ara⁺               68      5       − 


a. Identify the compounds present in each of the selective 
media. 

b. Use the cotransduction data to determine the order of 
these genes. 


21. Penicillin was first used in the 1940s to treat 
gonorrhea infections produced by the bacterium 
Neisseria gonorrhoeae. According to the CDC, in 1984, fewer 
than 1% of gonorrhea infections were caused by 
penicillin-resistant N. gonorrhoeae. By 1990, more than 10% 
of cases were penicillin-resistant, and a few years later the 
level of resistance was at greater than 95%. Almost every 
year the CDC issues new treatment guidelines for gonorrhea 
that identify the recommended antibiotic drugs and dosages. 



a. Why is the CDC so active in making these 
recommendations? 

b. What are the short-term implications of these frequent 
changes for physicians and clinics that treat sexually 
transmitted diseases like gonorrhea and for individuals 
infected with gonorrhea? 

c. What are the long-term implications of these frequent 
changes in treatment recommendations for the patient 
population? 


22. An attribute of growth behavior of eight bacteriophage 
mutants (1 to 8) is investigated in experiments that establish 
coinfection by pairs of mutants. The experiments determine 
whether the mutants complement one another (+) 
or fail to complement (−). These eight mutants are known 
to result from point mutation. The results of the complementation 
tests are shown below. 

[Table: pairwise complementation results for mutants 1 to 8; 
the +/− entries are not recoverable from the source.] 


a. How many genes are represented by these mutations? 

b. Identify the mutants of each gene. 

c. In each coinfection above that is identified as a fail- 
ure to complement (—), researchers see evidence of 
recombination producing wild-type growth. How do 
the researchers distinguish between wild-type growth 
resulting from complementation and wild-type growth 
that is due to recombination? 

d. A new mutation, designated 9, fails to complement 
mutants 1, 3, 5, 7, and 8. Wild-type recombinants form 
between mutant 9 and mutations 3, 5, and 8; however, 
no wild-type recombinants form between mutant 9 and 
mutations 1 and 7. What kind of mutation is mutant 9? 
Explain your reasoning. 

e. New mutation 10 fails to complement mutants 1, 4, 5, 
6, 8, and 9. Mutant 10 forms wild-type recombinants 
with mutants 1, 5, and 6, but not with mutants 4 and 8. 
Mutant 9 and mutant 10 form wild-type recombinants. 
What kind of mutation is mutant 10? Explain your 
reasoning. 

f. Gene mapping information identifies mutations 2 
and 3 as the flanking markers in this group of genes. 
Assuming these mutations are on opposite ends of the 
gene map, determine the order of mutations in the 
region of the chromosome. 


23. Synthesis of the amino acid histidine is a multistep anabolic 
pathway that uses the products of 13 genes (hisA 
to hisM) in E. coli. Two independently isolated his⁻ E. 
coli mutants, designated his1⁻ and his2⁻, are studied in a 
conjugation experiment. A his⁺ F′ donor strain that carries 
a copy of the hisJ gene on the plasmid is mated with a his1⁻ 
recipient strain in experiment 1 and with a his2⁻ recipient 
in experiment 2. The exconjugants are grown on plates 
lacking histidine. Growth is observed among the 
exconjugants of experiment 2 but not among those of 
experiment 1. 

a. Why is growth observed in experiment 2 but not in ex- 
periment 1? 

b. What is the genotype of exconjugants in experiment 2? 


24. The phage P1 is used as a generalized transducing phage in 
an experiment combining a donor strain of E. coli of genotype 
leu⁺ phe⁺ ala⁺ and a recipient strain that is leu⁻ phe⁻ 
ala⁻. In separate experiments, transductants are selected 
for leu⁺ (experiment A), for phe⁺ (experiment B), and for 
ala⁺ (experiment C). Following selection, transductant 
genotypes for the unselected markers are identified. 
a. What compound or compounds are added to the mini- 
mal medium to select for transductants in experiments 
A, B, and C? 
Selection experiment results below show the frequency of 
each genotype. 


Experiment A Experiment B 

leu ala 65% leu phe” 71% 

leu*ala™ 48% leu" phe 21% 
leu” phet 0% 


leu~ al 0% 
leu'ala’ 4%  leu*phet 3% 


Experiment C 
phe ala 
“phe* ala~ 50% 
phe~ al 19% 
phe” al 3% 


b. Determine the order of genes on the donor 
chromosome. 

c. Diagram the crossover events that form each of the 
transductants in experiment A. 

d. In experiment B, why are there no transductants with 
the genotype leu” ala‘? 


25. A series of seven point mutations are mapped along the 
rIIA gene and then tested for their ability to form wild-type 
recombinants with rII partial-deletion mutants. In the 
table, “+” indicates the formation of wild-type recombinants, 
and “−” indicates that wild types do not form. Use 
the data to show the length and endpoints of each deletion 
as accurately as you can. 


rIIA point mutant map (left to right): 37  46  21  19  34  27  12 

[Table: deletion mutants (rows) tested against point mutants 12, 19, 
21, 27, 34, 37, and 46 (columns); the entries are not recoverable 
from the source.] 

26. Five rII partial-deletion mutants are mapped and then 
tested for their ability to form wild-type recombinants 
with six point mutants. The extent and endpoints of 
deletion mutants are shown below the rII region of the 
chromosome. 


a. Use the data in Table A to place each point mutation as 
precisely as you can along the chromosome. 


Table A 
Deletion Mutants 

Point Mutants C19 L36 M12 R22 W42 
55 H H 

67 
74 
82 
85 
91 


b. Use the complementation data in Table B to determine 
where the division between rIIA and rIIB is located on 
the rII region. 


Table B 
Complemented by 

Deletion Mutant rlIA rllB 
E19 + = 
L36 = == 
M12 T = 
R22 = 

W42 = 
rll region 


Deletion mutations 


M12 i 

C19 Le 

W42 =i 
L36 mooo 


R22 Ooo 


c. Based on the data and on your analysis, draw a complementation 
table for the five point mutants 55, 67, 74, 82, 
and 85. (Skip mutant 91 for this problem.) 

d. Add mutant 91 to your complementation table (assume 
it maps to rIIA). 


27. A 2013 CDC report identified the practice of 
routinely adding antibiotic compounds to animal feed 
as a major culprit in the rapid increase in the number of 
antibiotic-resistant strains. Agricultural practice in 
recent decades has encouraged the addition of antibiotics 
to animal feed to promote growth rather than to treat 
disease. 


a. Speculate about the process by which feeding 
antibiotics to animals such as cattle might lead to 
an increase in the number of antibiotic-resistant strains 
of bacteria. 

b. How might the increase in antibiotic-resistant strains of 
bacteria in cattle be a threat to human health? 


Hfr strains that differ in integrated F factor orientation and 
site of integration are used to construct consolidated bacte- 
rial chromosome maps. The data below show the order of 
gene transfer for five strains. 


Hfr Strain   Order of Gene Transfer (first → last)

Hfr A        oriT-thr-leu-azi-ton-pro-lac-ade
Hfr B        oriT-mtl-xyl-mal-str-his
Hfr C        oriT-ile-met-thi-thr-leu-azi-ton
Hfr D        oriT-his-trp-gal-ade-lac-pro-ton
Hfr E        oriT-thi-met-ile-mtl-xyl-mal-str


a. Identify the overlaps between Hfr strains. Identify the 
orientations of F factors relative to one another. 

b. Draw a consolidated map of the bacterial chromosome. 
(Hint: Begin by placing the insertion site for Hfr A at 
the 2 o'clock position and arranging the genes thr- 
leu-azi- . . . in clockwise order.) 


DNA Structure 
and Replication 


The laboratory method known as polymerase chain reaction (PCR) is made 
possible by Taq polymerase that was first isolated from Thermus aquaticus 
bacteria living in near-boiling conditions in Yellowstone National Park. 
The inset photo (upper left) shows growing T. aquaticus. 


The central dogma of biology identifies DNA as the
repository of genomic information for organisms and
describes its central role in the production of RNA transcripts
of genes and of polypeptides produced by translation of
mRNA (see Figure 1.8, p. 10). DNA’s ongoing role in these
processes requires its faithful replication in each cell cycle,
and that is the subject of this chapter.

In Chapter 1, we reviewed the primary and secondary 
structures of DNA and RNA and the fundamentals of DNA 
replication. In this chapter, we discuss the structure of DNA 
in greater detail and extend the earlier description to include 
the molecular processes occurring in DNA replication. We also 
examine two analytical methodologies, polymerase
chain reaction (PCR) and DNA sequencing, that
were developed as an outcome of the understand- 
ing of replication. The Case Study at the end of the 
chapter describes the use of PCR and DNA sequenc- 
ing to identify and analyze the mutation associated 
with Huntington disease (OMIM 143100), an autoso- 
mal dominant disorder in humans. 


7.1 DNA Is the Hereditary 
Molecule of Life 


When scientists speak of the “hereditary molecule” of a 
species, they mean the molecular substance that carries 
and conveys the species’ genetic information. Our con- 
temporary understanding of hereditary transmission and 
the evolution of species is rooted in the knowledge that 
DNA is the hereditary molecule of all organisms. Long 
before the hereditary role of DNA was established, how- 
ever, research had identified five essential characteristics 
of hereditary material. The hereditary material must be 


1. Localized to the nucleus and a component of
chromosomes

2. Present in a stable form in cells

3. Sufficiently complex to contain the genetic information
required to direct the structure, function,
development, and reproduction of organisms

4. Able to accurately replicate itself so that daughter
cells contain the same information as parental cells

5. Mutable, undergoing mutation at a low rate that introduces
genetic variation and serves as a foundation
for evolutionary change


Chromosomes Contain DNA 


The weakly acidic substance known today as DNA was 
first noticed in 1869, when Friedrich Miescher isolated it 
from the nuclei of white blood cells in a mixture of nucleic 
acids and proteins he called “nuclein.” Miescher made 
little progress in determining the composition of nuclein, 
however, and the substance was little studied over the 
next several decades. 

In the 1870s, microscopic studies identified the fusion 
of male and female nuclei during reproduction. Shortly 
thereafter, chromosomes were observed in cell nuclei. 
This was followed by the observation that the nuclei of 
different species contain different numbers of chromo- 
somes, as well as by descriptions of the equal chromosome 
contributions of males and females to reproduction. The
earliest suggestion that DNA was the hereditary material
was based on these tantalizing bits of information. It came 
from Edmund Wilson in 1895. After accurately document- 
ing that sperm and egg cells contribute the same number 
of chromosomes during reproduction, Wilson speculated, 


The precise equivalence of the chromosomes con- 
tributed by the sexes is a physical correlative of the 
fact that the two sexes play, on the whole, equal parts 
in hereditary transmission, and it seems to show that 
the chromosomal substance, the chromatin, is to be 
regarded as the physical basis of inheritance. Now, 
chromatin is known to be closely similar to, if not 
identical with a substance known as nuclein 

(C29H49N9P3O22, according to Miescher), which
analysis shows to be a tolerably definite chemical 
composed of nucleic acid (a complex organic acid rich 
in phosphorus) and albumin. And thus we reach the 
remarkable conclusion that inheritance may, perhaps, 
be effected by the physical transmission of a particu- 
lar chemical compound from parent to offspring. 


In 1900, Mendel’s hereditary principles were rediscov- 
ered, and their predictions were widely disseminated in bi- 
ology (see Section 1.1). Shortly thereafter, in 1903, Wilson’s 
student Walter Sutton and, independently, Theodor Boveri 
accurately described the parallels between homologous 
chromosome and sister-chromatid separation during mei- 
otic cell division and the inheritance of genes. 

Over the next 20 years, the nucleus and chromo- 
somes were a focus of biological investigations of hered- 
ity. By 1920, the principal constituent of nuclein was 
identified as DNA, and the basic chemistry of DNA was 
deciphered. The molecule was determined to be a poly- 
nucleotide consisting of four repeating subunits—the four 
DNA nucleotides—held together by covalent bonds. The 
four DNA nucleotides are adenine (A), thymine (T), cyto- 
sine (C), and guanine (G). 

In 1923, DNA was localized to chromosomes. This 
discovery made DNA a candidate for the hereditary mate- 
rial, but DNA is not the sole constituent of chromosomes. 
Proteins are in high concentration in chromosomes; RNA 
is present in the nucleus and around chromosomes; and 
other compounds, including lipids and carbohydrates, 
were also considered as potential candidates for the heredi- 
tary material at one time or another. In fact, some early re- 
searchers, including, eventually, Edmund Wilson, thought 
protein was potentially a better candidate for the heredi- 
tary material than DNA. They noted that protein is com- 
posed of 20 different amino acids, whereas DNA has only 
4 kinds of nucleotides. The protein proponents suggested 
that the “20-letter alphabet” of protein could contain more 
information than the “4-letter alphabet” of DNA. It was 
against this backdrop that the results of three experiments 
conducted between 1928 and 1952 combined to identify 
DNA—not RNA, protein, or another chemical constituent 
of cells—as the hereditary material of organisms. 


A Transformation Factor Responsible 
for Heredity 


Frederick Griffith, a British physician with an interest in 
epidemiology, studied pneumonia infection in mice and 
published a lengthy research report in 1928 describing his 
findings. Modern biology focuses on just the few pages of 
Griffith’s long report that provided indirect evidence that 
DNA is the molecule responsible for conveying hereditary 
characteristics in bacteria. 

Griffith studied strains of the bacterium 
Pneumococcus, which causes fatal pneumonia in mice. He 
found that strains of the bacterium that cause pneumonia 
in mice grow in colonies that have a smooth (S) appearance,
whereas those Pneumococcus strains that do not
cause disease are identifiable by their rough (R) appear- 
ance (Figure 7.1). It was later determined that rough bacte- 
rial strains have a mutant allele of the polysaccharide gene, 
which results in a weakened and easily broken capsule. 
This single gene mutation thus leaves R bacteria vulner- 
able to attack by mouse immune system antibodies. 

The S and R forms of Pneumococcus occur in four
antigenic types of the bacteria, identified as I, II, III, and
IV. Each antigenic type elicits a different immune response
from the mouse immune system as a result of the
presence of several genetic differences. A single mutation 
of the polysaccharide gene can convert an S strain to an R 
strain of the same antigenic type—for example, convert- 
ing an SII strain to an RII strain—but the antigenic type 
cannot be changed by a single mutation. In other words, 
mutation alone cannot change RII bacteria into SIII. 

Figure 7.1 Appearance of smooth versus rough colonies of
Pneumococcus.


Griffith’s most important observations are derived
from four injection tests he performed using S and R
bacterial strains of different antigenic types (Figure 7.2).
Following each injection test, he was able to draw blood
from injected mice and culture the blood to identify
the type of bacterium growing, if any, in the mouse.
Griffith’s first three injection results show that (1) injection
of S-strain bacteria produces illness and
death, (2) injection of “heat-killed” S-strain bacteria (the
bacteria are killed using high heat and pressure) does
not induce illness, and (3) injection of an R strain does
not produce illness. Griffith’s most significant result
(4) came when he injected a mixture of heat-killed SIII
strain and living RII strain. He found that most of the
mice became ill and died from pneumonia. His tests of
blood cultures from the dead mice revealed living SIII
bacteria. Knowing that this outcome could not have been
the result of a simple mutational event, Griffith proposed
that a molecular component he called the transformation
factor was responsible for transforming RII into SIII.

Figure 7.2 Frederick Griffith’s experiment identifying a “transformation
factor” responsible for heredity. (1) Injection of living SIII bacteria
kills mice. (2) Heat-killed SIII bacteria do not kill mice, nor do
(3) living RII bacteria. (4) Coinjection of a mixture of heat-killed SIII
and living RII bacteria results in mouse death by SIII infection, and
live type SIII bacteria are recovered. Conclusion: Hereditary molecule
transformed RII bacteria into SIII bacteria.

In Griffith’s proposal, the transforming factor was 
a molecule that carried hereditary information. He was 
unable to identify this molecule, but of course today we 
know it to be DNA. Today biologists also know that the 
process identified by Griffith is a naturally occurring pro- 
cess called transformation, which is used by bacteria to 
transfer DNA (see Section 6.4). 


DNA Is the Transformation Factor 


Shortly after Griffith published his report on the transforma- 
tion factor, Martin Dawson, working with Oswald Avery, 
developed an in vitro transformation procedure to mix liv- 
ing R cells with a purified extract of cellular material derived 
from heat-killed SIII cells containing the transformation fac- 
tor. Biochemical assays indicated that the SIII extract con- 
sisted mostly of DNA, along with a small amount of RNA 
and trace amounts of proteins, lipids, and polysaccharides. 
The most direct evidence that DNA was the transformation
factor came from an experiment performed
by Avery and his colleagues Colin MacLeod and
Maclyn McCarty in 1944 (Figure 7.3). This experiment
identified the role of DNA in transformation by
eliminating lipids, polysaccharides, protein, RNA, and
DNA one at a time from the SIII extract. In each
experimental trial, the SIII extract was treated to remove
one component, and the treated extract was
mixed with RII cells. The in vitro transformation reaction
was allowed to take place, and the occurrence or
prevention of transformation was assessed.

Figure 7.3 shows that in vitro transformation takes
place in the control experiment (1), and when lipids and
polysaccharides (2), proteins (3), or RNA (4) are removed
from the extract. In contrast to the other results, experiment
(5), which uses DNase to specifically degrade
DNA, does not result in transformation, a clear indication
that transformation is blocked by the destruction of
DNA. Based on these observations, Avery, MacLeod, and
McCarty correctly concluded that DNA is the transformation
factor and the probable hereditary material.

Figure 7.3 Avery, MacLeod, and McCarty’s use of in vitro
transformation to identify DNA as the most likely hereditary
molecule. A purified extract from heat-killed SIII bacteria successfully
transforms RII cells in the control experiment (1).
Conclusion: Transformation is not disrupted by the removal of
lipids, polysaccharides, proteins, or RNA; therefore, none of these
is the transformation factor. Conclusion: DNA is the hereditary
molecule required for transformation.


DNA Is the Hereditary Molecule


Avery, MacLeod, and McCarty’s work convinced many
biologists that DNA was the long-sought hereditary material,
and a great deal of research in the late 1940s and early
1950s was devoted to deducing the physical structure of
DNA. Biologists realized that once the structure of DNA
was known, the chemical nature of genes would be identified,
and biological research would move into the realm
of genetic molecular biology. As clear and convincing as
the work of Avery and his colleagues seems in retrospect,
however, there were several unanswered questions about
the role of DNA in heredity. There was also a need to
demonstrate directly that the presence of a specific DNA
molecule induces the appearance of a particular phenotype.
That evidence came in a 1952 report by Alfred Hershey and
Martha Chase, who showed that DNA, but not protein, is
responsible for bacteriophage infection of bacterial cells.
Bacteriophages, also known as phages, are viruses that
infect bacteria. Phages such as T2, for example, consist of a
protein shell with a tail segment that attaches to a host bacterial
cell and a head segment that contains DNA. T2 phages
are among the many bacteriophages that do not carry any
RNA. Like other phages, T2 must infect host bacterial cells
in order to reproduce. Infection by a phage proceeds as illustrated
in Figure 6.15 (p. 210) and culminates in the lysis
of the host cell and the release of dozens of progeny phages.

In their experiment, Hershey and Chase took advantage
of an essential difference between the chemical composition
of DNA and protein to confirm the hereditary role of DNA
(Figure 7.4). Proteins contain sulfur but
almost no phosphorus; conversely, DNA contains a large
amount of phosphorus but no sulfur. Hershey and Chase
initially grew phage cultures in different growth media.
One growth medium contained 35S, the radioactive form of
sulfur, to label protein (1); the other contained radioactive
phosphorus, 32P, to label DNA (1). The researchers used
radioactively labeled phages from each medium to infect
unlabeled host bacterial cells in parallel experiments (2).
After a short time, each mixture was agitated in a blender
to separate bacterial cells from the now empty phage shells.
Such empty phage shells are called “ghosts” (3). The relatively
large bacterial cells were easily separated from the
ghosts by centrifugation. The heavier bacteria collect in a
pellet at the bottom of the centrifuge tube, while the lighter
ghosts remain suspended in the supernatant. Testing each
fraction for radioactivity revealed that virtually all the 32P
label was associated with newly infected bacterial cells and almost
none with ghost particles (4). On the other hand, the 35S
label was found in the ghost-particle fraction, and only trace
amounts were found associated with the bacterial pellet (4).
This result demonstrates that phage DNA, but not phage
protein, is transferred to host bacterial cells and directs the
synthesis of phage DNA and proteins, the assembly of progeny
phage particles, and ultimately the lysis of infected cells.
The experiment demonstrated that the transformation factor
identified previously by Griffith was DNA; it also showed that
Avery, MacLeod, and McCarty were correct in concluding
that DNA is the hereditary material.

Figure 7.4 Hershey-Chase experiment showing DNA to be the molecule in bacteriophages that
causes lysis of infected bacterial cells. (1) Label phage DNA by growing phage in
32P-containing medium. (1) Label phage protein by growing phage in
35S-containing medium. Conclusion: DNA is the hereditary molecule
passed by the infecting phage into the host
cell and inherited by the progeny phages.


7.2 The DNA Double Helix Consists 
of Two Complementary and 
Antiparallel Strands 


Watson and Crick’s model of the secondary structure of 
DNA indicates that in some respects, the molecule is a 
simple one (see Section 1.2). It is composed of four kinds 
of nucleotides that are joined by covalent phosphodies- 
ter bonds into polynucleotide chains. Two polynucleotide 
chains come together along their lengths to form a double 
helix, also called a DNA duplex. Complementary pairing and 
hydrogen bonding between the nucleotide base pairs join 
the two strands in the double helix. Yet for all its simplicity— 
being composed of just four types of nucleotides—DNA is a 
complex informational molecule that serves as a permanent 
repository of genetic information in cells, and it directs the 
production of RNA molecules that carry out actions in cells 
or carry information for protein assembly. These essential 
functions of DNA derive from its molecular structure. 


DNA Nucleotides 


A DNA nucleotide has three components: (1) a deoxyri- 
bose sugar, (2) one of four nitrogenous bases, and (3) up 
to three phosphate groups (Figure 7.5). Deoxyribose con- 
tains 5 carbons that are identified as 1’, 2’, 3’, 4’, and 5’. 
An oxygen atom connects the 1’ carbon to the 4’ to form 
a five-sided (pentose) ring, and the 5’ carbon projects 
outward from the 4’ carbon (and from the ring). A nitrog- 
enous (nucleotide) base is attached to the 1’ carbon by a 
covalent bond; a hydroxyl group (OH) is attached to the 
3’ carbon; and a single phosphate molecule, or a chain 
of phosphates up to three molecules long, is attached at 
the 5’ carbon. Deoxyribose has hydrogen atoms bound 
at the 2’ carbon instead of a hydroxyl (OH) group. This is 
the basis for naming the sugar deoxyribose. 

Figure 7.5 Components and structures of DNA nucleotide
monophosphates.

The four nitrogenous bases in DNA are of two structural
types: a single-ringed form called a pyrimidine, and
a double-ringed form called a purine. Cytosine (C) and
thymine (T) are pyrimidines, and adenine (A) and guanine
(G) are purines. DNA nucleotides that are part of a
polynucleotide chain have one phosphate group at their 
5' carbon that forms the covalent phosphodiester bond 
with the adjacent nucleotide in the strand. Deoxyadenosine 
5'-monophosphate (dAMP) and deoxyguanosine 5'- 
monophosphate (dGMP) carry the purine bases adenine 
and guanine, and deoxycytidine 5’-monophosphate (dCMP) 
and deoxythymidine 5'-monophosphate (dTMP) carry the 
pyrimidine bases cytosine and thymine. Collectively, these 
are identified as the deoxynucleotide monophosphates 
(dNMPs), where N can refer to any of the four nucleotide 
bases. In contrast, free (reactive) DNA nucleotides that are 
not part of a polynucleotide chain carry a string of three 
phosphate groups at the 5’ carbon and are identified as 
dATP, dGTP, dCTP, and dTTP. Collectively, these are the 
deoxynucleotide triphosphates (dNTPs). 

Individual nucleotides are assembled into a poly- 
nucleotide chain by the enzyme DNA polymerase, which 
catalyzes the formation of a phosphodiester bond be- 
tween the 3’ hydroxyl group of one nucleotide and the 5’ 
phosphate group of an adjacent nucleotide (Figure 7.6). 

Figure 7.6 DNA strand elongation. (a) Nucleotides complementary to the template strand are added
to the 3’ end of the new strand by DNA polymerase. (b) DNA nucleotide triphosphates are recruited by
DNA polymerase, which uses catalytic action to remove two phosphates (the pyrophosphate group) and
form a new phosphodiester bond. In a reaction catalyzed by DNA polymerase, and using
thymine on the template strand (right) as a guide, the
activated 3’ OH of the deoxycytidine in the growing
strand (left) attacks the triphosphate group of the
incoming dATP.

GENETIC ANALYSIS


PROBLEM A portion of one strand of a DNA duplex has the sequence 5’-ACGACGCTA-3’.
a. Identify the sequence and polarity of the other DNA strand.
b. Identify the second nucleotide added if the sequence given
is used as a template for DNA replication.

BREAK IT DOWN: DNA nucleotides in one strand of a duplex are
complementary to those in the other,
and the strands are antiparallel (p. 234).

BREAK IT DOWN: New DNA synthesis progresses 5’-to-3’ to elongate
the newly synthesized strand (p. 234).


Solution Strategies and Solution Steps


Evaluate

1. Strategy: Identify the topic this problem addresses, and
   the nature of the required answer.
   Step: The question concerns a DNA sequence and requests an answer
   giving the sequence and polarity of the complementary strand.

2. Strategy: Identify the critical information given in the
   problem.
   Step: The sequence and polarity are given for a portion of one DNA strand.

Deduce

3. Strategy: Review the general structure of a DNA duplex
   and the complementarity of specific nucleotides.
   Step: DNA is a double helix composed of single strands that contain
   complementary base pairs (A pairs with T, and G with C). The
   complementary strands are antiparallel (i.e., one strand is 5’ to 3’, and its
   complement is 3’ to 5’).

Solve

4. Strategy: Identify the sequence of the complementary
   strand.
   Step: The complementary sequence is TGCTGCGAT.

5. Strategy: Give the polarity of the complementary strand.
   Step: The polarity of the complementary strand is 3’-TGCTGCGAT-5’.

6. Strategy: Identify the second nucleotide added during
   DNA replication of the given sequence.
   TIP: DNA polymerase catalyzes the addition of a
   new nucleotide to the 3’ end of a growing strand.
   Step: The second nucleotide added to the newly synthesized strand is
   adenine, which is complementary to thymine on the template strand.


For more practice, see Problems 5, 8, 9, 16, and 17.


Two of the three phosphates of a dNTP are removed (as 
a pyrophosphate group) during phosphodiester bond for- 
mation, leaving the nucleotides of a polynucleotide chain 
in their monophosphate form. Each polynucleotide chain 
has a sugar-phosphate backbone consisting of alternat- 
ing sugar and phosphate groups throughout its length. 


Complementary DNA Nucleotide Pairing 


DNA is most stable as a double helix, and the two poly- 
nucleotide strands that make up the duplex have a specific 
relationship that follows two rules: (1) the arrangement 
of the nucleotides is such that the nucleotide bases of one 
strand are complementary to the corresponding nucleo- 
tide bases on the second strand (A pairs with T and G pairs 
with C), and (2) the two strands are antiparallel in orientation
(if one strand is, for example, 5’-ATCG-3’, then the
complementary strand is 3’-TAGC-5’).
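
The two rules above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original text: the function names `complement_3_to_5` and `reverse_complement` are hypothetical, and the input is assumed to be an unambiguous DNA sequence written 5’ to 3’.

```python
# Pairing rule from the text: A pairs with T, and G pairs with C.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement_3_to_5(strand_5_to_3: str) -> str:
    """Return the partner strand written 3'->5', aligned base by base
    under the input strand (hypothetical helper name)."""
    return "".join(PAIRS[base] for base in strand_5_to_3.upper())


def reverse_complement(strand_5_to_3: str) -> str:
    """Return the same partner strand rewritten in the 5'->3' direction,
    which reflects the antiparallel rule (hypothetical helper name)."""
    return complement_3_to_5(strand_5_to_3)[::-1]


if __name__ == "__main__":
    top = "ATCG"
    print(f"5'-{top}-3'")
    print(f"3'-{complement_3_to_5(top)}-5'")   # 3'-TAGC-5'
    print(f"5'-{reverse_complement(top)}-3'")  # 5'-CGAT-3'
```

Applied to the Genetic Analysis sequence 5’-ACGACGCTA-3’, `complement_3_to_5` gives TGCTGCGAT (read 3’ to 5’). Because a new strand grows 5’ to 3’, the order in which nucleotides are added is the reverse complement, TAGCGTCGT, whose second base is A.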

Complementary base pairing joins a purine nucleo- 
tide on one strand to a pyrimidine nucleotide on the 
other. The chemical basis of such pairing is the formation 
of a stable number of hydrogen (H) bonds between the 
bases of the different strands. Hydrogen bonds are non- 
covalent bonds that form between the partial charges that 
are associated with the hydrogen, oxygen, and nitrogen 
atoms of nucleotide bases. As Figure 7.6 shows, two stable
hydrogen bonds form for each A-T base pair, and three 
hydrogen bonds are formed by each G-C base pair (see 
also Figure 1.6, p. 8). 
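
As a small worked example of these counts, the sketch below totals the hydrogen bonds holding a duplex together, given one of its strands. It assumes only the values stated above (two bonds per A-T pair, three per G-C pair); the name `count_hydrogen_bonds` is hypothetical.

```python
# Hydrogen bonds contributed by the base pair that each base forms:
# A or T belongs to an A-T pair (2 bonds), G or C to a G-C pair (3 bonds).
H_BONDS_PER_PAIR = {"A": 2, "T": 2, "G": 3, "C": 3}


def count_hydrogen_bonds(strand: str) -> int:
    """Total hydrogen bonds in the duplex formed by `strand` and its
    complement (hypothetical helper name)."""
    return sum(H_BONDS_PER_PAIR[base] for base in strand.upper())


if __name__ == "__main__":
    print(count_hydrogen_bonds("ATCG"))       # 2 + 2 + 3 + 3 = 10
    print(count_hydrogen_bonds("ACGACGCTA"))  # 23
```

A strand and its complement always give the same total, because each base pair is counted once from either side.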

Antiparallel strand orientation is essential to the 
formation of stable hydrogen bonds. In Figure 7.6, no- 
tice that the nucleotides in one strand are oriented with 
their 5’ carbon toward the top and their 3’ carbon to- 
ward the bottom. The complementary nucleotides in 
the other strand are antiparallel; that is, their 5’-to-3' 
orientations run in the opposite direction. Antiparallel 
orientation of complementary strands brings the partial 
charges of complementary nucleotides into alignment to 
form hydrogen bonds. If complementary strands were to 
align in parallel (i.e., with their 5’ and 3’ carbons facing 
in the same direction), the charges of complementary 
nucleotides would repel, and no hydrogen bonds would 
form. Genetic Analysis 7.1 explores relationships between 
complementary DNA strands. 


The Twisting Double Helix 


The DNA double helix has an axis of helical symmetry, 
an imaginary line that passes lengthwise through the core 
of the double helix and marks the center of the molecule. 




The molecular dimensions of DNA are measured using
the unit called an angstrom (Å) or in nanometers (nm).
One angstrom is equal to 10⁻¹⁰ meters, or 1 ten-billionth
of a meter, and 1 nm equals one-billionth of a meter, or
10⁻⁹ meters. In DNA, the distance from the axis of symmetry
to the outer edge of the sugar-phosphate backbone
is 10 Å (1 nm), and the molecular diameter is 20 Å (2 nm)
at any point along the length of the helix (Figure 7.7a).
The 20-Å molecular diameter results from complementary
pairing of each purine with the complementary pyrimidine
(A with T, G with C) and gives each base pair the
same dimension.

Nucleotide base pairs are spaced at intervals of 3.4 Å
along DNA duplexes. This tight packing of DNA bases
in the duplex leads to base stacking, the offsetting of
adjacent base pairs so that their planes are parallel, and
imparts a twist to the double helix. Figure 7.7a shows
that one complete helical turn spans 34 Å. This span is
occupied by approximately 10.5 base pairs. Figure 7.7b
is a space-filling model that illustrates base-pair stack- 
ing and the twisting of the sugar-phosphate backbones. 
Figure 7.7c is a ball-and-stick model illustrating how base 
pairs twist around the axis of symmetry to create the heli- 
cal spiral. 
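
The dimensions quoted above lend themselves to quick arithmetic. The following sketch assumes idealized B-form geometry and uses only figures from the text (3.4 Å rise per base pair, about 10.5 base pairs per turn, 10 Å per nanometer); the constant and function names are hypothetical.

```python
RISE_PER_BP_ANGSTROM = 3.4   # spacing between adjacent base pairs
BP_PER_TURN = 10.5           # approximate base pairs per helical turn
ANGSTROM_PER_NM = 10.0       # 1 nm = 10 angstroms


def duplex_length_angstrom(n_bp: int) -> float:
    """Approximate end-to-end length of a straight duplex of n_bp base pairs."""
    return n_bp * RISE_PER_BP_ANGSTROM


def duplex_length_nm(n_bp: int) -> float:
    """The same length expressed in nanometers."""
    return duplex_length_angstrom(n_bp) / ANGSTROM_PER_NM


def helical_turns(n_bp: int) -> float:
    """Approximate number of complete helical turns in n_bp base pairs."""
    return n_bp / BP_PER_TURN


if __name__ == "__main__":
    n = 1000
    print(f"{n} bp is about {duplex_length_angstrom(n):.0f} angstroms "
          f"({duplex_length_nm(n):.0f} nm) long, "
          f"with about {helical_turns(n):.1f} helical turns")
```

For a 1000-base-pair duplex this gives roughly 3400 Å (340 nm) and about 95 turns, a reminder of how long and thin the molecule is relative to its 20-Å diameter.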

Figure 7.7 (a) Ribbon diagram. (b) Space-filling diagram.

Base-pair stacking creates two grooves in the double
helix, gaps between the spiraling sugar-phosphate
backbones that partially expose the nucleotides. The
alternating grooves, known as the major groove 
and minor groove, are highlighted in Figures 7.7b and 
7.7c. The major groove is approximately 12 Å wide, and
the minor groove is approximately 6 Å wide. The major
and minor grooves are regions where DNA-binding 
proteins can most easily make direct contact with nu- 
cleotides along one or both strands of the double helix. 
In this chapter and in later chapters, we discuss many 
of the important functions DNA-binding proteins per- 
form, such as regulating the initiation of transcription 
and controlling the onset and progression of DNA rep- 
lication. Most of these functions depend on the pres- 
ence of characteristic sequences of DNA nucleotides. 
DNA-binding proteins gain access to DNA nucleotides 
in major and minor grooves of the molecule. 

The models of the DNA double helix presented 
in Figure 7.7 illustrate the most common and most 
stable form of DNA, known as B-form DNA, which has a 
right-handed twisting of the sugar-phosphate backbone. 
B-form DNA is overwhelmingly the most common DNA 
structure in organisms. Two other rarer and less stable 
forms of the DNA double helix have also been identi- 
fied. A-form DNA is more compact than B-form DNA, 
with about 11 base pairs per complete helical twist and 
a higher degree of tilt of the base pairs relative to the 
backbone. A-form DNA is occasionally detected in cells.
The third form of DNA, Z-form DNA, is quite different 
from A-form and B-form DNA. Z-form DNA has a left- 
handed twist that gives the sugar-phosphate backbone 
a zigzag appearance—hence the name Z-form. Z-form 
DNA occurs in the presence of a high concentration of 
positively charged ions. Only a tiny portion of total cel- 
lular DNA is ever in the Z form, and its physiological 
significance in cells is not known. 


7.3 DNA Replication Is 
Semiconservative and Bidirectional 


Given the role of DNA as an information repository 
and an information transmitter, the integrity of the 
nucleotide sequence of DNA is of paramount impor- 
tance. Each time DNA is copied, the new version must 
be a precise duplicate of the original version. The high 
fidelity of DNA replication is essential to reproduction 
and to the normal development of biological structures 
and functions. Without faithful DNA replication, the 
information of life would become hopelessly garbled 
by rapidly accumulating mutations that would threaten 
survival. 

Considering the importance of DNA throughout the 
biological world, it was no surprise to discover that the 
general mechanism of DNA replication is the same in all 
organisms. This universal process evolved in the earliest 
life-forms and has been retained for billions of years. As 
organisms diverged and became more complex, how- 
ever, an array of differences did develop among DNA 
replication proteins and enzymes. Despite the diversifi- 
cation of these specific components of DNA replication, 
three attributes of DNA replication are shared by all 
organisms: 


1. Each strand of the parental DNA molecule remains 
intact during replication. 


2. Each parental strand serves as a template directing 
the synthesis of a complementary, antiparallel daugh- 
ter strand. 


3. Completion of DNA replication results in the for- 
mation of two identical daughter duplexes, each 
composed of one parental strand and one daughter 
strand. 


As we describe DNA replication in bacteria, archaea, 
and eukaryotes in following sections, we will point out 
similarities and differences among the domains. The 
shared features of DNA replication are present because 
all life evolved from a common origin. At the same time, 
the differences in DNA replication between the domains 
are also the result of evolution, which favored specific 
adaptations. 


Three Competing Models of Replication 


In their famous 1953 paper describing the structure of 
DNA, Watson and Crick concluded with the observation 


It has not escaped our notice that the specific 
base-pairing we have proposed immediately 
suggests a possible copying mechanism for the 
genetic material. 


Specifically, Watson and Crick recognized that a con- 
sequence of complementary base pairing was that nu- 
cleotides on one strand of the duplex could be used to 
identify the nucleotides of the other strand. Watson 
and Crick presumed that DNA replication used the 
nucleotide sequence of each strand to form a new 
pair of DNA duplexes, hypothesizing that each DNA 
strand of the original duplex would act as a template 
for the synthesis of a new daughter strand. Watson and
Crick did not know the precise mechanism by which
template-based replication took place, however, which left
the exact mechanism of replication as the crucial open
question.

Almost immediately after the DNA structure was 
identified, three competing models of DNA replication 
emerged (Figure 7.8). The models shared the idea that 
the two original strands (the parental strands) of the 
duplex act as templates to direct the assembly of newly 
synthesized DNA by complementary base pairing. The 
models also predicted that the completion of DNA rep- 
lication produced two identical DNA duplexes (daughter 
duplexes). The models differed, however, in describing 
the makeup of the daughter duplexes. The (1) semiconservative
DNA replication model, which proved to be correct, proposed
that each daughter duplex contains one original parental
strand of DNA and one complementary, newly synthesized
daughter strand. The (2) conservative DNA replication model
predicted that one daughter duplex contains the two strands
of the parental molecule and the other contains two newly
synthesized daughter strands. Lastly, the (3) dispersive DNA
replication model predicted that each daughter duplex is a
composite of interspersed parental duplex segments and
daughter duplex segments.


The Meselson-Stahl Experiment 


In 1958, Matthew Meselson and Franklin Stahl took advan- 
tage of the newly developed method of high-speed cesium 
chloride (CsCl) density gradient ultracentrifugation to de- 
cipher the mechanism of DNA replication in an experiment 
of beautiful simplicity. In this analytical method, a tube 
filled with a CsCl mixture is subjected to high ultracentri- 
fuge speeds that exert thousands of gravities of separating 
force, creating a graded variation in density—a density 
gradient—throughout the CsCl mixture. When substances 
are placed in the CsCl gradient and ultracentrifugation 
takes place, the substances migrate until they reach the
point in the density gradient where their density is
matched by that of the gradient. Migration stops at that
point. This technique is capable of separating molecules
that have only slightly different densities.

Meselson and Stahl began their experiment by growing
Escherichia coli for many generations in a growth medium
containing the rare heavy isotope of nitrogen, 15N.
Under these growth conditions, parental DNA is fully
saturated with heavy-isotope-containing nitrogen. All the
DNA duplexes contain only the heavy nitrogen isotope,
and they are designated 15N/15N to signify the incorporation
of 15N in both strands of the duplex. (By the same
token, a DNA duplex composed of two strands containing
only 14N, the normal isotope of nitrogen, is designated
14N/14N, and a duplex with one strand containing each
isotope is designated 14N/15N.) DNA collected for CsCl
gradient analysis from this starting generation, designated
generation 0, was exclusively 15N/15N. Next, some of
these 15N-labeled E. coli were transferred to a new growth
medium containing only the normal light isotope of nitrogen,
14N. At the end of each successive DNA replication
cycle, DNA was collected from a few cells on the 14N
medium for CsCl analysis. Growth in this medium leads
to the incorporation of DNA nucleotides containing the
light isotope into newly synthesized strands.

Figure 7.9 shows the results of CsCl gradient analysis 
of DNA collected from three replication cycles, beginning 
with generation 0. The experimental results are consistent 
with the semiconservative model only. The conservative
model predicted DNA molecules with two distinct
densities after generation 1 (15N/15N and 14N/14N). The
results reject this model. Similarly, the dispersive model
predicted a single DNA density in all generations. The 


generation 2 results reject this replication model. The
data are consistent with the predictions of the semiconservative
model of DNA replication through generation 3,
as shown, and beyond. Within a few years of Meselson and
Stahl’s identification of semiconservative replication in 
bacteria, the mechanism was identified experimentally in 
eukaryotes as well, solidifying the idea that all life shares 
the same general process of DNA replication, as a conse- 
quence of life’s single origin and the evolutionary connec- 
tions among living things. 
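
The band pattern expected under the semiconservative model can be tallied generation by generation. The following is a minimal Python sketch, not part of the original experiment: it represents each duplex as a pair of strand labels ("15N" or "14N") and assumes every newly synthesized strand is light. The function name and the data representation are illustrative assumptions.

```python
from collections import Counter
from fractions import Fraction

def semiconservative_generations(n_generations):
    """Return, for each generation, the fraction of duplexes in each density class.

    Each duplex is a tuple of two strand labels. Generation 0 is fully heavy
    (15N/15N); all strands synthesized afterward are light (14N).
    """
    duplexes = [("15N", "15N")]
    results = []
    for _ in range(n_generations + 1):
        counts = Counter("/".join(sorted(d)) for d in duplexes)
        total = sum(counts.values())
        results.append({label: Fraction(n, total) for label, n in counts.items()})
        # Replicate: each parental strand pairs with one new 14N strand.
        duplexes = [(strand, "14N") for d in duplexes for strand in d]
    return results

for generation, classes in enumerate(semiconservative_generations(3)):
    print(generation, {label: str(frac) for label, frac in classes.items()})
```

Under these assumptions, generation 1 is entirely hybrid (14N/15N), generation 2 is half hybrid and half light, and generation 3 is one-quarter hybrid and three-quarters light, which is the semiconservative pattern described in the text.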


Origin and Directionality of Replication 
in Bacterial DNA 


Solving the riddle of the basic mechanism of DNA repli- 
cation introduced new questions about how replication 
is initiated and how it progresses. Does replication com- 
mence at specific points on each chromosome? If so, how 
many such points does a chromosome have? Does DNA 
replication progress in one direction or in both directions 
from a replication origin? Experimental evidence clearly 
demonstrates that DNA replication is most often bidirec- 
tional, progressing in both directions from a single origin 
of replication in bacterial chromosomes and from mul- 
tiple origins of replication in eukaryotic chromosomes. 

In 1963, John Cairns reported the first evidence of 
a single origin of DNA replication in E. coli. Based on 
Cairns’s evidence, it appeared that once replication gets
underway in bacteria, there is expansion around the origin 
of replication, forming a replication bubble, as seen in 
Figure 7.10. The image shown in the figure is similar to the 
type of result Cairns obtained, but by itself, it did not al- 
low a determination as to whether replication takes place 
in one direction away from the origin (unidirectional) or 

Figure 7.9 The Meselson-Stahl experimental results. Photographs of DNA bands in centrifuge
tubes and densitometry scans (lower) identify the duplex DNA composition at each stage and are
consistent only with semiconservative DNA replication. The semiconservative replication process is
interpreted for each replication cycle.


in both directions (bidirectional). The resolution of this 
uncertainty held important implications. If DNA replicated
bidirectionally, the time required to replicate a bacterial
chromosome would be roughly half that
required if replication were unidirectional.

The replication bubble is where active DNA replication
takes place. If replication were unidirectional, the origin of
replication would eventually also serve as the terminus of
replication, once the process was completed around the
circumference of the circular bacterial chromosome. If, on the
other hand, replication were bidirectional, each end of the
replication bubble would contain a replication fork where
new DNA nucleotides are added to elongating daughter
strands. Furthermore, because the growth of the replication
bubble progresses in both directions from the origin of
replication, bidirectional replication would also mean that
the terminus of replication would be halfway around the
chromosome from the origin of replication. In contrast,
unidirectional replication would mean that the origin and
the terminus were at the same location.

In 1968, Joel Huberman and Arthur Riggs used a 
technique called pulse-chase labeling to produce the first 
experimental evidence of bidirectional replication in mam- 
malian chromosomes (Figure 7.11). In pulse-chase labeling 
experiments, cells are exposed alternately to high levels of a 

Figure 7.10 DNA replication bubble and replication forks.
A replication bubble expands bidirectionally from an origin
of replication and active DNA synthesis takes place at each
replication fork.


radioactive compound that they then incorporate into the 
DNA they are synthesizing. This is the “pulse.” Following 
each pulse, the radioactive compound is temporarily removed 
to allow replication to proceed without radioactive labeling of 
newly synthesized DNA. This is the “chase.” The result of the 
alternation between the presence and absence of the radioac- 
tive compound can be examined by autoradiography of newly 
replicated DNA. Autoradiography shows dark tracks where 
high levels of radioactive tracer are present and light tracks 
where levels are low. The bidirectional replication model
predicts that a pulse-chase labeling experiment will produce
alternating dark and light tracks extending in both directions
from each replication origin, symmetrical around the origin. With
bidirectional replication, the alternating pattern of bands
occurs because the two replication forks of an expanding
replication bubble incorporate radioactivity in both directions
away from the replication origin during the pulse in a
symmetrical manner. The same concept
applies to the absence of radioactivity in regions replicated 
during the chase. The pattern of symmetrical, alternating 
regions around each eukaryotic origin of replication obtained 
by Huberman and Riggs is consistent only with bidirectional 
replication. 

Additional support for the bidirectionality of DNA
replication comes from biochemical studies of the DNA
polymerase responsible for most E. coli DNA replication.
This DNA polymerase is capable of incorporating about
1000 nucleotides per second into a newly synthesized
strand. With two replication forks each working at this
rate, the 4 × 10^6 nucleotides of the genome can be
replicated in approximately 2000 seconds (33 minutes).
This is close to the minimum generation time of E. coli.
The polymerase would have to work twice as fast if
replication were unidirectional to complete replication
within the generation time.
In contrast to bacteria, the rate of catalytic activity of
eukaryotic DNA polymerase is approximately 2000 to 4000
nucleotides per minute, less than a tenth the rate in E. coli.
Eukaryotes have genomes many times larger than E. coli,
and multiple chromosomes to replicate, so one can logically
conclude they replicate their genomes from multiple
origins of replication on each chromosome.
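
The timing argument above is simple arithmetic and can be checked directly. The sketch below uses the approximate figures from the text (about 4 × 10^6 nucleotides and about 1000 nucleotides per second). The function name and the assumption that every fork advances at a constant, equal rate are illustrative simplifications.

```python
def replication_time_seconds(genome_nt, rate_nt_per_s, forks):
    """Time to copy a chromosome when `forks` replication forks each
    advance at `rate_nt_per_s` and share the work equally."""
    return genome_nt / (rate_nt_per_s * forks)

GENOME_NT = 4_000_000   # approximate E. coli genome size used in the text
RATE_NT_PER_S = 1000    # approximate polymerase rate used in the text

bidirectional = replication_time_seconds(GENOME_NT, RATE_NT_PER_S, forks=2)
unidirectional = replication_time_seconds(GENOME_NT, RATE_NT_PER_S, forks=1)

print(f"bidirectional:  {bidirectional:.0f} s ({bidirectional / 60:.0f} min)")
print(f"unidirectional: {unidirectional:.0f} s ({unidirectional / 60:.0f} min)")
```

With two forks the estimate is about 2000 seconds. With a single fork it doubles to about 4000 seconds, which is why a unidirectional mechanism would require a polymerase twice as fast to finish in the same time.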

In bacteria, the matter of the directionality of rep- 
lication was at last conclusively resolved in 1973, when 
Raymond Rodriguez and his colleagues provided definitive 
evidence of bidirectional replication by showing that the 
origin of replication and the terminus of replication are 
on opposite sides of the chromosome and are separated by 
almost exactly 180 degrees of circumference around the 
circular chromosome. In the image shown in Figure 7.12a, 
the origin of replication is labeled by radioactivity on one 
side of the chromosome, while the replication terminus is 
labeled on the opposite side of the chromosome. The only 
possible interpretation is that DNA replication in bacteria 
is bidirectional. Figure 7.12b illustrates the progression of 
bidirectional replication from its origin to its completion. 


Multiple Replication Origins in Eukaryotes 


Autoradiograph analysis reveals multiple origins of 
replication on eukaryotic chromosomes, and direct observa- 
tion by electron microscopy confirms it (Figure 7.13a). Most 

Figure 7.11 Pulse-chase labeling evidence of bidirectional
DNA replication. (a) Huberman and Riggs results of pulse-chase
labeling in mammalian chromosomes. (b) Interpretation
of pulse-chase results according to the bidirectional model.

Figure 7.12 Bidirectional DNA replication. (a) Autoradiograph results from Rodriguez and coworkers
in 1973, showing that the bacterial replication origin and replication terminus are on opposite sides of the
chromosome. (b) The model of bidirectional replication of a circular bacterial chromosome.


large eukaryotic genomes contain thousands of origins of
replication, separated on each chromosome by an average of
40,000 to 50,000 base pairs (bp). Current estimates indicate
that the human genome contains more than 10,000 origins 
of replication that are spaced 30 to 300 kilobases (kb) apart. 
Eukaryotic replication origins are not all initiated at the same 
moment. Notice, for example, that in Figure 7.13, the repli- 
cation bubbles shown are of different sizes, indicating that 
replication was initiated in them at different times. Among 
different types of cells, the length of S phase is variable, 
meaning that the rate of progression of DNA replication var- 
ies among cells of different types. Rapidly dividing cells repli- 
cate their DNA more quickly (i.e., have shorter S phase) than 
do slowly dividing cells. In addition, experimental evidence 
identifies “early-replicating” (i.e., early in S phase) and “late- 
replicating” (late in S phase) segments of large eukaryotic 


genomes. Early-replicating genome segments appear to con- 
tain many expressed genes, whereas late-replicating regions 
contain many fewer expressed genes. In Drosophila, for 
example, late-replicating regions include chromosome seg- 
ments immediately surrounding centromeres, where few 
expressed genes are located. 
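
The relationship between origin spacing and origin number given earlier in this discussion can be illustrated with a rough calculation. In the sketch below, the haploid human genome size of roughly 3.2 × 10^9 bp is an assumption that is not stated in the text, the function name is hypothetical, and origins are treated as evenly spaced, which real origins are not. The spacings are the 30 to 300 kb range quoted above.

```python
def estimated_origins(genome_bp, spacing_bp):
    """Rough number of origins if they were evenly spaced `spacing_bp` apart."""
    return genome_bp // spacing_bp

HUMAN_GENOME_BP = 3_200_000_000  # assumed approximate haploid genome size

for spacing_kb in (30, 300):
    n = estimated_origins(HUMAN_GENOME_BP, spacing_kb * 1000)
    print(f"{spacing_kb} kb spacing -> about {n:,} origins")
```

Even the widest spacing gives more than 10,000 origins under this assumption, in line with the estimate quoted in the text.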

Regardless of differences in the timing of initiation of 
the multiple origins of replication on a eukaryotic chromo- 
some, each of the replication bubbles emanating from an 
origin of replication expands toward the others to eventually 
merge, resulting in the replication of all of the DNA in each 
eukaryotic nucleus by the end of S phase (Figure 7.13b). The 
end products of replication of each eukaryotic chromosome 
are a pair of identical DNA duplexes that are sister chromatids.
The sister chromatids will remain joined through G2
and will be separated at anaphase of the upcoming M phase.

Figure 7.13 Multiple origins of replication on a single chromosome from Drosophila melanogaster.
(a) The arrows point to replication bubbles, which are expanding bidirectionally. Different
replication-bubble sizes indicate different start times. (b) Structures of multiple origins of
replication in eukaryotic chromosomes.


7.4 DNA Replication Precisely 
Duplicates the Genetic Material 


A great deal of what molecular biologists know about DNA 
replication comes from the study of bacteria, particularly 
E. coli. Chapter 1 presents a general overview of some of the 
basic steps of DNA replication. This section provides ad- 
ditional details of the process. Much remains to be learned 
about the mechanisms of DNA replication in the three 
domains of life; however, the information available to date 


and the availability of genome sequences have revealed 
that eukaryotes and archaea possess strikingly similar DNA 
replication machinery that is evolutionarily distinct from 
the replication machinery in bacteria. The archaeal process
is, in many ways, a simpler version, and likely an ancestral 
version, of eukaryotic DNA replication. The evolutionary 
history of the development of DNA replication is the subject 
of active investigation, but what is clear is that during the 
evolution of life, two distinctly different sets of DNA repli- 
cation machinery developed, one in bacteria and the other 
in archaea and eukaryotes. We will highlight similarities and
differences between these processes as we move through 
this chapter section, comparing and contrasting the events 
and molecular activities that accompany DNA replication in 
species of the three domains. 

To begin, we offer a cautionary note about discussions 
of DNA replication. Although parts of our replication dis- 
cussion identify individual enzymes and proteins, do not 
be misled into thinking of these proteins as solo actors that 
enter and leave the replication fork at will. Instead, they 
are part of large, complex aggregations of proteins and en- 
zymes called replisomes that assemble at each replication 
fork. In E. coli, for example, the replisomes active in DNA 
replication contain more than 30 distinct proteins and en- 
zymes. Later in the section, we describe how one replisome 
at each replication fork carries out the nearly simultaneous 
replication of both template strands. 

We begin this section with Foundation Figure 7.14, which 
provides a step-by-step overview of bacterial DNA replica- 
tion. At each step, the activities of the principal molecular 
players are identified. You can refer back to this Foundation 
Figure as you make your way through the following pages. 


DNA Sequences at Replication Origins 


Origins of DNA replication contain sequences that attract
replication enzymes. The best-characterized
origin-of-replication sequence is from E. coli and is designated oriC.
This sequence, which contains approximately 245 bp of
DNA, is AT-rich (i.e., has a preponderance of adenine and
thymine base pairs). AT-rich DNA regions
require less energy for their denaturation, a process we will
see happening at oriC early in the initiation of replication.
OriC contains three 13-bp sequences, so-called
13-mers, followed by four 9-bp sequences, called 9-mers
(Figure 7.15a). Other bacterial species have origin-of-replication
sequences that are similar to oriC. This similarity is
a product of evolutionary conservation of DNA sequences 
and the functionality of those sequences. Natural selec- 
tion has acted to maintain sequence similarity because the 
function of the conserved sequence region is essential to 
the survival of the organism. In other words, natural selection
maintains sequences of DNA within a region that performs
an essential function. Comparisons of evolutionarily
conserved sequences within and among related species
often lead to the identification of consensus sequences.
These sequences have a generally similar pattern of base 
pairs, although they are not identical. Rather, consensus 
sequences are described by the nucleotides found most 
often at each position of DNA in the conserved region. In 
this context a consensus sequence is a conserved nucleo- 
tide sequence that acts as the binding site for proteins that 
initiate replication. Consensus sequences are plentiful in 
nucleic acids and generally function as conserved recogni- 
tion sequences for protein binding in regulatory processes. 
The 13-mer and 9-mer consensus sequences 
that are part of oriC have been maintained by natural 


selection because they have essential functional roles in 
replication initiation. Beyond the presence of the consen- 
sus sequences themselves, natural selection may also act 
to maintain specific spacing between different segments 
of a consensus sequence region. Spacing can be important 
to the function of the sequence because DNA-binding 
proteins must assemble at consensus sequence sites. 
Different proteins may be attracted to different regions of 
consensus sequences, and each protein requires physical 
space to bind to DNA and to interact with the other pro- 
teins bound to the consensus sequence region. 
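
The idea that a consensus sequence reports the nucleotide found most often at each position can be made concrete with a short calculation. The following Python sketch is illustrative only: the function name is hypothetical and the aligned sequences are invented for the example, not taken from oriC or any real origin. Nucleotides that are equally common at a position are joined with a solidus, as in A/T.

```python
from collections import Counter

def consensus(aligned_sequences):
    """Return the consensus of equal-length aligned sequences.

    At each position the most frequent nucleotide is reported; nucleotides
    tied for most frequent are joined with a solidus (e.g. "A/T").
    """
    if len({len(s) for s in aligned_sequences}) != 1:
        raise ValueError("sequences must be non-empty and of equal length")
    columns = []
    for position in zip(*aligned_sequences):
        counts = Counter(position)
        top = max(counts.values())
        winners = sorted(base for base, n in counts.items() if n == top)
        columns.append("/".join(winners))
    return columns

# Hypothetical aligned 9-bp sequences, invented for illustration only.
example = ["GATCACAGT", "GATCTCAGT", "GTTCACAGT", "GATCACCGT"]
print(" ".join(consensus(example)))
```

No single sequence in the example has to match the consensus exactly. Each position is decided independently by a majority count, which is why members of a conserved family can be similar without being identical.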

Among eukaryotic organisms, the yeast Saccharomyces
cerevisiae has the most fully characterized origin-of-replication
sequences. In yeast, the multiple origins of replication
are known as autonomously replicating sequences
(ARSs). There is overall conservation of DNA sequence
in ARSs, and their organization is similar throughout the
yeast genome. ARS1 in yeast has been fully sequenced
(Figure 7.15b). Within the 95 bp of ARS1 is an 11-bp consensus
sequence and three other regions (B1, B2, and B3) of
conserved DNA sequences that differ somewhat from one
another and from the 11-bp consensus sequence region.

Much less is known about the DNA sequences at 
replication origins in other eukaryotic species, particu- 
larly in multicellular species. What is known is that there 
are thousands of origins of replication distributed among 
the multiple chromosomes of eukaryotes. These origins 
initiate replication at various times during S phase of the 
cell cycle, leading to the identification of early- and
late-replicating segments of chromosomes. Genome sequence
data do not identify any sequence consistent with an
origin-of-replication sequence in multicellular eukaryotes; thus
it seems likely that DNA is selected for replication in
multicellular eukaryotes based on chromatin modification
rather than by the presence of a specific DNA sequence.

Archaeal species fall somewhere in between the alter- 
natives represented by the single, sequence-specific origin 
of replication in bacteria, the multiple and sequence-specific 
origins in yeast, and the numerous, non-sequence-specific 
origins in multicellular eukaryotes. Since the archaea pos- 
sess homologs of the eukaryotic replication proteins, but 
also have small, circular chromosomes like bacteria, it was 
initially unclear whether archaeal cells would utilize single 
or multiple origins of replication. The first archaeal species 
to have its origin of replication mapped was Pyrococcus 
abyssi. It has a single origin of replication. Subsequently it 
was found that the archaeal species Sulfolobus solfataricus 
uses three origins of replication. Multiple origins of replica- 
tion have been found in a variety of other archaeal species, 
although others with a single replication origin have also 
been identified. 

In addition, many archaeal species possess an ORB 
(origin recognition box) sequence at the sites of replication 
origin. These sequences bind replication-initiating proteins 
that are homologous to those in eukaryotes, indicating that 
the molecular processes that initiate replication in archaea 
are more similar to those of eukaryotes than those of bacteria. 

Figure 7.15 Origin of replication sequences in E. coli and yeast. (a) OriC in E. coli contains three
13-mer and four 9-mer consensus sequences in a region of 245 base pairs of conserved sequence.
(b) The yeast ARS1 origin of replication contains a consensus 11-bp segment and regions B1, B2,
and B3, spanning 95 base pairs of conserved sequence. A solidus (/) between nucleotides of consensus
sequences (e.g., A/T) indicates that the two nucleotides are equally common at this position.


Replication Initiation 


DNA replication in E. coli requires that replication-initiating
proteins locate and bind to the consensus sequences in
oriC. In E. coli, three proteins, DnaA, DnaB, and DnaC,
bind at oriC and initiate DNA replication (Figure 7.16 and
Table 7.1). The first to bind is DnaA, attaching to the 9-mer
components of oriC. DnaA bends the DNA and breaks
hydrogen bonds in the AT-rich 13-mer region
of oriC, creating an open complex, a short region where
strands of the duplex are separated. Then DnaB, carried
to oriC by DnaC, attaches to both strands in the open
complex. DnaB is a helicase protein that uses energy from
ATP to break the hydrogen bonds joining complementary
nucleotides. This separates the DNA strands and
unwinds the double helix. The unwound strands of DNA
would seek maximum stability by reannealing, re-forming
complementary double-stranded DNA, except for the
presence of single-stranded binding protein (SSB).
Single-stranded binding protein prevents reannealing of the
separated strands, keeping them available to serve as templates
for new DNA synthesis (see Figure 7.14).


In eukaryotes, helicase recruitment and activity are best
understood in yeast, where four protein subcomplexes are
involved. At eukaryotic replication origins, a prereplication
complex (preRC) of 14 proteins assembles. Six proteins of
the preRC, Orc1 through Orc6 (Orc1-6), form a subunit
identified as the origin recognition complex (ORC) that acts
as the initiator of eukaryotic DNA replication by identifying
the origin site. ORC is then bound by the proteins Cdc6
and Cdt1 and by a double hexamer of the replicative
helicase MCM. Each hexamer is made up of the six subunits
Mcm2 through Mcm7 (Mcm2-7). The paired Mcm2-7 hexameric
rings encircle both strands of the DNA duplex. As S phase
commences, two additional components, Cdc45 and a multisubunit
GINS protein, join with Mcm2-7. Collectively, they form
the CMG complex (Cdc45-Mcm2-7-GINS). The CMG
complex is the fully active helicase that carries out DNA
unwinding, leading to breakage of hydrogen bonds between
the DNA strands ahead of DNA polymerase activity.

In archaea, it is thought that helicase recruitment is
similar to events in yeast. An initiator protein complex
identified as Orc1/Cdc6 binds to ORB sequences at the origin

Figure 7.16 Replication initiation at oriC, requiring DnaA,
DnaB, and DnaC proteins.


of replication. This complex contains at least one protein, 
and possibly as many as three proteins, that are homologous 
to the eukaryotic ORC1 and CDC6 proteins. These events 
initiate replication, although several of the details of the 
complete mechanism are not yet known (see Table 7.1). 

In all organisms, the DNA polymerase enzymes that 
are responsible for synthesizing new DNA strands use 
the template strand to direct the addition of nucleotides 
to daughter strands in a complementary and antiparallel 
manner. These new nucleotides are added to the 3’ end 
of the growing daughter strand, and the overall direction 
of daughter strand elongation is 5’ to 3’. Curiously, how- 
ever, DNA polymerases are unable to initiate DNA strand 
synthesis on their own. To perform its catalytic activity, 
a DNA polymerase requires the presence of a primer 
sequence, a short single-stranded segment that begins 
a daughter strand and provides a 3’-OH end to which a 




new DNA nucleotide can be added by DNA polymerase. 
To satisfy the requirement for a primer, DNA replication
in bacteria is initiated by a specialized RNA polymerase,
DnaG, also called primase, that synthesizes a short RNA primer.
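
The template rule described above (a complementary, antiparallel daughter strand elongated 5' to 3' from the 3'-OH end of a primer) can be sketched as a small function. This is an illustrative model only: the function name, the string representation of strands, the example template, and the three-nucleotide primer (far shorter than a real primer, for brevity) are all assumptions.

```python
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
# RNA pairs U (not T) with A; lowercase marks RNA nucleotides in the output.
RNA_COMPLEMENT = {"A": "u", "T": "a", "G": "c", "C": "g"}

def extend_daughter(template_3_to_5, primer_length):
    """Model daughter-strand synthesis on a template written 3'->5'.

    The first `primer_length` template positions are paired with an RNA
    primer; the remainder is DNA added to the primer's 3'-OH end.
    Because the template is written 3'->5', the returned daughter strand
    reads 5'->3' from left to right.
    """
    primer = "".join(RNA_COMPLEMENT[b] for b in template_3_to_5[:primer_length])
    dna = "".join(DNA_COMPLEMENT[b] for b in template_3_to_5[primer_length:])
    return primer + dna

template = "TACGGATCCA"  # hypothetical template, written 3'->5'
print("template  3'-" + template + "-5'")
print("daughter  5'-" + extend_daughter(template, primer_length=3) + "-3'")
```

Printing the two strands one above the other shows each daughter nucleotide directly below its complementary template nucleotide, with the strand polarities running in opposite directions.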

In E. coli DNA replication, the DnaG complex joins 
DnaA, DnaB, and DnaC at oriC, where DnaG synthesizes the 
RNA primer. Measuring just one dozen to two dozen nucle- 
otides in length, RNA primers provide the 3’ OH needed for 
DNA polymerase activity. RNA primers contain the nucleo- 
tide base uracil (U), in place of thymine. Consequently, RNA 
primers cannot remain as part of fully replicated DNA. Thus, 
while they are essential for allowing DNA polymerase to be- 
gin its DNA synthesis, RNA primers are temporary and are 
removed from newly synthesized DNA strands by a process 
we describe in the following section. 

In eukaryotic DNA replication, the RNA-synthesizing
enzyme primase synthesizes the RNA primer at replication
origins. Eukaryotic primase activity is delivered by
a four-protein complex known as the polymerase α complex.
Two of these subunits are the catalytic and regulatory
subunits of primase, and the other two are catalytic
subunits of DNA polymerase α. After the RNA primer
has been synthesized, polymerase α synthesizes DNA for
a short distance. It is soon replaced by the main DNA
polymerase, polymerase δ or ε.

The archaeal equivalent, also called primase, consists
of two protein subunits. These subunits are homologs
of the eukaryotic primase subunits. There are no archaeal
homologs of DNA polymerase α, which apparently
evolved in eukaryotes. Although the archaeal primase
is distinct from bacterial DnaG, it should be noted that
archaea possess homologs of DnaG. The archaeal DnaG
homologs are, however, involved in RNA processing
events rather than functioning in DNA replication.

During DNA replication, all DNA molecules undergo
some level of superhelical twisting that imparts torsional
stress to the molecule beyond that of the spiraling double
helix. Linear DNA found in eukaryotes manages this
extra twisting relatively easily, since the ends of chromo- 
somes are free to twist to uncoil. Circular chromosomes 
are a different matter. Since they are closed by covalent 
bonds (phosphodiester bonds), superhelical twisting that 
accompanies DNA replication creates torsional stress that 
would shear the molecule if it were left uncontrolled. 
As replication progresses, unwinding of the double he- 
lix causes superhelical twisting to accumulate, producing 
supercoiled DNA that resembles an over-twisted rubber 
band (Figure 7.17a). To avoid random breakage in the
molecule that could lead to a breakdown of DNA replication,
enzymes known as topoisomerases, of which bacterial
DNA gyrase is one example, catalyze a controlled cleavage
and rejoining of DNA to allow over-wound DNA strands to unwind
(Figure 7.17b). Relief of supercoiling is accomplished by
cutting either one or both strands of DNA (various topoi- 
somerases operate differently), allowing DNA to unwind 
and then resealing the strands. 


Continuous and Discontinuous 
Strand Replication 


Each strand of parental DNA acts as a template for the 
synthesis of a new daughter strand of DNA. In E. coli, 
daughter DNA strands are synthesized at the replication 
fork by the DNA polymerase III (pol III) holoenzyme, 
the principal DNA-synthesizing enzyme (see Figure 7.14, 
step 4). Holoenzyme is the general term used for multiprotein
complexes in which a core enzyme is associated with additional
protein components that complete its structure and
enable its function. The pol III holoenzyme begins its work
at the 3’-OH end of an RNA primer and rapidly synthesizes 
new DNA with a sequence complementary to the template- 
strand nucleotides. Pol III adds new nucleotides to a daugh- 
ter strand as long as there are complementary nucleotides 
on the template strand to direct nucleotide addition to the 
daughter strand. 

Experimental evidence indicates that most of the 
enzymes we are describing as participating in DNA rep- 
lication are part of a single large protein complex at each 
replication fork called the replisome. There is one repli- 
some at each replication fork, and each contains, among 
other components, two complete pol III holoenzymes. In 
each replisome, one pol III holoenzyme carries out the 
5'-to-3' synthesis of one daughter strand continuously, 
in the same direction in which the replication fork pro- 
gresses. The second pol III enzyme in a replisome carries 
out synthesis of the other daughter strand. The con- 
tinuously elongated daughter strand is called the leading 
strand (Figure 7.18). Notice that Figure 7.18 divides the 
replication bubble into four quadrants. The upper right 
and lower left quadrants contain leading strands. 

The daughter strands in the upper left and lower 
right quadrants shown in Figure 7.18 have a 5'-to-3' di- 
rection of elongation that runs opposite to the direction of 
movement of the replication fork. These daughter strands 
are elongated discontinuously, in short segments, each of 
which is initiated by an RNA primer. The discontinuously 
synthesized daughter strand is called the lagging strand. 
Thus in Figure 7.18, the lower right and upper left quad- 
rants of the replication bubble contain lagging strands. 

Figure 7.17 DNA supercoiling in bacteria (a) and its cutting and release by topoisomerase (b). 

Reiji Okazaki detected the synthesis of short fragments 
of DNA in the replication of the lagging strand. 
He observed that early in bacterial replication, newly 
synthesized DNA segments on one strand are 1000 to 
2000 nucleotides long, while later in replication the newly 
synthesized segments are much longer. Okazaki’s discov- 
ery suggested that short segments of DNA are synthesized 
and that these short segments are joined together as repli- 
cation progresses. The short segments of newly replicated 
DNA are called Okazaki fragments, and they are the 
result of discontinuous synthesis of DNA on the lagging 
strand. Okazaki fragments in eukaryotes are much shorter 
than those in bacteria, 100 to 200 nucleotides in length. 
Similarly, archaeal Okazaki fragments are short. 

In Figure 7.18, notice that each daughter strand con- 
tains a segment characterized as leading strand that adjoins 
a segment characterized as lagging strand. All daughter 
strands are composed of adjoining leading and lagging seg- 
ments, and they will ultimately be structurally identical. 

Overall, the pattern of DNA replication involving a 
leading strand and a lagging strand is similar in bacteria, 
eukaryotes, and archaea. Three DNA polymerases are re- 
cruited to eukaryotic origins of replication sites. All three 
are part of the large replisome complex that assembles 
at each replication fork to carry out leading and lagging 
strand synthesis. DNA polymerase ε is responsible for 
leading strand synthesis, while DNA polymerase δ is responsible 
for lagging strand synthesis. DNA polymerase 
α, which begins DNA synthesis following RNA primer 
synthesis and extends a few nucleotides before being 
replaced by the main DNA replication enzyme, is more active 
on the lagging strand due to multiple priming events. 

It is less clear how archaeal leading and lagging strand 
replication is accomplished and regulated. Archaea generally 
possess at least one, and often multiple, homologs of 
eukaryotic replication polymerases. This has led to speculation 
that the polymerases in archaea function in about 
the same way as do those in eukaryotes. See Table 7.2 for 
a comparison of selected DNA polymerases in the three 
domains of life. 


RNA Primer Removal and 
Okazaki Fragment Ligation 


To complete DNA replication, RNA primers must be 
removed and replaced with DNA, and Okazaki fragments 
must be joined together to form complete DNA strands. 


Table 7.2 Properties of Selected Bacterial, Eukaryotic, 
and Archaeal DNA Polymerases 

Polymerase      Functions 

Bacterial polymerases 
DnaG            RNA primer synthesis 
I               RNA primer removal, proofreading, mutation repair 
III             DNA replication, proofreading 

Eukaryotic polymerases 
Primase/α       Primer synthesis and lagging strand synthesis 
δ               Lagging strand synthesis, proofreading, DNA mutation repair 
ε               Leading strand synthesis, proofreading, DNA mutation repair 

Archaeal polymerases 
Primase         Primer synthesis 
PolB            DNA synthesis 
PolD            DNA synthesis 

In E. coli these tasks are accomplished by the enzymes 
DNA polymerase I and DNA ligase that are each part of 
the replisome complex at each replication fork. 

When DNA pol III on the lagging strand reaches 
an RNA primer, thus running out of template, it leaves a 
single-stranded gap between the last DNA nucleotide of the 
newly synthesized daughter strand and the first nucleotide 
of the RNA primer (Figure 7.19). Pol III, having very low 
affinity for these DNA-RNA single-stranded gaps, is then replaced 
by DNA polymerase I (pol I), which has high affinity 
for such gaps (Figure 7.19). DNA pol I removes nucleotides 
of the RNA primer one by one and replaces them 
with DNA nucleotides, beginning with the 5’ nucleotide of 
the RNA primer and progressing in the 3’ direction until all 
the RNA nucleotides in the primer have been replaced by 
DNA nucleotides complementary to the template strand. 

The pol I enzyme possesses two activities that ac- 
complish the removal of RNA nucleotides and their 
replacement by DNA nucleotides. DNA pol I first uses 
its 5’-to-3’ exonuclease activity to remove the 5’-most 
nucleotide from the RNA primer. This creates one open 
space opposite the template, which is then filled with 
the correct DNA nucleotide by the 5’-to-3’ polymerase 
activity of DNA pol I. The pol I removes each RNA 
primer nucleotide and replaces each with a DNA nucleo- 
tide. In so doing, pol I continually pushes the single- 
stranded gap in the 3’ direction, eventually replacing all 
of the RNA primer nucleotides with DNA nucleotides. 
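The one-at-a-time replacement described above can be illustrated with a small Python sketch. It is a deliberately simplified picture under stated assumptions: the function `replace_primer`, the five-nucleotide primer, and the template are all hypothetical, and the code tracks only which nucleotide is removed and which is added at each position.

```python
# Template base (read 3'-to-5') -> DNA nucleotide inserted opposite it.
DNA_FOR = {"A": "T", "T": "A", "G": "C", "C": "G"}

def replace_primer(template_3_to_5, rna_primer_5_to_3):
    """Replace each RNA primer nucleotide with DNA, moving 5' to 3'.

    Returns the DNA that now occupies the primer's position and a log
    with one entry per removal-and-fill step."""
    strand = list(rna_primer_5_to_3)
    log = []
    for i, template_base in enumerate(template_3_to_5[:len(strand)]):
        removed = strand[i]                  # 5'-to-3' exonuclease step
        strand[i] = DNA_FOR[template_base]   # 5'-to-3' polymerase step
        log.append(f"removed RNA {removed}, added DNA {strand[i]}")
    return "".join(strand), log

dna, log = replace_primer("TACGG", "AUGCC")  # hypothetical sequences
print(dna)
for entry in log:
    print(entry)
```

Each pass through the loop corresponds to one removal followed by one addition, which is why the gap appears to move one position in the 3' direction per step.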

Once the entire RNA primer is replaced, a remaining 
single-stranded gap sits between two DNA nucleotides. 
At this point, DNA ligase, having exclusive and very high 
affinity for DNA-DNA single-stranded gaps, is attracted 
to the gap and there performs its single task of forming a 
phosphodiester bond between the two DNA nucleotides 
that joins two Okazaki fragments. Both pol I and DNA 
ligase are active on leading and lagging strands. The level 
of activity is greater on lagging strands, however, where 
every 1000 to 2000 nucleotides, they are needed to join 
Okazaki fragments during replication of E. coli DNA. 

In eukarya and archaea, RNA primers are removed 
and DNA segments are ligated together to finish replica- 
tion. The principal enzymes that accomplish these tasks 
are very similar. Replication protein A (RPA) and two 
nuclease enzymes, Fen1 and Dna2, accomplish primer re- 
moval and replacement in eukaryotes and archaea. DNA 
ligase operates to seal single-stranded nicks to complete 
the assembly of new DNA strands. 


Simultaneous Synthesis of Leading 
and Lagging Strands 


As we have seen, the replisome components in E. coli in- 
clude two DNA pol III holoenzymes, one of which synthe- 
sizes the leading strand and the other the lagging strand. As 
we describe momentarily, a similar organization exists dur- 
ing eukaryotic and archaeal DNA replication as well. Each 


replisome complex carries out replication of the leading 
strand and the lagging strand simultaneously. The replisome 
also includes pol I and ligase, as well as numerous other 
components that collectively carry out DNA replication. 

Figure 7.19 Removal and replacement of RNA primer 
nucleotides and ligation of Okazaki fragments in E. coli. 

Figure 7.20 DNA polymerase III holoenzyme. The complex 
contains two DNA polymerase core enzymes attached to τ (tau) 
arms, and the clamp loader, shown holding a sliding clamp. 


The “processivity” of DNA polymerases alone—that is, 
the ability of DNA polymerases to drive their own move- 
ment along template strands during replication—is compar- 
atively low. This means that, by themselves, they are unable 
to provide the momentum required to both synthesize new 
DNA and progress along the template strand. To enhance 
the processivity of these polymerases, they associate with an 
auxiliary protein complex known as a sliding clamp. 

The two E. coli DNA pol III holoenzymes each contain 
11 protein subunits. The two pol III core polymerases 
are each tethered to a different copy of the τ 
(tau) protein (Figure 7.20). The τ proteins are joined to a 
five-protein complex known as the clamp loader. Two 
additional proteins form the sliding clamp, a protein 
structure that can close around double-stranded DNA 
during replication. The sliding clamp, with its diameter of 
approximately 50 Å, has a “doughnut hole” of about 35 Å 
that encircles the DNA (Figure 7.21a). 

Each sliding clamp locks onto a DNA template strand 
and there affiliates with DNA pol III core enzyme, firmly 
anchoring the enzyme to the template to carry out the 
bulk of replication (Figure 7.21b). The clamp is the key to 
the enzyme’s high level of activity. Pol III on DNA without 
a sliding clamp has very low processivity. When no more 
template is available, the DNA pol III is dropped by the 
sliding clamp and replaced by DNA pol I, which as we have 
seen removes RNA primers and replaces them with DNA. 

Foundation Figure 7.22 presents a model of how the 
DNA pol III holoenzyme coordinates the simultaneous 
synthesis of leading and lagging strands at a replication 
fork. The outline of this model was proposed in the early 
1960s by Arthur Kornberg to explain the experimental 
observation that a single large protein complex at each 

Figure 7.21 The DNA sliding clamp. (a) Two views of the 
sliding clamp, one showing the clamp and DNA polymerase on 
DNA in profile (left) and the other showing DNA through the 
“doughnut hole” of the sliding clamp (right). (b) The sliding clamp- 
DNA polymerase complex has high processivity during replication. 


replication fork carries out replication of both strands 
of DNA. Known both as the Kornberg model and as the 
“trombone” model, it has been revised and updated in the 
decades since it was first proposed. The trombone model 
depicts the activity of the clamp loader in providing a 
mechanism for the continuous synthesis of leading strand 
regions and for the grasping, synthesis, and release of lagging 
strand regions by DNA pol III-sliding clamp complexes 
affiliated with each arm of the clamp loader. This 
model provides a mechanism by which a single replisome 
can advance with the replication fork and synthesize both 
daughter strands as it proceeds. In summary, replisomes 
contain multiple DNA polymerase enzymes and a large 
number of accessory proteins that operate in a rapid and 
highly coordinated manner to carry out DNA synthesis. 

In archaea and eukaryotes, homologous proteins pro- 
vide processivity to DNA polymerases. The proliferating 
cell nuclear antigen (PCNA) protein functions as the 
sliding clamp in archaeal and eukaryotic replication, en- 
circling the DNA template strand. In these domains, the 
replication factor C (RFC) complex fills the role of the 
bacterial τ protein by connecting the DNA polymerases to 
the clamp loader and sliding clamp. 


DNA Proofreading 


Accurate replication of DNA is essential for the survival 
of organisms. The introduction of errors into a DNA 
sequence during replication could create potentially le- 
thal mutations. While this occasionally happens, DNA 
replication is remarkably accurate and is not a major 
source of mutation, largely because DNA polymerases are 
generally able to undertake DNA proofreading to be sure 
replication is accurate. As a result of DNA proofreading, 
mutations due to DNA replication errors occur about 
once every billion (10⁹) nucleotides in wild-type E. coli. To 
put this number into perspective, consider this textbook 
as an analogy. It contains about 800 pages, each holding 
about 5000 “bits” of information (letters, punctuation 
marks, spaces, etc.) for a total of 4 × 10⁶ bits per book. 
It would take 250 books, each the size of this one, to equal 
10⁹ bits of information. If each bit were equal to a DNA 
nucleotide, the error rate for DNA replication would be 
like having one typographical error in all 250 books! 
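The arithmetic behind the textbook analogy can be checked with a short Python sketch. The variable names are illustrative; the numbers are the ones quoted in the paragraph above.

```python
pages_per_book = 800
bits_per_page = 5000            # letters, punctuation marks, spaces, etc.
nucleotides_per_error = 10**9   # about one replication error per billion

# Total "bits" of information in one book.
bits_per_book = pages_per_book * bits_per_page

# Number of books needed to hold as many bits as there are
# nucleotides replicated per error.
books_per_error = nucleotides_per_error / bits_per_book

print(bits_per_book)
print(books_per_error)
```

The same calculation can be rerun with a different error rate to see how many book-sized texts would contain a single typographical error.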

This extraordinary accuracy is the work of the mul- 
tifunctional DNA polymerases that have the ability not 
only to synthesize DNA (5'-to-3’ polymerase activity) but 
also to “proofread” newly synthesized DNA for accuracy 
and remove erroneous nucleotides (see Table 7.2). This 
proofreading ability resides in the 3’-to-5’ exonuclease 
activity of DNA polymerases capable of removing some 
of the newly laid daughter strand sequence. 

Polymerases like pol III and pol I have a structure 
somewhat like an open hand: A “thumb” and “fingers” hold 
the template and daughter strands in the “palm,” where 
5'-to-3’ polymerase activity is centered (Figure 7.23). When 
a replication error occurs, the mismatched DNA bases of 
the template and daughter strands are unable to hydrogen 
bond properly. As a result, the 3'-OH end of the daughter 
strand becomes displaced, blocking the further addition of 
nucleotides and inducing rotation of the daughter strand 
into the 3’-to-5’ exonuclease site at the “heel” of the hand. 
Several nucleotides, including the mismatched one, are 
then removed from the 3’ end of the daughter strand, after 
which the daughter strand rotates back to the polymerase 
site in the palm and replication resumes. Like their coun- 
terparts in bacteria, the principal DNA replication poly- 
merases in eukaryotes and archaea also have proofreading 
ability to help ensure the accuracy of DNA replication. 

Genetic Analysis 7.2 checks your understanding and 
analysis of molecular events at the replication fork. 


Finishing Replication 


Once bacterial DNA replication has completed the 
synthesis of new DNA and the replacement of RNA 
primer nucleotides with DNA nucleotides, separation 
of the daughter chromosomes must occur. This is ac- 
complished by topoisomerase enzymes that break one 
of the double-stranded chromosomes, pass the other 
chromosome through the gap, and then reseal the dou- 
ble-stranded break. A similar event may occur at the 
end of archaeal replication to separate the daughter 
chromosomes. Linear chromosomes, such as those in 
the nuclei of your cells, present a unique and different 
problem with regard to DNA replication—they cannot 
be replicated all the way to their ends! Instead, eukary- 
otic chromosomes get progressively shorter with each 
replication cycle. 

Figure 7.23 DNA polymerase proofreading activity. 
(a) A replication error by polymerase. (b) Polymerase shifts 
on newly synthesized DNA to utilize its 3’-to-5’ exonuclease 
activity. (c) The polymerase resumes 5’-to-3’ synthesis. 


This apparent deficiency in the replication process is a 
consequence of an RNA primer being located at one end of 
the lagging strand and thus not able to be replaced by DNA. 
In consequence, the resulting lagging strand is shorter than 
its template strand, causing the chromosome to become 
shorter with each replication cycle (Figure 7.24). 

The loss of DNA with each replication cycle sounds 
ominous, but the problem is solved by the presence at 
chromosome ends of repetitive DNA sequences called 
telomeres. Telomeres do not contain protein-coding 
genes, but instead are made up of repeats that are most 
often 6-bp sequences repeated hundreds or thousands of 
times to give the telomere a length of 2 to 20 kb, depending 
on the species. Since its sequences are repetitive and contain 
no genetic information, portions of the telomere can 
safely be lost in each replication cycle, without consequence 
to the organism. Gel electrophoresis of telomeric DNA has 
documented the progressive shortening of telomere length 
during cell culture. 

Figure 7.24 Loss of DNA at telomeres. Leading strands 
are synthesized to the ends of linear chromosomes, but lagging 
strands are shortened each replication cycle, when RNA 
primer sequence at the telomere end of the template strand is 
removed but not replaced with DNA nucleotides. 

Telomeres are synthesized by the ribonucleoprotein 
telomerase, consisting of several proteins and a molecule 
of RNA. The telomerase RNA molecule is encoded by a 
distinct gene and acts as the template for the telomeric 
DNA repeat sequence. Elizabeth Blackburn and Carol 
Greider discovered both telomeres and telomerase in 
1987 and along with Jack Szostak were awarded the 2009 
Nobel Prize in Physiology or Medicine for their work. 

Figure 7.25 depicts the mechanism of telomerase 
action deduced from the study of the ciliated protozoan 
Tetrahymena. The repetitive sequence 5'-TTGGGG-3' 
is the characteristic telomeric repeat sequence of 
Tetrahymena. The template RNA in the Tetrahymena 
telomerase contains the repeat AACCCC that is used to 

Figure 7.25 Telomerase synthesis of repeating telomeric 
sequence. 


GENETIC ANALYSIS 7.2 


PROBLEM Two strains of E. coli have temperature-sensitive mutations that hamper their ability to 
complete DNA replication. At 25°C, both strains are able to complete replication, but neither is able 
to complete replication at 40°C. At 40°C, temperature-sensitive mutant 1 is able to synthesize DNA 
by DNA polymerase III activity, and it is able to remove RNA primers and replace them with DNA, but 
it accumulates many short segments of DNA (Okazaki fragments) that are not joined together. At 
40°C, temperature-sensitive mutant 2 also synthesizes DNA by polymerase III activity, but it is unable 
to remove RNA primers and replace them with DNA. For each of these mutants, use the information 
provided here to identify the molecule that is most likely carrying the temperature-sensitive mutation. 
Identify which normal major events of DNA replication each mutant can complete at 40°C and which 
normal events are altered in each mutant. 


Solution Strategies and Solution Steps 


Evaluate 

BREAK IT DOWN: Temperature-sensitive mutations are the result of proteins 
that have full function at a lower temperature but denature and lose 
function at higher temperatures (see Section 4.1). 

1. Strategy: Identify the topic area addressed by this problem. 
   Step: This problem addresses DNA replication and asks you to identify the 
   function of particular proteins and enzymes that are active at different 
   stages of replication. 

2. Strategy: Identify the critical information given in the problem and the 
   nature of the required answer. 
   Step: Two E. coli strains with different temperature-sensitive mutations of 
   DNA replication are described. Mutant strain 1 accumulates Okazaki fragments 
   that cannot be joined together, and mutant strain 2 is unable to remove RNA 
   primers. 


Deduce 

3. Strategy: Review the molecular events and principal molecules that are 
   involved in RNA primer removal and RNA primer replacement. 
   Step: A review of Foundation Figure 7.14 (p. 243) and of Section 7.4 shows 
   that in E. coli, DNA polymerase I is responsible for the removal of RNA 
   primer nucleotides and their replacement with DNA nucleotides, and that DNA 
   ligase joins Okazaki fragments together. 

TIP: The function of principal proteins and enzymes in E. coli DNA 
replication is discussed in Section 7.4. 


Solve 

4. Strategy: Identify the molecule affected by mutation in mutant 1. 
   Step: Mutant 1 is most likely to have a defect in DNA ligase. 

5. Strategy: Identify the molecule affected by mutation in mutant 2. 
   Step: Mutant 2 is most likely to have a defect in DNA polymerase I. 

6. Strategy: Identify which parts of DNA replication are completed at 40°C 
   and which are affected by each mutation. 
   Step: Mutant 1 is able to synthesize RNA primers by DnaG activity and is 
   able to synthesize DNA with polymerase III activity. It is also able to 
   remove RNA primers and replace the RNA nucleotides with DNA through 
   polymerase I activity. However, mutant 1 is defective in its ability to 
   ligate Okazaki fragments together by DNA ligase activity, and these 
   fragments remain unconnected. 

   Mutant 2 has fully functional DnaG and polymerase III to synthesize RNA 
   primers and most DNA. It lacks active DNA pol I, however, and is therefore 
   unable to remove RNA primers and replace them with DNA. 


For more practice, see Problems 14, 15, and 18. Visit the Study Area to access study tools. 


MasteringGenetics™ 


elongate the telomere of one strand enough to allow new 
DNA replication to fill out the chromosome ends. 

In the decades since Blackburn and Greider identified 
telomere structure and this mechanism for their mainte- 
nance, similar repeating telomeric sequences have been 
detected in all eukaryotes. For example, the human telomeric 
repeat sequence is 5'-TTAGGG-3', and it is encoded 
by a telomeric RNA molecule with the complementary 
repetitive sequence 3'-AAUCCC-5'. In humans, the telomeric 
sequence is repeated 250 to 1500 times at chromosome 
ends. The same telomeric sequence and template RNA sequence 
are found in vertebrates, protozoans (Trypanosoma), 
yeast (Saccharomyces), fungus (Neurospora), and plants 
(Arabidopsis). This represents an example of convergent 
evolution of DNA sequences. Convergent evolution is a 
mechanism producing similar traits or, in this case, DNA 
sequences among distantly related organisms due to 
similar adaptation or natural selection pressure. 
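The relationship between a telomeric DNA repeat and its RNA template, and the telomere length implied by a given repeat count, can be sketched in Python. The function names are hypothetical, and the sketch assumes simple base-by-base complementarity with U in place of T in the RNA.

```python
# DNA base -> complementary RNA base.
RNA_PAIR = {"A": "U", "T": "A", "G": "C", "C": "G"}

def telomerase_template(dna_repeat_5_to_3: str) -> str:
    """RNA complementary to a DNA repeat, written 3'-to-5' so that it
    aligns base by base with the 5'-to-3' DNA repeat."""
    return "".join(RNA_PAIR[base] for base in dna_repeat_5_to_3)

def telomere_length_bp(repeat: str, copies: int) -> int:
    """Length in base pairs of a telomere made of `copies` repeats."""
    return len(repeat) * copies

print(telomerase_template("TTAGGG"))        # human repeat
print(telomerase_template("TTGGGG"))        # Tetrahymena repeat
print(telomere_length_bp("TTAGGG", 250))    # low end of the human range
print(telomere_length_bp("TTAGGG", 1500))   # high end of the human range
```

With the repeat counts quoted for humans, the 6-bp repeat gives telomeres of roughly 1.5 to 9 kb, within the 2 to 20 kb range cited earlier for telomeres across species.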

The importance of telomerase activity in germ-line 
cells has been demonstrated in experimental mouse lines 
that are mutated to be homozygous for loss-of-function 
mutations of the TERT (telomerase reverse transcriptase) 
gene, the gene that encodes telomerase. These homozygous 
mutant mice are relatively normal when interbred for up 
to three generations, but severe developmental and fertility 
defects are detected in the fourth and fifth inbred genera- 
tions. TERT loss-of-function homozygosity is lethal by the 
seventh generation, meaning that no inbred TERT-deficient 
mice can be maintained by inbreeding for more than six 
generations. 

The molecular explanation for the delayed phenotypic 
effect of TERT inactivation is that each successive genera- 
tion of inbreeding in the homozygous mutant line leads to 
the loss of telomeric DNA. It is now evident that genetic 
mechanisms monitor telomere length, and that telomere 
length is a kind of chronometer that keeps track of the age 
of a cell. Once the shortening reaches a critical point, the 
cell is directed into the apoptotic pathway, the mechanism 
of programmed cell death that removes old or damaged cells 
from an organism. This phenomenon is thought to be the 
explanation for a long-standing observation in cell biology 
that most normal cells survive in culture for about 
30 to 50 cell divisions before entering a crisis phase, where 
their division first slows and then stops altogether, and the 
cells die. 


Telomeres, Aging, and Cancer 


Considering the importance of telomere length to chromo- 
some stability, cell longevity, and reproductive success, it 
may surprise you to learn that telomerase activity is limited 
to only a few kinds of cells in eukaryotes. Telomerase is 
active in germ-line cells, where it functions to ensure that 
gametes pass on full-length chromosomes. Telomerase 
activity is also detected in some stem cells, thus enabling 
the cells that differentiate from those stem cells to have 
full-length chromosomes. In contrast, telomerase activity 
is virtually nonexistent in differentiated somatic cells, the 
kinds of cells that have finite life spans and make up nearly 
all the cells of most body organs and tissues. In somatic 
cells, genes responsible for producing telomerase are turned 
off, and almost no telomerase activity is detectable. This ac- 
counts for the finite life span of somatic cells in cell culture 
first observed in 1965 by Leonard Hayflick, who found that 
the number of cell divisions of cultured cells is dependent 
on the source of the cells. This limitation on the growth of 
most cells in culture is known as the Hayflick limit. 

The connection between telomerase inactivity and nor- 
mal aging of cells prompted geneticists to look at human 
premature aging conditions for evidence of mutations af- 
fecting telomere formation or telomerase activity. In the rare 
human condition dyskeratosis congenita (OMIM 305000), 
patients have abnormalities of skin and nails, occasional loss 
of vision and hearing, and abnormalities of blood cell pro- 
duction that are a frequent cause of death. The DKC1 gene 
responsible for dyskeratosis congenita affects the activity of 
genes responsible for normal telomerase function. Defective 
telomerase activity and shortened telomeres are thought to 
be at the root of dyskeratosis congenita. 

In contrast to the importance of telomerase activity 
for maintaining normal telomere length as chromosomes 


are passed through the germ line, what is the consequence 
of abnormal reactivation of telomerase activity in somatic 
cells? Such an event can lead aging cells to continue to 
proliferate, allowing them to escape programmed cell 
death by apoptosis. This is exactly what seems to happen 
in many kinds of cancer, where mutations reactivate the 
expression of TERT and reintroduce telomerase activity 
into cells where TERT is normally silent. 

Recent studies of gene expression in human cancer 
cells find that mutations reactivating TERT are among the 
most frequent mutations in cancers of all types. In cancers 
of the internal organs, including lung, breast, stomach, 
ovary, kidney, bladder, uterus, testis, and prostate, 78% to 
100% of advanced-cancer cells show evidence of reactiva- 
tion of telomerase activity. This is a highly significant in- 
crease over the 0% to 3% rate of telomerase reactivation in 
normal somatic cells. In the cancer cells, the reactivation of 
telomerase activity appears to stabilize telomere length, dis- 
rupting the normal program of progressive telomere short- 
ening that would lead to apoptosis. This extended life span 
may allow affected cells to acquire additional mutations as- 
sociated with cancer development and cancer advancement. 


7.5 Molecular Genetic Analytical 
Methods Make Use of DNA Replication 
Processes 


Molecular biologists have used their understanding of the 
enzymes and processes of DNA replication to develop 
new laboratory methods of molecular genetic analysis. 
Two widely used methods that developed directly from 
this knowledge are the polymerase chain reaction (PCR) 
and dideoxyribonucleotide DNA sequencing. In this sec- 
tion, we look at both of these methods and at their use in 
deciphering DNA variation. 


The Polymerase Chain Reaction 


The polymerase chain reaction (PCR) is an automated 
version of DNA replication that takes place in a test tube 
containing a total reaction volume of 20 to 50 microliters. 
(One microliter is one-millionth of a liter.) Despite its 
very small total reaction volume, a typical PCR reaction 
produces millions of copies of a short, targeted segment of 
DNA from the original DNA molecule. The almost limit- 
less uses of PCR in modern biological research include the 
collection of DNA from extinct species for evolutionary 
study; comparison of DNA among living species; forensic 
genetic applications such as paternity testing, crime scene 
analysis, and individual identification; and production of 
DNA segments for genome sequencing projects. 
Polymerase chain reactions are in vitro DNA- 
replication reactions performed using double-stranded 
DNA containing the target sequence that is to be copied, 
a supply of the four DNA nucleotides, a heat-stable DNA 
polymerase, and two different single-stranded DNA primers 
(described below). These PCR components are mixed 
with a buffer solution at the beginning of the reaction, 
and the reaction is repeated through a series of 30 to 
35 “cycles.” During each cycle, the number of copies of 
the target DNA sequence region doubles. This doubling 
process is known as “amplification,” and it is common to 
speak of “PCR amplification” in reference to the process 
and of “amplified DNA” as the product of the reaction. 

The DNA polymerase most often used in PCR is 
called Taq polymerase, named after the thermophilic bac- 
terial species Thermus aquaticus that was first collected 
in Yellowstone National Park. This bacterium lives in hot 
springs at near-boiling conditions, having evolved heat- 
stable proteins that remain active at these temperatures. 
The heat stability of Taq DNA polymerase is important to 
the efficiency of PCR. The first sample of Thermus aquaticus 
was collected from hot springs in Yellowstone National Park 
by Thomas Brock and Louise Brock in 1965. Brock was a 
microbiologist and his attention was drawn to some brown 
scum in the hot spring that looked something like the inset 
image in the opener photo for this chapter. Brock thought 
the scum looked like bacteria that live in other bodies of 
water, so he transported a sample back to his laboratory and 
managed to grow it. What he discovered was a new bacterial 
species, and in the process he opened new avenues of 
research on “extremophiles” (organisms that live in extreme 
environments) and helped pave the way for the use of 
Taq polymerase in PCR. 

As useful as Taq polymerase has been, there are 
now even more efficient polymerases for PCR derived 
from thermophilic (heat-loving) archaeal species. DNA 
polymerases from Pyrococcus furiosus and Thermococcus 
kodakaraensis are more efficient than Taq polymerase, 
having about 20-fold lower error rates due to their superior 
proofreading capabilities. 

The PCR reaction itself closely resembles DNA repli- 
cation as we describe it in this chapter. It does, however, 
differ somewhat from cellular DNA replication by using 
two different, short, single-stranded DNA sequences called 
PCR primers to provide start points for Taq polymerase 
synthesis. PCR primers, like RNA primers in cellular rep- 
lication, are generally 12 to 24 nucleotides in length. One 
single-stranded primer binds to each of the DNA strands 
that serve as templates in PCR amplification. Importantly, 
the primers also bind on opposite sides of the region of 
DNA to be copied in PCR. The primer binding sites are at 
the 5’ and 3’ boundaries of most of the replication prod- 
ucts that will eventually be produced in the PCR reaction. 

Figure 7.26 Polymerase chain reaction (PCR). (a) The 
three-step cycle of PCR. (b) Amplification doubles the number 
of copies of the targeted DNA sequence each cycle. 

Each polymerase chain reaction cycle is a three-step 
DNA replication reaction (Figure 7.26). Each step of a PCR 
cycle lasts from 30 seconds to several minutes, and 30 to 
36 is a typical number of cycles. Each complete PCR cycle 
doubles the number of copies of the target DNA sequence, 
so beginning with a single copy of double-stranded target 
sequence, completing the first PCR cycle produces 2 copies 
of the target sequence, two cycles produce 4 copies, three 
cycles 8 copies, and so on. After completing 30 PCR cycles 
the yield is 2³⁰, or more than 1 billion copies of the target 
sequence, and completion of 36 cycles can yield more than 
68 billion copies of the target sequence. The steps of each 
PCR cycle are as follows: 


1. Denaturation. The reaction mixture is heated to 
approximately 95°C, causing double-stranded DNA 
to denature into single strands as the hydrogen 
bonds between complementary strands break down. 

2. Primer annealing. The reaction temperature is reduced 
to between about 45°C and 68°C to allow primer annealing, 
the hybridization of the two primers to complementary 
sequences that bracket the target sequence. 

3. Primer extension. Raising the temperature of the reaction 
to 72°C allows primer extension, during which 
Taq DNA polymerase synthesizes DNA, beginning at 
the 3’ end of each primer and taking approximately 
1 minute for every 1000 bp synthesized. 
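The doubling arithmetic and the extension-time rule of thumb given above can be sketched in Python. This assumes ideal amplification, in which every copy is duplicated in every cycle; real reactions fall short of this. The function names and the 2500-bp example are hypothetical.

```python
def copies_after(cycles: int, starting_copies: int = 1) -> int:
    """Copies of the target sequence after a number of PCR cycles,
    assuming the copy number doubles in every cycle."""
    return starting_copies * 2 ** cycles

def extension_seconds(product_bp: int, seconds_per_kb: float = 60) -> float:
    """Approximate extension time, using about 1 minute per 1000 bp."""
    return product_bp / 1000 * seconds_per_kb

for n in (1, 2, 3, 30, 36):
    print(n, copies_after(n))

print(extension_seconds(2500))  # hypothetical 2500-bp product
```

Running the loop reproduces the figures in the text: 2, 4, and 8 copies after the first three cycles, just over 1 billion after 30 cycles, and more than 68 billion after 36.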


PCR has an enormous variety of applications, but 
it also has limitations, the most important of which are 
(1) the requirement of some knowledge of the sequences 
needed for primers and (2) that amplification products 
longer than 10 to 15 kb are difficult to produce. In most 
cases, the length limitations on PCR restrict its use to 
the study of selected DNA segments or individual genes. 
The requirement for primer sequence information can 
be satisfied by informed guesses about the sequences 
likely to occur at primer binding sites or by using primers 
from one species to amplify similar sequences in another 
species. For example, a biologist wanting to study DNA- 
sequence similarity between species could use a pair of 
primers that amplify a Drosophila gene to examine the 
human genome for a related gene. There may be one 
or more base-pair mismatches between the Drosophila 
primers and the human DNA sequences they bind to, but 
the mismatches need not prevent primer annealing if the 
temperature of the PCR reaction is lowered during step 2 
of the reaction. The lower temperature can increase the 
stability of hybridization of the primers and their target 
sequences enough to allow the former to prime the PCR 
amplification. 

The polymerase chain reaction makes it practical to 
obtain large quantities of DNA from a particular gene 
for molecular analysis. The PCR procedure usually takes 
place in small plastic tubes that are specially designed for 
this purpose. It has revolutionized many aspects of biol- 
ogy, such as molecular genetics, recombinant DNA anal- 
ysis, evolutionary genetics, and forensic genetic analysis, 
including crime scene and paternity testing of DNA. 


Separation of PCR Products 


The PCR process selectively amplifies only the fragment 
of DNA bounded by the two primers, and the fragment or 
fragments of DNA produced by amplification are highly 
concentrated. Gel electrophoresis is then used to sepa- 
rate those amplified fragments from the rest of the reac- 
tion mixture (see Chapter 10), after which they are easily 
visualized by staining with EtBr (ethidium bromide) due 
to their high concentration in the gel. The size of PCR 
products is measured in base pairs, and any variability 
in their length results from differences in the number 
of nucleotides between the two primer binding sites. 
These differences can be exploited in genetic analysis to 
identify alleles of amplified genes, particularly if alleles 
differ from one another by containing different numbers 
of base pairs. As an example, let’s look at an analysis of 
short repeating sequences of DNA that are frequently 
used as one kind of genetic marker. Known as a variable 


number tandem repeat (VNTR) and also known as short 
tandem repeats (STRs), this type of marker contains 
end-to-end repeating DNA sequences that are each up 
to 20 bp in length. These types of genetic markers are the 
kind used in forensic genetic analysis where the goal is to 
match a crime scene DNA sample with that of a suspect 
or to identify paternity. 

Figure 7.27a shows four hypothetical VNTR alleles
of a gene (V1 to V4) that might be found in a population.
The alleles differ in the number of repeats of the DNA 
sequence they carry. The repeats are consecutively num- 
bered in the figure. The PCR primers bind to the same 
sequences for each allele. The primers bind outside the 
repeat region, so amplification of each allele produces 
a DNA fragment of a characteristic length that is deter- 
mined by the number of DNA repeats the allele contains. 

Because there are four alleles for this VNTR gene,
there are 10 possible genotypes. In Figure 7.27b, gel 
electrophoresis of PCR-amplified DNA fragment bands 
shows that each genotype has a distinctive band number 
and composition. Each homozygous genotype has a single 
band and each heterozygous genotype has two bands. The 
bands are identified by their repeat number. 
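
The count of 10 genotypes and the one-band versus two-band pattern can be reproduced with a small enumeration. In the Python sketch below, the repeat counts, the 20-bp repeat unit, and the 100-bp flanking length are hypothetical values chosen for illustration; they are not taken from Figure 7.27.

```python
from itertools import combinations_with_replacement

# Hypothetical values for illustration only (not from Figure 7.27).
REPEAT_UNIT_BP = 20   # length of one repeat
FLANKING_BP = 100     # primer binding sites plus sequence outside the repeats
ALLELE_REPEATS = {"V1": 2, "V2": 3, "V3": 4, "V4": 5}


def fragment_length(repeats):
    """PCR product length: fixed flanking sequence plus the repeat region."""
    return FLANKING_BP + repeats * REPEAT_UNIT_BP


def genotypes(alleles):
    """All unordered pairs of alleles, including homozygous pairs."""
    return list(combinations_with_replacement(sorted(alleles), 2))


def bands(genotype):
    """Distinct fragment lengths seen on the gel for one genotype."""
    return sorted({fragment_length(ALLELE_REPEATS[a]) for a in genotype})


all_genotypes = genotypes(ALLELE_REPEATS)
print(len(all_genotypes), "genotypes")
for g in all_genotypes:
    print(g, bands(g))
```

With n alleles the number of genotypes is n(n + 1)/2, which is 10 when n = 4: four homozygous genotypes with one band each and six heterozygous genotypes with two bands each.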

The inheritance of the VNTR alleles follows a 
codominant pattern in which both alleles are detected 
in heterozygous genotypes. In the family represented in 
Figure 7.27c, each parent transmits one allele to each child 
and as a consequence of the different heterozygous geno- 
types of the parents, each allele in a child can be traced to 
one of the parents. Notice that there is one DNA band
for each homozygous person and there are two bands for each
heterozygous person. VNTRs and other similar DNA genetic 
markers display codominant inheritance (see Section 4.1). 


Dideoxynucleotide DNA Sequencing 


The ultimate description of any DNA molecule is its 
sequence of bases. Applied at the genome level, DNA 
sequence information can include the whole genome— 
that is, all coding and regulatory sequences of genes, as 
well as all the other DNA sequence, including repetitive 
sequences, that make up the genome. Genomic sequence 
information can also be more limited, most commonly 
including only those portions of the genome that are tran- 
scribed into RNA. We discuss approaches to creating and 
analyzing genomic sequence data in Chapters 17 and 18. 

DNA sequencing technology has also found broad ap- 
plication in agriculture, medicine, and evolutionary biology. 
DNA sequencing technologies have changed rapidly as lab- 
oratory and computer technology have combined to make 
sequencing faster and cheaper by orders of magnitude. 

The first DNA-sequencing protocols were developed
in 1977, one by Allan Maxam and Walter Gilbert and another
by Fred Sanger. Of the two methods, Sanger’s was
more amenable to automation, and it is the basis for the
high-throughput approach to genome sequencing that is the
method of choice today. Here we first describe Sanger’s
dideoxynucleotide DNA sequencing method, and then we describe
the newest generation of automated DNA sequencing,
commonly identified as next-generation DNA sequencing.

Figure 7.27 PCR amplification of variable number
tandem repeat (VNTR) alleles. (a) Four VNTR alleles (V1
to V4) are characterized by different numbers of identical
DNA repeat sequences. (b) Ten genotypes are possible
for the VNTR gene, each having a unique pattern of PCR-
fragment sizes. One band is seen for each homozygous
genotype and two bands for each heterozygous genotype.
(c) Hereditary transmission of VNTR alleles follows a
codominant pattern.

Dideoxynucleotide DNA sequencing—also called di- 
deoxy DNA sequencing, or Sanger sequencing—is Sanger’s 
DNA sequencing method. Based on cellular DNA replica- 
tion reactions, dideoxy sequencing uses DNA polymerase 
to replicate new DNA from a single-stranded template (the 
strand to be sequenced) beginning at a primer sequence 
attached to the template strand. In dideoxy sequencing 
reactions, the four standard deoxynucleotide (dNTP) com- 
ponents of DNA, in large amounts, are mixed with smaller 
amounts of a dideoxynucleotide triphosphate (ddNTP). 

Dideoxynucleotides differ from deoxynucleotides in 
lacking two oxygen atoms (dideoxy means “two deoxygen- 
ated sites”) rather than the usual one deoxygenated site. 
Whereas dNTPs are deoxygenated at the 2’ carbon and 
have a hydroxyl group (OH) at the 3’ carbon, ddNTPs 
have hydrogen (H) atoms rather than hydroxyl groups 
at the 2’ and 3' carbons (Figure 7.28a). The absence of a 
hydroxyl group at the 3’ carbon in ddNTP prevents the 
ddNTP from forming a phosphodiester bond to elon- 
gate a DNA strand. Incorporation of a ddNTP by DNA 
polymerase into a growing strand is a chain-terminating 
event that blocks further strand elongation (Figure 7.28b). 
Dideoxy sequencing therefore produces a large number of 
partial replication products, each terminated by incorporation
of a ddNTP at a different site in the sequence.

In preparation for dideoxy sequencing, many cop- 
ies of the DNA fragment to be sequenced are obtained 
in single-stranded form, usually by denaturing double- 
stranded DNA. Samples of the fragment are then placed 
in four parallel replication reactions. Each reaction mix- 
ture contains the DNA strand to be sequenced, a single- 
stranded DNA primer, DNA polymerase, large amounts of 
each of the four standard nucleotides (dATP, dGTP, dCTP, 
and dTTP), and a small amount of one dideoxynucleotide, 
either that of adenine (ddATP), thymine (ddTTP), cytosine 
(ddCTP), or guanine (ddGTP). 

The four parallel DNA-sequencing reactions shown in 
Figure 7.29 are used to sequence the DNA fragment shown 
at the top of the figure. As each reaction begins, a single- 
stranded 18-mer primer binds to template DNA. Using 
the five nucleotides available in each reaction, DNA poly- 
merase replicates the DNA fragment by adding nucleotides 
beginning at the 3’-OH end of the primer. The primers 
used in dideoxy sequencing are labeled with either radioactive
phosphorus (32P) or with a fluorescent label on their 5'
ends to facilitate detection of the DNA fragments produced 
in the sequencing reaction. In Figure 7.29a, showing the
ddCTP-containing reaction, DNA synthesis from a template
strand progresses until it reaches the first guanine on
the template strand. At this point, the reaction can incorporate
one of two different kinds of cytosine. If the normal
dCTP is incorporated, as it is in most cases due to its high
concentration, the replication reaction will proceed. If, on
the other hand, the reaction incorporates ddCTP, which
will happen in fewer cases due to its lower concentration,
the replication reaction terminates. Each time the template
strand nucleotide is a guanine, a few replicating fragments
incorporate ddCTP and terminate the reaction. Most reactions
incorporate dCTP and continue replication. Some
of these longer fragments will incorporate ddCTP at the
next opportunity and stop replicating, while most others
incorporate dCTP and continue replication. Replication
proceeds this way, halting in a few fragments each time a
G appears on the template strand and a C is incorporated
into the newly synthesized fragment. The result from this
reaction is a series of partially replicated fragments whose
replication is halted at each site of C incorporation.

Figure 7.28 Nucleotides used in DNA sequencing reactions.
(a) Dideoxynucleotides (ddNTPs) are deoxygenated at
both the 2’ and 3’ carbons and cannot be used to elongate
DNA. (b) The incorporation of a dideoxynucleotide of cytosine
(ddCTP) terminates the replication reaction.

(a) ddCTP reaction ("C" lane). Incorporation of dCTP allows
the chain to continue growing, but incorporation of ddCTP
terminates chain elongation. Partial replication products
terminate at each cytosine of the chain due to the
incorporation of ddCTP.

Length of synthesized fragment    Partial replication product
23    5'-[18-mer]AATGC
25    5'-[18-mer]AATGCGC
28    5'-[18-mer]AATGCGCTGC
31    5'-[18-mer]AATGCGCTGCATC
36    5'-[18-mer]AATGCGCTGCATCGTAGC

(b) ddGTP reaction ("G" lane)

Length of synthesized fragment    Partial replication product
22    5'-[18-mer]AATG
24    5'-[18-mer]AATGCG
27    5'-[18-mer]AATGCGCTG
32    5'-[18-mer]AATGCGCTGCATCG
35    5'-[18-mer]AATGCGCTGCATCGTAG

(c) ddTTP reaction ("T" lane)

Length of synthesized fragment    Partial replication product
21    5'-[18-mer]AAT
26    5'-[18-mer]AATGCGCT
30    5'-[18-mer]AATGCGCTGCAT
33    5'-[18-mer]AATGCGCTGCATCGT
37    5'-[18-mer]AATGCGCTGCATCGTAGCT

(d) ddATP reaction ("A" lane)

Length of synthesized fragment    Partial replication product
19    5'-[18-mer]A
20    5'-[18-mer]AA
29    5'-[18-mer]AATGCGCTGCA
34    5'-[18-mer]AATGCGCTGCATCGTA
38    5'-[18-mer]AATGCGCTGCATCGTAGCTA

Figure 7.29 DNA sequencing reactions. (a) A target region of
DNA is located by binding a single-stranded primer of 18 nucleotides
(an “18-mer”) that carries a 5’ label. Replication products
terminated by ddCTP each have a different length. (b) Replication
products terminated by ddGTP. (c) Termination products generated
by ddTTP. (d) Termination products generated by ddATP.

The three other reaction mixtures, containing ddGTP, 
ddTTP, and ddATP, likewise produce a series of partial 
replication products that all end with their particular 
ddNTP (Figure 7.29b-d). Upon the completion of the 
four parallel sequencing reactions, partial replication DNA 
products will occur for every nucleotide in the template. 
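
The way the four reactions partition the termination products can be illustrated with a short simulation. The Python sketch below uses the 18-mer primer and the 20-nucleotide synthesized sequence of Figure 7.29; the function name is illustrative, and the model assumes that at least one fragment terminates at every possible position.

```python
PRIMER_LENGTH = 18
# Newly synthesized sequence added to the 3' end of the primer, written 5' to 3'.
SYNTHESIZED = "AATGCGCTGCATCGTAGCTA"


def termination_lengths(synthesized, ddntp_base, primer_length=PRIMER_LENGTH):
    """Fragment lengths produced in the reaction containing one ddNTP.

    A fragment can end wherever the newly added base matches the ddNTP,
    so its length is the primer length plus that base's position.
    """
    return [
        primer_length + position
        for position, base in enumerate(synthesized, start=1)
        if base == ddntp_base
    ]


lanes = {base: termination_lengths(SYNTHESIZED, base) for base in "CGTA"}
for base, lengths in lanes.items():
    print(f"dd{base}TP lane: {lengths}")
```

Taken together, the four lanes account for every length from 19 to 38 exactly once, which is why the combined reactions yield a termination product for every nucleotide in the template.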

After the replication reactions are complete, the con- 
tents of each reaction are loaded into separate lanes of 
a DNA electrophoresis gel. Following completion of gel 
electrophoresis, the DNA sequence can be determined by 
examining the different-sized replication products spread 
across the four gel lanes. The bands shown in Figure 7.30a
are visible in an autoradiograph because the primers that
begin each fragment are end-labeled with 32P. The shortest
fragment seen is in the A lane at the bottom, indicating that
the first ddNTP nucleotide added to the 3’ end of the primer
was ddATP. The second-shortest fragment is also in the
A lane, indicating that chains to which ddATP was added
in the second position terminated elongation there. The
third-shortest fragment in the gel is in the T lane, and the
fourth-shortest in this example is in the G lane. So far, the
sequence of nucleotides in the synthesized DNA is AATG.

Figure 7.30 Interpretation of a DNA sequencing gel.
(a) Replication of each fragment terminates with the addition
of a ddNTP. Nucleotides of the newly synthesized “sequenced
strand” are read off the autoradiograph, and the 5’-to-3’ polarity
of the strand corresponds to the smaller-to-larger fragment-
length direction. The “inferred strand” is the template strand, and
it is complementary and antiparallel to the “sequenced strand.”
(b) A photograph of a dideoxy sequencing gel.

By continuation of this analytical process, the DNA se- 
quence of the synthesized strand is “read” from the gel in the 
5'-to-3' direction (the direction in which a replicating strand 
elongates), as demonstrated in Figure 7.30a. The “inferred 
strand” is the template strand, which is complementary and 
antiparallel to the sequenced strand. Figure 7.30b shows an
autoradiograph of a dideoxy sequencing gel; a portion of the
sequence, read near the middle of the gel, is shown at the left.
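
The gel-reading logic described above amounts to sorting the bands by length and noting the lane of each. The Python sketch below is one minimal way to express it; the band lengths in the example are assumed values for the first four bands, and the function names are illustrative.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def read_gel(lanes):
    """Read the synthesized strand 5' to 3' from a dideoxy sequencing gel.

    `lanes` maps each lane's base to the fragment lengths seen in that lane.
    The shortest fragment gives the first base added to the primer.
    """
    bands = sorted(
        (length, base) for base, lengths in lanes.items() for length in lengths
    )
    return "".join(base for _, base in bands)


def template_strand(synthesized):
    """Template strand written 3' to 5', aligned beneath the synthesized strand."""
    return "".join(COMPLEMENT[base] for base in synthesized)


# Assumed band lengths for the four shortest fragments (18-mer primer).
gel = {"A": [19, 20], "T": [21], "G": [22]}
synthesized = read_gel(gel)
print("5'-" + synthesized + "-3'")
print("3'-" + template_strand(synthesized) + "-5'")
```

Because the two strands are antiparallel, the complement is written in the 3'-to-5' direction so that each base sits beneath its partner.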

Manual dideoxy sequencing, as described above, is a 
labor-intensive process that today has been largely sup- 
planted by high-throughput, automated DNA sequencing 
and powerful computational software and hardware that 
can run 24 hours a day, 365 days a year, and assemble 
genomic sequence at the rate of 10,000 to 20,000 bp per 
hour! Genetic Analysis 7.3 tests your skills at interpreting 
dideoxy sequencing results. 


New DNA-Sequencing Technologies: Next 
Generation and Third Generation 


New generations of DNA-sequencing technologies are 
continuing to be developed. So-called next-generation 
sequencing technology ascertains the sequence of a single 
strand of DNA by synthesizing a complementary strand 
and detecting which nucleotide is added at each step. 

To begin the procedure, the sample to be sequenced is 
broken into double-stranded fragments, and then the frag- 
ments are denatured and their individual single strands of 
DNA are captured and immobilized on beads. The beads, 
each bearing a single DNA strand, are placed in wells of an 
electrophoretic gel, where single-stranded DNA linkers are 
added and bind to one end of the DNA fragments. Next, 
PCR primers complementary to the linkers are added to 
serve as the starting points of PCR amplification. 

PCR amplification is accomplished by sequentially 
flooding the wells with solutions containing the four nucleo- 
tides A, T, C, and G. The nucleotides are tagged with a mol- 
ecule that emits light at a specific wavelength, furnishing a 
means of indicating that the nucleotide has been added to 
a new strand in the PCR reaction. A photo receptor detects 
the light and sends a signal through computer software to 
generate a profile of the order in which nucleotides are incorporated
during synthesis (Figure 7.31). In this manner,
next-generation sequencing identifies the sequence of a
DNA strand “by synthesis” rather than by chain termination,
as is the case with dideoxy sequencing.

Figure 7.31 Next-generation sequencing output. Labels on
nucleotides incorporated into newly synthesized DNA are excited
and their emissions captured in next-generation sequencing.


GENETIC ANALYSIS 7.3

PROBLEM From the dideoxy DNA sequencing gel shown below, deduce the sequence and strand
polarities of the DNA duplex fragment.

Gel lanes: ddATP, ddGTP, ddTTP, ddCTP

BREAK IT DOWN: Chain termination, caused by the incorporation of a dideoxynucleotide,
produces the partially replicated DNA fragments detected in a DNA sequencing gel (p. 259).

Solution Strategies and Solution Steps

Evaluate
1. Strategy: Identify the topic this problem addresses and the nature of the required answer.
   Step: This question concerns dideoxynucleotide DNA sequencing. The answer requires
   interpretation of a DNA sequencing gel to determine the double-stranded sequence of a
   fragment of DNA, including strand polarities.
2. Strategy: Identify the critical information given in the problem.
   Step: A dideoxynucleotide DNA sequencing gel is shown.

Deduce
3. Strategy: Review the essential steps of dideoxynucleotide DNA sequencing.
   Step: DNA polymerase incorporates nucleotides in four parallel reactions. Each reaction
   mixture includes the four normal DNA nucleotides (dNTPs) and one labeled
   dideoxynucleotide (ddNTP). Incorporation of a dNTP allows continued strand
   synthesis, but incorporation of a ddNTP terminates synthesis.
4. Strategy: Examine the gel and identify the “beginning” of DNA synthesis.
   TIP: DNA fragments toward the bottom of the gel (nearer the positive pole) are shorter
   than fragments higher up in the gel. The sequence of the synthesized strand shown in
   the gel is 5’ at the bottom and 3’ at the top.
   Step: The 3’ end of the primer is used to initiate DNA synthesis. The first nucleotide
   incorporated during synthesis is cytosine, as determined by identifying the
   location of the smallest synthesized fragment: the “C” lane. The second and
   third nucleotides are both adenine. The first three nucleotides are therefore
   5'-CAA-3'.

Solve
5. Strategy: Write the rest of the sequence (along with the polarity) of the synthesized
   strand shown in the gel.
   Step: The synthesized strand is
   5'-[primer]-CAATAGCTGAGGAGTCGATTCATGCCGATA-3'.
6. Strategy: Determine the sequence and polarity of the template strand used for DNA
   synthesis.
   Step: The template DNA strand is
   3'-GTTATCGACTCCTCAGCTAAGTACGGCTAT-5'.

For more practice, see Problems 28, 29, 30, and 34.
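
The sequencing-by-synthesis idea described in this section, in which the four nucleotides are supplied one at a time and a signal is recorded whenever one is incorporated, can be sketched as a simple simulation. The Python code below is a simplification under stated assumptions: the flow order A, T, C, G is assumed, the signal is taken to equal the number of nucleotides incorporated in a flow, and the function name is illustrative. Real instruments differ in their chemistry and signal processing.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
FLOW_ORDER = "ATCG"  # assumed order in which nucleotide solutions are supplied


def sequence_by_synthesis(template_3_to_5, flow_order=FLOW_ORDER):
    """Simulate nucleotide flows over a template read in the 3'-to-5' direction.

    Returns the per-flow signals and the synthesized strand (5' to 3').
    The flow order must contain all four bases so that synthesis can finish.
    """
    position = 0
    signals = []
    read = []
    while position < len(template_3_to_5):
        for nucleotide in flow_order:
            incorporated = 0
            # Extend for as long as the supplied nucleotide pairs with the template.
            while (position < len(template_3_to_5)
                   and COMPLEMENT[template_3_to_5[position]] == nucleotide):
                incorporated += 1
                position += 1
            signals.append((nucleotide, incorporated))
            read.append(nucleotide * incorporated)
    return signals, "".join(read)


signals, read = sequence_by_synthesis("TTAC")
print(signals)
print("5'-" + read + "-3'")
```

In this toy run the template 3'-TTAC-5' yields a double-strength A signal, a T signal, no C signal, and a G signal, giving the synthesized strand 5'-AATG-3'.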

One major advance of next-generation sequencing tech- 
nologies is that thousands to millions of sequencing reac- 
tions are run simultaneously, producing orders of magnitude 
more sequence information than dideoxy sequencing. As a 
result, next-generation sequencing is often referred to as be- 
ing “massively parallel” or “high throughput” in its approach. 
Another advantage of next-generation sequencing over dide- 
oxy sequencing is that DNA can be present as a single copy 
rather than the large number of copies of the strand to be 
sequenced that is needed for dideoxy sequencing. 

Eliminating the need to have large numbers of cop- 
ies in order to sequence the DNA has two significant 
advantages. First, it facilitates the sequencing of DNA 
samples that are found in only trace amounts, such as 
the small amounts of DNA obtained from the Neandertal 
and Denisovan bone samples described in the Case Study 
in Chapter 1 (pp. 21-22) and in Chapter 22 or the scant 
DNA samples obtained from the frozen remains of a 
woolly mammoth preserved in permafrost in Siberia. Next-generation
sequencing is powerful enough to distinguish
mammoth DNA from DNA of environmental contami- 
nants, such as grasses in existence at the time the mam- 
moth died that are also preserved in the permafrost. The 
second advantage of next-generation sequencing is that it 
excels over the earlier methods at sequencing DNA that 
is highly repetitive. On the other hand, next-generation 
methods have the disadvantage of producing sequence 
segments of only 20 to 500 bases versus the 800 to 1000 
bases sequenced by dideoxy sequencing methods. 

Currently being developed are newer procedures de- 
scribed as “third-generation” DNA sequencing technolo- 
gies. These offer the possibility of sequencing millions of 
single copies of DNA molecules directly and in parallel. 
The combination of next-generation and third-generation 
sequencing technologies is causing, and will continue to
cause, the price of sequencing to plummet. In 2001, when 
the final draft of the human genome was completed, the 
cost of sequencing 1 million base pairs by dideoxy se- 
quencing was approximately $10,000. By 2005, the cost 
had been cut to approximately $1000 per million base 
pairs. By 2010, using third-generation sequencing, the 
cost of sequencing 1 million base pairs was approximately
$1.
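
To put these per-million-base-pair prices in perspective, they can be scaled to a whole genome. The Python sketch below assumes a round genome size of 3 billion base pairs and a single pass over the sequence; both are simplifying assumptions, and actual project costs also depend on how many times each base is read.

```python
GENOME_BP = 3_000_000_000  # assumed round figure for one human genome

# Approximate cost in dollars per 1 million base pairs, as given in the text.
COST_PER_MILLION_BP = {2001: 10_000, 2005: 1_000, 2010: 1}


def genome_cost(cost_per_million_bp, genome_bp=GENOME_BP):
    """Cost of reading every base pair once at a given per-million-bp price."""
    return cost_per_million_bp * genome_bp / 1_000_000


for year, price in COST_PER_MILLION_BP.items():
    print(f"{year}: about ${genome_cost(price):,.0f} per genome")
```

Under these assumptions the single-pass cost falls from about $30 million in 2001 to about $3 million in 2005 and about $3000 in 2010.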

The reduction in the cost of sequencing has led to 
an explosion of sequences available in public databases. 
Consider, for example, that about 10 billion base pairs 
were available in public databases in 2000, but by 2010 the 
number was more than 300 billion base pairs. A stated goal 
of modern genomic science is to produce the complete 
genome sequence of a person for less than $1000 (the
so-called “thousand-dollar genome”) by 2020. When this
becomes feasible, it may be routine for your own genome 
sequence to be part of your medical file and for decisions 
about your personal disease treatment, disease prevention, 
and health monitoring to be made on the basis of your indi- 
vidual genome. 

These new medical possibilities raise some unprec- 
edented social and ethical questions. From the earliest 
days of the development of recombinant DNA technolo- 
gies in the early 1970s through to the present day, the 
potential social, ethical, environmental, and economic 
issues engendered by the technology have been the sub- 
ject of intense debate. In 1975, following a self-imposed 
moratorium on recombinant DNA research, scientists 
met at the Asilomar Institute in California to draw up a 
set of guidelines addressing many of the safety concerns 
expressed by scientists and members of the public. A new 
array of issues raised by the dawn of the era of personal 
genome sequencing, including questions of confidential- 
ity, potential bias, and personal choice, will need to be 
addressed by similar public debates. 


Use of PCR and DNA Sequencing to Analyze Huntington Disease Mutations 


Both PCR and DNA sequencing analysis have been used to 
study the gene identified as HD that is mutated in Huntington 
disease (OMIM 143100). HD encodes the huntingtin protein 
that is expressed in brain cells and in other cells of the body. 
The normal function of wild-type huntingtin is not known, 
but it interacts with dozens of other proteins. In mutant form, 
huntingtin appears to aggregate with itself and other proteins,
hastening the death of neurons in the brain, which leads to
the motor abnormalities (progressive loss of motor control
with unintentional and uncontrollable movement) that are
characteristic of the disease.


TRINUCLEOTIDE REPEAT EXPANSION Huntington dis- 
ease is one of several human trinucleotide repeat expansion 
disorders that are caused by increases in the length of gene 
sections containing end-to-end repeats of three nucleotides. 
A CAG trinucleotide region of HD that encodes the amino acid 
glutamine produces a polyglutamine tract in the wild-type al- 
lele. The length of the polyglutamine tract is increased in mu- 
tant huntingtin protein as a result of an increased number of 
CAG repeats in mutant alleles. 

Regions of repeating DNA sequence, such as those containing
many repeats of DNA triplets, are known as “hotspots”
of mutation, regions that undergo a greater than average
number of mutations. One common mechanism of mutation
in regions of repeating DNA sequence is so-called strand slippage.
We discuss this mutational process in more detail in
Section 12.3. For now, simply know that DNA polymerase can
occasionally slip backward during the replication of repetitive
DNA, so that it erroneously copies a segment of sequence
twice. The result of this slippage is an increase in the number
of nucleotides in a region of repeating DNA sequence. While
this happens occasionally, it rarely causes a problem because
most repetitive DNA is not transcribed and no abnormal RNAs
are produced. A few regions of repetitive DNA sequence, like
this CAG repeat region, are transcribed, however, and their
expansion can cause a mutation.

Figure 7.32 Dideoxy DNA sequencing of the HD gene.
Gel electrophoresis results of dideoxy sequencing of a wild-type
HD allele with 21 CAG repeats are compared to the results for an
HD allele with 48 CAG repeats.


CAG REPEAT NUMBERS IN WILD-TYPE AND MUTANT 
HD ALLELES Wild-type HD genes vary in the number of 
CAG repeats, ranging from 6 to 28 repeats in the general 
population. HD alleles with 28 to 35 CAG repeats do not cause 
disease, but as a consequence of the increased CAG number, 
the alleles are unstable and prone to further expansion. Alleles 
that have 36 to 40 CAG repeats have expanded beyond the 
normal range, and the huntingtin protein produced by these 
alleles can behave abnormally and can result in disease symp- 
toms that show reduced penetrance. Individuals who carry 
36 to 40 CAG repeats might or might not develop HD. If they 
do, disease symptoms have a late age of onset and progress 
slowly. Individuals with HD alleles containing more than 40 
CAG repeats have HD that can develop at any time from the 
late teens onward. Figure 7.32 shows dideoxy DNA sequenc- 
ing analysis of the CAG repeat segment of the HD gene for a 
wild-type allele with 21 CAG repeats and for a mutant allele 
with 48 CAG repeats. 
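
The repeat-number ranges described above can be summarized as a simple lookup. The Python sketch below is illustrative only: the function name and category labels are assumptions, and because the text gives both "6 to 28" and "28 to 35" as ranges, the sketch arbitrarily places an allele with exactly 28 repeats in the unstable category.

```python
def classify_cag_repeats(repeats):
    """Assign an HD allele to a category based on its number of CAG repeats.

    Boundaries follow the ranges in the text; treating exactly 28 repeats
    as "unstable" is an assumption, since the stated ranges overlap there.
    """
    if repeats < 28:
        return "wild type"
    if repeats <= 35:
        return "no disease, but unstable and prone to expansion"
    if repeats <= 40:
        return "reduced penetrance"
    return "Huntington disease"


for n in (21, 30, 38, 48):
    print(n, "CAG repeats:", classify_cag_repeats(n))
```

The two alleles compared in Figure 7.32, with 21 and 48 repeats, fall in the wild-type and disease categories, respectively.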


POLYMERASE CHAIN REACTION DETECTS THE NUM- 
BER OF REPEATS The polymerase chain reaction provides 
another way of visualizing the CAG triplet repeat expansion 
and of following the transmission of alleles in the families of 
people with HD. Employing primers that bind on opposite 
sides of the CAG repeat region, researchers amplify frag- 
ments of DNA by PCR and separate them by gel electropho- 
resis. The binding sites of the PCR primers are identical for all 
alleles, but differences are seen in the lengths of amplified 
PCR products because of different numbers of CAG repeats 
between the primer binding sites. Amplified DNA fragments 
containing the primers are shorter if they are generated from 
wild-type DNA sequences than from mutant alleles, because 
wild-type alleles have a smaller number of repeats than do 
mutant alleles. In the Huntington disease family shown in
Figure 7.33, each person with HD is heterozygous and car- 
ries one wild-type allele with fewer than 36 repeats of the 
CAG sequence and one expanded allele with more than 
36 repeats. In contrast, family members shown here who do 
not have HD carry two alleles that each contain fewer than 
36 CAG repeats. 


PRESYMPTOMATIC MOLECULAR DIAGNOSIS OF HD 
These and similar molecular methods are used to assess the 
number of CAG repeats in HD for presymptomatic genetic test- 
ing of people at risk for inheriting Huntington disease. At-risk 
individuals can be tested before disease symptoms appear and 
can be told whether they carry an expanded HD allele. These 
methods can also be used to identify the presence of a CAG 
expansion of HD in individuals diagnosed with Huntington dis- 
ease by clinicians. 

Figure 7.33 CAG expansion of the HD gene detected by Southern blot
analysis of PCR-amplified DNA. Each family member represented by a
filled circle or square has Huntington disease. PCR analysis of the HD gene

7.1 DNA Is the Hereditary Molecule of Life

- F. Griffith determined in 1928 that a molecular transformation
  factor was responsible for transformation of living R
  bacteria into an S form.
- In 1944, O. Avery, C. MacLeod, and M. McCarty’s study of
  in vitro transformation caused by an S-cell extract identified
  DNA as the transformation factor and strongly suggested it
  is the hereditary material.
- A. Hershey and M. Chase determined in 1952 that bacteriophage
  T2 uses DNA, not protein, to reproduce within host
  E. coli cells.


7.2 The DNA Double Helix Consists of Two 
Complementary and Antiparallel Strands 


- The DNA nucleotides consist of the five-carbon sugar
  deoxyribose, a phosphate group, and one of four nitrogen-
  containing nucleotide bases.
- The DNA nucleotide bases are the purines adenine and
  guanine, and the pyrimidines cytosine and thymine.
- Phosphodiester bonds form between 5’ phosphate and 3’
  OH groups to join nucleotides into polynucleotide chains.
- Complementary base pairs consist of a purine and a pyrimidine.
  In DNA, A and T form two stable hydrogen bonds,
  whereas G and C form three stable hydrogen bonds.
- Complementary nucleic acid strands are antiparallel.
- The stacking of base pairs in DNA imparts helical twisting
  that creates major grooves and minor grooves in the duplex.


7.3 DNA Replication Is Semiconservative 
and Bidirectional 


- Experimental evidence demonstrates that DNA replication is
  semiconservative, meaning each daughter molecule receives
  one parental strand and one newly synthesized strand that
  was produced using the parental strand as a template.
- Most DNA replication is bidirectional. A replication bubble with
  replication forks at each end expands as replication progresses.
- Bacterial genomes have a single replication origin, whereas
  eukaryotic genomes have many origins of replication.
- Eukaryotic replication origins initiate asynchronously during
  S phase.
- Eukaryotic DNA replication produces sister chromatids.


7.4 DNA Replication Precisely Duplicates 
the Genetic Material 


- Bacterial, archaeal and yeast DNA replication begins at
  specific locations that bind replication initiation proteins.
  Specific conserved sequences are found in bacteria, but replication
  initiation is directed by chromatin state in eukaryotes.
- DNA replication begins with the synthesis of an RNA primer
  by primase, followed by synthesis of leading and lagging
  DNA strands by DNA polymerase.
- To complete replication, RNA primers are removed by DNA
  polymerase, and DNA segments are joined by DNA ligase.
- DNA polymerases not only replicate DNA but also proofread
  newly synthesized DNA for accuracy.
- Eukaryotic and archaeal DNA replication proteins have
  a high degree of homology reflecting a shared common
  ancestry. Bacteria have analogous proteins, but are
  ancestrally more distant.
- Eukaryotic chromosomes have repetitive sequences called
  telomeres at their ends that shorten with each replication in
  somatic cell cycles.
- Telomerase is a ribonucleoprotein that synthesizes telomeric
  repeat sequences to maintain telomere length in germ-line
  and stem cells.


7.5 Molecular Genetic Analytical Methods Make 
Use of DNA Replication Processes 


- The polymerase chain reaction (PCR) is used to produce
  large numbers of copies of target DNA sequences.
- Dideoxynucleotide DNA sequencing is used to determine
  the sequence of DNA fragments.
- Next-generation and third-generation DNA sequencing are
  much faster and far cheaper methods that have paved the
  way for large numbers of genome sequencing projects and
  personal human genome sequencing.
1. What results from the experiments of Frederick
Griffith provided the strongest support for his
conclusion that a transformation factor is responsible for
heredity?

2. Explain why Avery, MacLeod, and McCarty’s in vitro
transformation experiment showed that DNA, but not
RNA or protein, is the hereditary molecule.

3. Hershey and Chase selected the bacteriophage T2 for their
experiment assessing the role of DNA in heredity because
T2 contains protein and DNA, but not RNA. Explain why
T2 was a good choice for this experiment.

4. Explain how the Hershey and Chase experiment identified
DNA as the hereditary molecule.


5. One strand of a fragment of duplex DNA has the sequence
5'-ATCGACCTGATC-3'.

a. What is the sequence of the other strand in the duplex?
b. What is the name of the bond that joins one nucleotide
to another in the DNA strand?
c. Is the bond in part (b) a covalent or a noncovalent bond?
d. Which chemical groups of nucleotides react to form the
bond in part (b)?
e. What enzymes catalyze the reaction in part (d)?
f. Identify the bond that joins one strand of a DNA duplex
to the other strand.
g. Is the bond in part (f) a covalent or a noncovalent bond?
h. What term is used to describe the pattern of base pairing
between one DNA strand and its partner in a duplex?
i. What term is used to describe the polarity of two DNA
strands in a duplex?

6. The principles of complementary base pairing and
antiparallel polarity of nucleic acid strands in a duplex are
universal for the formation of nucleic acid duplexes. What is the
chemical basis for this universality?


7. For the following fragment of DNA, determine the number
of hydrogen bonds and the number of phosphodiester
bonds present:

5'-ACGTAGAGTGCTC-3'
3'-TGCATCTCACGAG-5'

8. Figures 1.6 and 1.7 present simplified depictions of
nucleotides containing deoxyribose, a nucleotide base, and a
phosphate group (see pages 8 and 9). Use this simplified
method of representation to illustrate the sequence
3'-AGTCGAT-5' and its complementary partner in a
DNA duplex.

a. What kind of bond joins the C to the G within a single
strand?
b. What kind of bonds join the C in one strand to the G in
the complementary strand?
c. How many phosphodiester bonds are present in this
DNA duplex?
d. How many hydrogen bonds are present in this DNA
duplex?

9. Consider the sequence 3'-ACGCTACGTC-5'.

a. What is the double-stranded sequence?
b. What is the total number of covalent bonds joining the
nucleotides in each strand?
c. What is the total number of noncovalent bonds joining
the nucleotides of the complementary strands?

10. DNA polymerase III is the main DNA-synthesizing enzyme
in bacteria. Describe how it carries out its role of
elongating a strand of DNA.

11. You are participating in a study group preparing for an
upcoming genetics exam, and one member of the group
proposes that each of you draw the structure of two DNA
nucleotides joined in a single strand. The figures are drawn
and exchanged for correction. You receive the drawing
below to correct.

a. Identify and correct at least five things that are wrong
in the depiction of each nucleotide.
b. What is wrong with the way the nucleotides are joined?
c. Draw this single-stranded segment correctly.

12. Explain how RNA participates in DNA replication.

13. A sample of double-stranded DNA is found to contain
20% cytosine. Determine the percentage of the three other
DNA nucleotides in the sample.

14. Bacterial DNA polymerase I and DNA polymerase III
perform different functions during DNA replication.

a. Identify the principal functions of each molecule.
b. If a mutation inactivated DNA polymerase I in a strain of
E. coli, would the cell be able to replicate its DNA? If so,
what kind of abnormalities would you expect to find in
the cell?
c. If a strain of E. coli acquired a mutation that inactivated
DNA polymerase III function, would the cell be able to
replicate its DNA? Why or why not?


15. Diagram a replication fork in bacterial DNA and label the
following structures or molecules.

a. DNA pol III
b. helicase
c. RNA primer
d. origin of replication
e. leading strand (label its polarity)
f. DNA pol I
g. topoisomerase
h. SSB protein
i. lagging strand (label its polarity)
j. primase
k. Okazaki fragment

16. Which of the following equations are true for the
percentages of nucleotides in double-stranded DNA?

a. (A + G)/(C + T) = 1.0
b. (A + T)/(G + C) = 1.0
c. (A)/(T) = (G)/(C)
d. (A)/(C) = (G)/(T)
e. (A)/(G) = (T)/(C)

17. Which of the following equalities is not true for
double-stranded DNA?

a. (G + T) = (A + C)
b. (G + C) = (A + T)
c. (G + A) = (C + T)

18. List the order in which the following proteins and enzymes
are active in E. coli DNA replication: DNA pol I, SSB,
ligase, helicase, DNA pol III, and primase.

19. Two viral genomes are sequenced, and the following
percentages of nucleotides are identified:

Genome 1: A = 28%, C = 22%, G = 28%, T = 22%
Genome 2: A = 22%, C = 28%, G = 28%, T = 22%

What is the structure of DNA in each genome?

For answers to selected even-numbered problems, see Appendix: Answers.


Application and Integration


20. Matthew Meselson and Franklin Stahl demonstrated that
DNA replication is semiconservative in bacteria. Briefly
outline their experiment and its results for two DNA
replication cycles, and identify how the alternative models of
DNA replication were excluded by the data.

21. Raymond Rodriguez and colleagues demonstrated
conclusively that DNA replication in E. coli is bidirectional.
Explain why locating the origin of replication on one side
of the circular chromosome and the terminus of replication
on the opposite side of the chromosome supported
this conclusion.

22. Joel Huberman and Arthur Riggs used pulse labeling to
examine the replication of DNA in mammalian cells. Briefly
describe the Huberman-Riggs experiment, and identify how the
results exclude a unidirectional model of DNA replication.

23. Why do the genomes of eukaryotes, such as Drosophila,
need to have multiple origins of replication, whereas bacterial
genomes, such as that of E. coli, have only a single origin?

24. Bloom syndrome (OMIM 210900) is an autosomal recessive
disorder caused by mutation of a DNA helicase. Among
the principal symptoms of the disease are chromosome
instability and a propensity to develop cancer. Explain these
symptoms on the basis of the helicase mutation.

25. How does rolling circle replication (see Section 6.1) differ
from bidirectional replication?

26. Telomeres are found at the ends of eukaryotic chromosomes.

a. What is the sequence composition of telomeres?
b. How does telomerase assemble telomeres?
c. What is the functional role of telomeres?
d. Why is telomerase usually active in germ-line cells but
not in somatic cells?

27. A family consisting of a mother (I-1), a father (I-2), and
three children (II-1, II-2, and II-3) are genotyped by PCR
for a region of an autosome containing repeats of a 10-bp
sequence. The mother carries 16 repeats on one chromosome
and 21 on the homologous chromosome. The father
carries repeat numbers of 18 and 26.

a. Following the illustration style of Figure 7.27c, which
aligns members of a pedigree with their DNA fragments
in a gel, draw a DNA gel containing the PCR fragments
generated by amplification of DNA from the parents
(I-1 and I-2). Label the size of each fragment.
b. Identify all the possible genotypes of children of this
couple by specifying PCR fragment lengths in each genotype.
c. What genetic term best describes the pattern of
inheritance of this DNA marker? Explain your choice.


28. In a dideoxy DNA sequencing experiment, four separate
reactions are carried out to provide the replicated material
for DNA sequencing gels. Reaction products are usually
run in gel lanes labeled A, T, C, and G.

a. Identify the nucleotides used in the dideoxy DNA
sequencing reaction that produces molecules for the A
lane of the sequencing gel.
b. How does PCR play a role in dideoxy DNA sequencing?
c. Why is incorporation of a dideoxynucleotide during DNA
sequencing identified as a “replication-terminating” event?

29. The following dideoxy DNA sequencing gel is produced in
a laboratory.

[Figure: sequencing gel with lanes labeled ddATP, ddTTP, ddCTP, and ddGTP; origin marked]

What is the double-stranded DNA sequence of this
molecule? Label the polarity of each strand.

30. Using an illustration style and labeling similar to that in
Problem 29, draw the electrophoresis gel containing
dideoxy sequencing fragments for the DNA template strand
3'-AGACGATAGCAT-5'.

31. A PCR reaction begins with one double-stranded segment
of DNA. How many double-stranded copies of DNA are
present after the completion of 10 amplification cycles?
After 20 cycles? After 30 cycles?

32. DNA replication in early Drosophila embryos occurs about
every 5 minutes. The Drosophila genome contains
approximately 1.8 × 10⁸ base pairs. Eukaryotic DNA polymerases
synthesize DNA at a rate of approximately 40 nucleotides
per second. Approximately how many origins of replication
are required for this rate of replication?

33. Three independently assorting VNTR markers are used to
assess the paternity of a colt (C) recently born to a
quarter horse mare (M). Blood samples are drawn from the
mare, her colt, and three possible male sires (S1, S2, and
S3). DNA at each marker locus is amplified by PCR, and a
DNA electrophoresis gel is run for each marker. Amplified
DNA bands are visualized in each gel by ethidium bromide
staining. Gel results are shown below for each marker.
Evaluate the data and determine if any of the potential sires
can be excluded. Explain the basis of exclusion, if any, in
each case.

34. A sufficient amount of a small DNA fragment is available
for dideoxy sequencing. The fragment to be sequenced
contains 20 nucleotides following the site of primer binding:

5'-ATCGCTCGACAGTGACTAGC-[primer site]-3'

Dideoxy sequencing is carried out, and the products of the
four sequencing reactions are separated by gel
electrophoresis. Draw the bands you expect will appear on the gel
from each of the sequencing reactions.

35. Suppose that future exploration of polar ice on Mars
identifies a living microbe and that analysis indicates the
organism carries double-stranded DNA as its genetic
material. Suppose further that DNA replication analysis is
performed by first growing the microbe in a growth medium
containing the heavy isotope of nitrogen (¹⁵N), that the
organism is then transferred to a growth medium
containing the light isotope of nitrogen (¹⁴N), and that the nitrogen
composition of the DNA is examined by CsCl
ultracentrifugation and densitometry after the first, second, and third
replication cycles in the ¹⁴N-containing medium. The results
of the experiment are illustrated for each cycle. The control
shows the positioning of the three possible DNA densities.
Based on the results shown, what can you conclude about the
mechanism of DNA replication in this organism?
(Hint: See the description of the Meselson and Stahl
experiment on pp. 236-237.)
An electron micrograph of a spliceosome engaged in intron splicing.


Wilhelm Johannsen introduced the term gene in 1909
to describe “the fundamental unit of inheritance.”
Johannsen’s definition encompasses the understanding
that genes contain genetic information and are passed 
from one generation to the next and that genes are the 
basis of the fundamental structural, functional, devel- 
opmental, reproductive, and evolutionary properties of 
organisms. This basic definition of the gene remains valid 
today, more than a century after being coined, but our 
knowledge of molecular genetics has expanded enor- 
mously, refining our understanding of the structure and 
function of genes and clarifying the roles genes play in 
producing traits. 


CHAPTER OUTLINE 


8.1 RNA Transcripts Carry the
Messages of Genes

8.2 Bacterial Transcription
Is a Four-Stage Process

8.3 Archaeal and Eukaryotic

The central dogma of biology describes the flow
of genetic information from DNA to RNA to protein 
(see Figure 1.8). It conveys that DNA is the repository 
of genetic information, which is converted through 
transcription into RNA, one type of which is then 
translated into protein. Transcription is the process 
by which RNA polymerase enzymes and other tran- 
scriptional proteins and enzymes use the template 
strand of DNA to synthesize a complementary RNA 
strand. Translation is the process by which messenger 
RNA is used to direct protein synthesis. 

This chapter describes the mechanisms of RNA 
transcription in the three domains of life: bacteria, 
archaea, and eukaryotes. We will also examine the 
events that modify the precursor messenger RNA 
(mRNA) to yield the mature mRNA that subsequently 
undergoes translation to produce proteins. We will 
see that these transcriptional events are closely 
tied to the process of translation, the subject of the 
following chapter. 

This chapter also discusses the shared evolu- 
tionary history and common ancestry of bacteria, 
archaea, and eukaryotes. We will see that bacteria
have a number of general features of transcription in
common with archaea and eukaryotes. At the same
time, we will see that differences among the members
of these domains, including differences in cell struc- 
ture, gene structure, and genome organization, lead 
to significant differences in how their genes are tran- 
scribed and translated. 

Multiple types of RNA are introduced and 
described here, but the principal focus of discussion
is mRNA. The discovery of mRNA and of its
function raised numerous questions: How is a 
gene recognized by the transcription machinery? 
Where does transcription begin? Which strand 
of DNA is transcribed? Where does transcription 
end? How much transcript is made? How is RNA 
modified after transcription? We answer these 
questions in the chapter and set the answers in 
a context that compares and contrasts the pro- 
cess of transcription in bacterial, archaeal, and 
eukaryotic genomes. 


8.1 RNA Transcripts Carry the 
Messages of Genes 


In the late 1950s, with the structure of DNA in hand, 
molecular biology researchers focused on identifying and 
describing the molecules and mechanisms responsible 
for conveying the genetic message of DNA. RNA was 
known to be chemically similar to DNA and present in 
abundance in all cells, but its diversity and biological roles 
remained to be discovered. Some roles were strongly sug- 
gested by cell structure. For example, in eukaryotic cells, 
DNA is located in the nucleus, whereas protein synthesis 
takes place in the cytoplasm, suggesting that DNA could 
not code directly for proteins but RNA perhaps could. 
Bacteria, however, lack a nucleus, so an open research 
question was whether bacteria and eukaryotes used simi- 
lar mechanisms and similar molecules to convey the 
genetic message for protein synthesis. The search was on 
to identify the types of RNA in cells and to identify the 
mechanisms by which the genetic message of DNA is con- 
veyed for protein synthesis. 

It is worth noting that the experimental evidence 
identifying archaea as occupying a separate domain from 
bacteria and eukaryotes was obtained after some of the 
fundamental information about transcription became 
known. We introduce transcription in archaea in a later 
section. These microbes, which like bacteria also lack 
a nucleus, reveal an intriguing blend of bacterial and 
eukaryotic features. The archaeal core transcriptional 
proteins are clearly homologous to the eukaryotic ap- 
paratus, while the regulation of these processes is more 
bacteria-like in nature. 


RNA Nucleotides and Structure 


Both DNA and RNA are polynucleotide molecules 
composed of nucleotide building blocks. One principal 
difference between the molecules is the single-stranded 
structure of RNA versus the double-stranded structure of 
DNA. Despite their single-stranded structure, however, 
RNA molecules can, and frequently do, adopt folded 
secondary structures by complementary base pairing of 
segments of the molecule. In certain instances, folded 
secondary structures are essential to RNA function, as we 
discuss in the following section. 
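
The idea that two segments of one RNA strand can pair to form a folded stem can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: it considers Watson-Crick pairs alone (G-U wobble pairs are ignored), it requires the two segments to pair along their full length, and the names `RNA_PAIR`, `reverse_complement`, and `can_form_stem`, as well as the example sequence, are hypothetical and not taken from the text.

```python
# Watson-Crick RNA pairs only; G-U wobble pairs are deliberately ignored
# in this simplified sketch.
RNA_PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the sequence that would pair, antiparallel, with `rna`."""
    return "".join(RNA_PAIR[base] for base in reversed(rna.upper()))


def can_form_stem(left: str, right: str) -> bool:
    """True if two segments of one strand (both written 5'->3') can pair
    along their full length when the strand folds back on itself."""
    return right.upper() == reverse_complement(left)


# Hypothetical hairpin: two stem segments separated by a four-base loop.
stem_left, loop, stem_right = "GGCAUC", "UUCG", "GAUGCC"
print(can_form_stem(stem_left, stem_right))  # True
print(can_form_stem(stem_left, stem_left))   # False
```

Because the strand folds back on itself, the second segment must be the reverse complement of the first, which is the same antiparallel rule that governs pairing between two separate strands.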

The RNA nucleotides, like those of DNA, are com- 
posed of a five-carbon sugar, a nucleotide base, and one or 
more phosphate groups. Each RNA nucleotide carries one 
of four possible nucleotide bases. At the same time, RNA 
nucleotides have two critical chemical differences in com- 
parison to DNA nucleotides. The first difference concerns 
the identity of the RNA nucleotide bases. The purines 
adenine and guanine in RNA are identical to the purines 
in DNA. Likewise, the pyrimidine cytosine is identical in 
RNA and DNA. In RNA, however, the second pyrimidine
is uracil (U) rather than the thymine carried by DNA.
The four RNA ribonucleotides (A, U, G, C) are shown in
Figure 8.1. The structure of uracil is similar to that of thymine,
but notice, by comparing the structure of uracil in
Figure 8.1 with that of thymine in Figure 7.5, that thymine
has a methyl group (CH₃) at the 5 carbon of the pyrimidine
ring, whereas uracil does not. In all other respects, uracil is
similar to thymine, and when uracil undergoes base pairing,
its complementary partner is adenine.

Figure 8.1 The four RNA ribonucleotides. Shown in their
monophosphate forms, each ribonucleotide consists of the
sugar ribose, one phosphate group, and one of the nucleotide
bases adenine, guanine, cytosine, or uracil.

The second chemical difference between RNA and 
DNA nucleotides is the presence of the sugar ribose in 
RNA rather than the deoxyribose occurring in DNA. The 
ribose gives RNA its name (ribonucleic acid). Compare 
the ribose molecules shown in Figure 8.1 to deoxyribose 
in Figure 7.5, and notice that ribose carries a hydroxyl 
group (OH) not found in deoxyribose at the 2’ carbon 
of the ring. Except for this difference, ribose and deoxy- 
ribose are identical, having a nucleotide base attached to 
the 1’ carbon and a hydroxyl group at the 3’ carbon. 

The similarity of the sugars of RNA and DNA leads 
to the formation of essentially identical sugar-phosphate 
backbones in the molecules. RNA strands are assembled 
by formation of phosphodiester bonds, between the 5’ 
phosphate of one nucleotide and the 3’ hydroxyl of the 
adjacent nucleotide, that are identical to those found
in DNA (Figure 8.2). RNA is synthesized from a DNA
template strand using the same purine-pyrimidine
complementary base pairing described for DNA, except that
adenine of DNA pairs with uracil of RNA.
RNA polymerase enzymes catalyze the addition of each 
ribonucleotide to the 3’ end of the nascent strand 
and form phosphodiester bonds between a triphos- 
phate group at the 5’ carbon of one nucleotide and the 
hydroxyl group at the 3’ carbon of the adjacent nucleo- 
tide, eliminating two phosphates (the pyrophosphate 
group), just as in DNA synthesis. Compare Figure 8.2 
to Figure 7.6 to see the similarity of these nucleic acid 
synthesis processes. 
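
The pairing rule described above, in which adenine in the DNA template specifies uracil in the RNA, can be written as a short Python sketch. It is a simplified illustration rather than a model of the enzyme: it assumes the template strand is supplied in the 3'-to-5' direction, it ignores promoters and termination, and the names `DNA_TO_RNA` and `transcribe` and the example sequence are hypothetical.

```python
# Complementary pairing used in transcription: DNA template base -> RNA base.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3_to_5: str) -> str:
    """Return the RNA transcript, written 5'->3', for a DNA template
    strand supplied in the 3'->5' direction."""
    template = template_3_to_5.upper()
    unknown = set(template) - set(DNA_TO_RNA)
    if unknown:
        raise ValueError(f"unexpected bases: {sorted(unknown)}")
    # Each new ribonucleotide is added to the 3' end of the nascent strand,
    # so reading the template 3'->5' yields the transcript 5'->3'.
    return "".join(DNA_TO_RNA[base] for base in template)


print(transcribe("TACGGATTC"))  # AUGCCUAAG
```

The transcript has the same sequence as the coding (nontemplate) strand of the DNA, with U in place of T.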


Identification of Messenger RNA 


In their search for the RNA molecule responsible for 
transmitting the genetic information content of DNA to 
the ribosome for protein production, researchers utilized 
many techniques. Among the methods used was the 
pulse-chase technique (see Section 7.3) to follow the trail 
of newly synthesized RNA in cells. The “pulse” step of this 
technique exposes cells to radioactive nucleotides that 
become incorporated into newly synthesized nucleic acids 
(see Chapter 7). After a short incubation period to incor- 
porate the labeled nucleotides, a “chase” step replaces 
any remaining unincorporated radioactive nucleotides by 
introducing an excess of unlabeled nucleotides. An ex- 
perimenter can then observe the location and movement 
of the labeled nucleic acid to determine the pattern of its 
movement and its ultimate destination and fate. 

In 1957, microbiologist Elliot Volkin and geneticist 
Lazarus Astrachan used the pulse-chase method to exam- 
ine transcription in bacteria immediately following infec- 
tion by a bacteriophage. Exposing newly infected bacteria 
to radioactive uracil, they observed rapid incorporation 
of the label, indicating a burst of transcriptional activity. 
In the chase phase of the experiment, when radioactive 
uracil was removed, Volkin and Astrachan found that the 
radioactivity quickly dissipated, indicating that the newly 
synthesized RNA broke down rapidly. They concluded 
that the synthesis of a type of RNA with a very short life 
span is responsible for the production of phage proteins 
that drive progression of the infection. 

Similar pulse-chase experiments were soon con- 
ducted with eukaryotic cells. In these experiments, cells 
were pulsed with radioactive uracil that was then chased 
with nonradioactive uracil. Immediately after the pulse, 
radioactivity was concentrated in the nucleus, indicat- 
ing that newly synthesized RNA has a nuclear location. 
Over a short period, radioactivity migrated to the cyto- 
plasm, where translation takes place. The radioactivity 
dissipated after lingering in the cytoplasm for a period of 
time. These experiments led researchers to conclude that 
the RNA synthesized in the nucleus was likely to act as an
intermediary carrying the genetic message of DNA to the
cytoplasm for translation into proteins.

Figure 8.2 RNA synthesis.

The discovery of mRNA was capped in 1961 when 
an experiment by the biologists Sydney Brenner, François
Jacob, and Matthew Meselson identified an unstable form 
of RNA as the genetic messenger. Brenner and his col- 
leagues designed an experiment using the bacteriophage 
T2 and Escherichia coli to investigate whether phage pro- 
tein synthesis requires newly constructed ribosomes, or 
whether phage proteins could be produced using existing 
bacterial ribosomes and a messenger molecule to encode 
the proteins. The experiment found that newly synthe- 
sized phage RNA associates with bacterial ribosomes to 
produce phage proteins. The RNA that directed the pro- 
tein synthesis formed and degraded quickly, leading the 
experimenters to conclude that a phage “messenger” RNA 
with a short half-life is responsible for protein synthesis 
during infection. 

A large variety of different RNA species exist within any
cell. The most essential types of RNA are found in all cells 
in all three domains, but several others are specific to eu- 
karyotic cells. Table 8.1 identifies and briefly describes the 
most important types of RNA found in cells, although it 
is not an exhaustive list, as there are too many varieties of 
RNA to describe all of them here. 

All RNAs are transcribed from RNA-encoding genes. 
The various types of RNA are constructed from the same 
building blocks but perform different roles in the cell. In 
light of these different roles, RNAs are divided into two 
general categories—messenger RNA and functional RNA. 

Genes transcribing messenger RNA (mRNA) are 
protein-producing genes, and their transcripts direct protein
synthesis by the process of translation. Messenger
RNA is the short-lived intermediary form of RNA that
conveys the genetic message of DNA to ribosomes for
translation. Messenger RNA is the only form of RNA that
undergoes translation. Transcription of mRNA and
post-transcriptional processing of mRNA are principal areas of
focus in this chapter.

Table 8.1 Major RNA Molecules

Type of RNA                 Function

Messenger RNA (mRNA)        Used to encode the sequence of amino acids in a
                            polypeptide. May be polycistronic (encoding two
                            or more polypeptides) in bacteria and archaea.
                            Encodes single polypeptides in nearly all
                            eukaryotes (see Section 8.2).

Ribosomal RNA (rRNA)        Along with numerous proteins, helps form the
                            large and small ribosomal subunits that unite
                            for translation of mRNA (see Sections 8.4 and 9.2).

Transfer RNA (tRNA)         Carries amino acids to ribosomes and binds there
                            to mRNA by complementary base pairing in order
                            to deposit the amino acids to elongate the
                            polypeptide (see Sections 8.4 and 9.3).

Small nuclear RNA (snRNA)   Found in eukaryotic nuclei, where multiple
                            snRNAs join with numerous proteins to form
                            spliceosomes that remove introns from precursor
                            mRNA (see Section 8.4).

MicroRNA (miRNA) and        Eukaryotic regulatory RNAs that have different
small interfering RNA       origins. Involved in eukaryotic regulation of
(siRNA)                     gene expression (see Section 15.3).

Telomerase RNA              Along with several proteins, forms telomerase,
                            the ribonucleoprotein complex essential for
                            maintaining and elongating telomere length of
                            eukaryotic chromosomes (see Section 7.4).

Functional RNAs perform a variety of specialized 
roles in the cell. The functional RNAs carry out their 
activities in nucleic acid form and are not translated. Two 
major categories of functional RNA are active in bacte- 
rial and eukaryotic translation. Transfer RNA (tRNA) 
is encoded in dozens of different forms in all genomes. 
Each tRNA is responsible for binding a particular amino 
acid that it carries to the ribosome. There the tRNA 
interacts with mRNA and deposits its amino acid for 
inclusion in the growing protein chain. Ribosomal RNA 
(rRNA) combines with numerous proteins to form the 
ribosome, the molecular machine responsible for trans- 
lation. Certain bacterial rRNA molecules interact with 
mRNA to initiate translation. 

Three additional types of functional RNA perform 
specialized functions in eukaryotic cells only. Small 
nuclear RNA (snRNA) of various types is found in the 
nucleus of eukaryotic cells, where it participates in mRNA 
processing. Certain snRNAs unite with nuclear proteins 
to form ribonucleoprotein complexes that are responsible 
for intron removal. We discuss these activities in later 
sections of this chapter. MicroRNA (miRNA) and small
interfering RNA (siRNA) are recently recognized types
of regulatory RNA that are particularly active in plant and
animal cells. MicroRNAs and siRNAs have a widespread
and important role in the post-transcriptional regula- 
tion of mRNA, regulating protein production through a 
process called RNA interference. Their transcription and 
activities are beyond the scope of this chapter, but they 
are central to the discussion of the regulation of gene 
expression in eukaryotes in Chapter 15. 


Lastly, certain RNAs in eukaryotic cells have cat- 
alytic activity. In contrast to DNA, which is exclusively 
a repository of genetic information, catalytically active 
RNA molecules can catalyze biological reactions. Called 
ribozymes, catalytically active RNAs can activate cellular 
reactions, including the removal of introns in a process 
identified as self-splicing, described later in the chapter. 


8.2 Bacterial Transcription 
Is a Four-Stage Process 


Transcription is the synthesis of a single-stranded RNA 
molecule by RNA polymerase. It is most clearly under- 
stood and described in bacteria, and E. coli is the model 
experimental organism from which the majority of our 
knowledge of bacterial transcription has been derived. In 
this section, we examine the four stages of transcription 
in bacteria: (1) promoter recognition and identification, 
(2) the initiation of transcript synthesis, (3) transcript 
elongation, and (4) transcription termination. 

Like all RNA polymerases, bacterial RNA polymerase 
uses one strand of DNA, the template strand, to assem- 
ble the transcript by complementary and antiparallel base 
pairing of RNA nucleotides with DNA nucleotides of the 
template strand (see Figure 1.9 for a review). The coding 
strand of DNA, also known as the nontemplate strand, 
is complementary to the template strand. The gene—that 
is, the stretch of DNA regions that produces an RNA 
transcript—contains several segments with distinct func- 
tions (Figure 8.3). The promoter of the gene is immedi- 
ately upstream—that is, immediately 5’ to the start of 
transcription, which is identified as corresponding to the 
+1 nucleotide. The promoter is not transcribed. Instead, 
the promoter sequence is a transcription-regulating DNA
sequence that controls the access of RNA polymerase to
the gene. The coding region is the portion of the gene
that is transcribed into mRNA and contains the information
needed to synthesize the protein product of the gene.
The termination region is the portion of the gene that
regulates the cessation of transcription. The termination
region is located immediately downstream, that is,
immediately 3' to the coding segment of the gene.

Figure 8.3 A general diagram
of gene structure and associated
nomenclature.
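
The upstream and downstream nomenclature can be made concrete with a small Python sketch that converts an index along the coding strand into a position relative to the start of transcription. It assumes the common numbering convention in which the start nucleotide is +1, the nucleotide immediately 5' of it is -1, and there is no position 0. The function names and the example start index are hypothetical.

```python
def relative_position(index: int, start_index: int) -> int:
    """Map a 0-based index along the coding strand (read 5'->3') to a
    position relative to the transcription start site.

    The start nucleotide is +1, nucleotides 5' of it are -1, -2, ...,
    and there is no position 0 (an assumed, commonly used convention).
    """
    offset = index - start_index
    return offset + 1 if offset >= 0 else offset


def side_of_start(position: int) -> str:
    """Label a relative position as upstream or downstream of the start."""
    if position == 0:
        raise ValueError("position 0 does not exist in this numbering")
    return "upstream" if position < 0 else "at or downstream of +1"


# Hypothetical example: the start of transcription sits at index 10.
for i in (0, 9, 10, 12):
    pos = relative_position(i, start_index=10)
    print(i, f"{pos:+d}", side_of_start(pos))
```

Under this numbering, every promoter position is negative and every transcribed position is positive.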


Bacterial RNA Polymerase 


A single type of E. coli RNA polymerase catalyzes tran- 
scription of all RNAs. The initial experimental evidence 
supporting this conclusion came from analysis of the 
effect of the antibiotic rifampicin on bacterial RNA syn- 
thesis. Rifampicin inhibits RNA synthesis by preventing 
RNA polymerase from catalyzing the formation of the first 
phosphodiester bond in the RNA chain. In rifampicin-sensitive
(rif^S) bacterial strains, synthesis of all three major
types of RNA (mRNA, tRNA, and rRNA) is inhibited
in the presence of rifampicin. In contrast, rifampicin-resistant
(rif^R) bacteria actively transcribe DNA into the
three major RNAs when rifampicin is present. Molecular
analysis identifies a single mutation of RNA polymerase
in rif^R strains that allows it to remain catalytically active
when exposed to rifampicin, and subsequent molecular 
studies have confirmed the presence of a single bacterial 
RNA polymerase. 

Bacterial RNA polymerase is composed of a pentameric
(five-polypeptide) RNA polymerase core that
binds to a sixth polypeptide, called the sigma subunit
(σ), which induces a conformational change in the core
enzyme that switches it to its active form. In its active
form, the RNA polymerase is described as a holoenzyme,
a term meaning an intact complex of multiple subunits
with full enzymatic capacity. Figure 8.4 shows a common
type of sigma subunit known as σ⁷⁰, but there are also
other sigma subunits in E. coli.

The RNA polymerase core consists of two α subunits,
designated αI and αII, two β subunits (β and β′), and an ω (omega)
subunit. The molecular weight of the five-subunit core
RNA polymerase is approximately 390 kD (kiloDaltons),
and with the sigma subunit added, the holoenzyme has a
molecular weight of approximately 430 kD. Each of these subunits
has been evolutionarily conserved in archaea and in
eukaryotes, as we discuss in the following section.


By itself, the core RNA polymerase can transcribe DNA 
template-strand sequence into RNA sequence, but the core 
is unable to efficiently bind to a promoter or initiate RNA 
synthesis without a sigma subunit. The joining of the sigma 
subunit to the core enzyme to form a holoenzyme induces 
a conformational shift in the core segment that enables 
it to bind specifically to particular promoter consensus 
sequences.

This single RNA polymerase is responsible for all bac- 
terial transcription. Thus, the bacterial RNA polymerase 
must recognize promoters for protein-coding genes as 
well as for genes that produce functional RNAs, such as 
tRNA and rRNA. However, not all promoters of bacte- 
rial genes are identical. There is great diversity among 
bacterial promoter sequences, permitting certain genes to 
be expressed only under special circumstances. Bacteria 
manage the recognition of the promoters of these spe- 
cialized genes by producing several different types of 
sigma subunits that can join the core polymerase. These 
so-called alternative sigma subunits alter the specificity 
of the holoenzymes for promoter regions by imparting
distinct conformational changes to the core. These differences
enable transcription of specific genes under the
appropriate conditions, or at the correct time.

Figure 8.4 Bacterial RNA polymerase core plus a sigma (σ)
subunit forms the fully active holoenzyme.


Bacterial Promoters 


A promoter is a double-stranded DNA sequence that is 
the binding site for RNA polymerase. Promoters are regu- 
latory DNA sequences that bind transcription proteins, 
and their presence usually indicates that a gene is nearby. 
Bacterial promoters are located a short distance upstream 
of the coding sequence, typically within a few nucleotides 
of the start of transcription, represented by the +1 nucleo- 
tide. RNA polymerase is attracted to promoters by the 
presence of consensus sequences, short regions of DNA 
sequences that are highly similar, though not necessarily 
identical, to one another and are located in the same posi- 
tion relative to the start of transcription of different genes. 

Although promoters are double stranded, promoter 
consensus sequences are usually written in a single-stranded
shorthand form that gives the 5'-to-3' sequence
of the coding (non-template) strand of DNA (Figure 8.5).
The most commonly occurring bacterial promoter con- 
tains two consensus sequence regions that each play an 
important functional role in recognition by RNA poly- 
merase and the subsequent initiation of transcription. 
These consensus sequences are located upstream from 
the +1 nucleotide (the start of transcription) in a region 
flanking the gene where the nucleotides are denoted by 
negative numbers and are not transcribed. At the -10
position of the E. coli promoter is the Pribnow box
sequence, or the -10 consensus sequence, consisting of
6 bp having the consensus sequence 5'-TATAAT-3'. The
Pribnow box is separated by about 25 bp from another
6-bp region, the -35 consensus sequence, identified by
the nucleotides 5'-TTGACA-3'. The nucleotide sequences
that occur upstream, downstream, and between these
consensus sequences are highly variable and contain no
other consensus sequences. Thus, in a functional sense,
the -10 (Pribnow) and -35 consensus sequences are important
because of their nucleotide content, their location
relative to one another, and their location relative to the
start of transcription. In contrast to the consensus sequences
themselves, the nucleotides between -10 and -35
are important as spacers between the consensus elements,
but their specific sequences are not critical.

Natural selection has operated to retain strong sequence
similarity in consensus regions and to retain the position of
the consensus regions relative to the start of transcription.
The effectiveness of evolution in maintaining promoter
consensus sequences is illustrated by comparison with the
sequences between and around -10 and -35, which are not
conserved and which exhibit considerable variation. In addition,
the spacing between the consensus sequences and their
placement relative to the +1 nucleotide are stable. RNA
polymerase is a large molecule that binds to the -10 and -35
consensus sequences and occupies the space between and
immediately around the sites. Crystal structure models show
that the enzyme spans enough DNA to allow it to contact promoter
consensus regions and reach the +1 nucleotide. Once bound
at a promoter in this fashion, RNA polymerase can initiate
transcription. Genetic Analysis 8.1 guides you through the
identification of promoter consensus regions.
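
The promoter layout described above (a -35 element, a spacer, then a
-10 element on the coding strand) can be illustrated with a short
Python sketch. This is only an illustration, not a published
promoter-finding method: the function names, the allowed spacer window
of 15 to 19 nucleotides, and the mismatch limit are all hypothetical
choices made for the example.

```python
def hamming(a: str, b: str) -> int:
    """Count positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def find_promoters(coding_strand, minus35="TTGACA", minus10="TATAAT",
                   spacer_range=(15, 19), max_mismatches=2):
    """Return (index of -35 element, spacer length, total mismatches).

    The coding strand is read 5' to 3'. spacer_range and max_mismatches
    are illustrative assumptions, not values taken from the text.
    """
    seq = coding_strand.upper()
    n35, n10 = len(minus35), len(minus10)
    hits = []
    for i in range(len(seq) - n35 + 1):
        mm35 = hamming(seq[i:i + n35], minus35)
        if mm35 > max_mismatches:
            continue
        for spacer in range(spacer_range[0], spacer_range[1] + 1):
            j = i + n35 + spacer          # start of the candidate -10 element
            if j + n10 > len(seq):
                break
            total = mm35 + hamming(seq[j:j + n10], minus10)
            if total <= max_mismatches:
                hits.append((i, spacer, total))
    # Best matches (fewest mismatches) first.
    return sorted(hits, key=lambda h: h[2])
```

Allowing a few mismatches reflects the point made above that promoter
elements are highly similar to the consensus but not necessarily
identical to it.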


Transcription Initiation 


RNA polymerase holoenzyme initiates transcription
through a process involving two steps. In the first step,
the holoenzyme makes an initial loose attachment to
the double-stranded promoter sequence and then binds
tightly to it to form the closed promoter complex (see
Foundation Figure 8.6). In the second step, the
bound holoenzyme unwinds approximately 18 bp of
DNA around the -10 consensus sequence to form the
open promoter complex. Following formation of
the open promoter complex, the holoenzyme progresses
downstream to initiate RNA synthesis at the +1 nucleotide
on the template strand of DNA.


Figure 8.5 Bacterial promoter structure. Two promoter consensus
sequences, the Pribnow box at -10 and the -35 sequence, are
essential promoter regulatory elements. [Diagram: on the coding
strand, read 5' to 3', the -35 consensus sequence (TTGACA) and the
-10 consensus sequence (Pribnow box, TATAAT) lie upstream of the +1
transcription start; the RNA-coding region, containing the start and
stop codons, extends from +1 to the termination region.]
GENETIC ANALYSIS

PROBLEM DNA sequences in the promoter region of 10 E. coli genes are shown. Sequences at the
-35 and -10 sites are boxed.

a. Use the sequence information provided to deduce the -35 and -10 consensus sequences.
b. Speculate on the relative effects on transcription of a mutation in a promoter consensus
region versus a mutation in the sequence between consensus regions.

BREAK IT DOWN: Promoter consensus sequences are similar in different genes and bind
transcriptionally active proteins (p. 273).

BREAK IT DOWN: Research methods directed at detecting promoters and assessing their
functionality are described in Research Technique 8.1 and Figure 8.11.


-35 -10 +1 
Gene region region 


A2 AATGCTTGACTICTGTAGCGGGAAGGCG--|TATAATIGCACACC-|CIC GC 
bio AAAACGTGTTIITTTGTTGTTAATTCGGTGITAGACTII GT ---AAAICCT 
his AGTTCTTGCTTITCTAACGTGAAAGTGGTTITAGGTTAAAAGAC-IAITCA 
A 
G 


lac CAGGCTTTACACTTTATGCTICCGGCTCGITATGTT]GTG-TGG-|AATT 
GAA 


laci GAATGGCGCAAAACTTTTCGCGGTATGG-|CATGATAGCGCCC-| 
leu AAAAQTTGACAITCCGTTTTTGTATCCAG-ITAACTCTAAAAGC-JAITAT 
recA AACACTTGATAICTGTATGAGCATACAG--[TATAATĪTGCTTC- -AJACA 
A 
G 


tp AGCTGTTGACAATTAATCATCGAACTAG-|TTAACTAGTACGC-|AJAGT 
tRNA AACACTTTACAIGCGGGCCGTCATTTGA--|TATGATIGCGCCCC-|GIC TT 
X1 TCCGCTTGTCTITCCTAGGCCGACTCCC--|TATAATIGCGCCTCCIAITCG 


Solution Strategies and Solution Steps

Evaluate

1. Strategy: Identify the topic this problem addresses and the nature of the required answer.
   Step: This question concerns bacterial promoters. The answer requires identification
   of consensus sequences for -35 and -10 regions of promoters and speculation about the
   consequences of promoter mutations.

2. Strategy: Identify the critical information provided in the problem.
   Step: The problem provides promoter sequence information for 10 E. coli genes and
   identifies the segment of each promoter containing the -10 and -35 regions.

Deduce

3. Strategy: Examine the -10 and -35 sequences of these promoters, and look for common
   patterns.
   TIP: A consensus sequence identifies the most common nucleotide at each position in a
   DNA segment.
   Step: The -10 and -35 sites are the location of RNA polymerase binding during
   transcription initiation. Count the numbers of A, T, C, and G in each position in the
   boxed regions.

Solve

4. Strategy: Determine the consensus sequence at the -10 and -35 regions.
   TIP: Identify the most commonly occurring nucleotide in each position of the
   6-nucleotide consensus region of these genes.
   Step (Answer a): At the -10 site, and moving left to right (toward +1), the most common
   nucleotides in each position in the consensus region, and the number of times they
   occur in that position, are

       T    A    T    A    A    T
      (9)  (9)  (6)  (5)  (5)  (9)

   At the -35 site, also moving left to right (toward +1), the most common nucleotides in
   each position, and the number of times they occur in that position, are

       T    T    G    A    C    A
      (8)  (9)  (8)  (6)  (6)  (6)

5. Strategy: Compare and contrast the likely effects of consensus sequence mutations with
   those of mutations occurring between consensus regions.
   Step (Answer b): Mutation in a consensus sequence is likely to alter the efficiency
   with which a protein binds to the promoter and to decrease the amount of gene
   transcription. In contrast, mutations between consensus sequences are unlikely to alter
   gene transcription because the sequences in these intervening regions do not bind
   tightly to RNA polymerase.


For more practice, see Problems 4, 7, and 18. Visit the Study Area to access study tools. MasteringGenetics™ 
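
The counting procedure used in Answer a can be sketched in Python. The
ten hexamers below are our reading of the boxed -10 regions in the
problem table; some box boundaries are hard to make out in the printed
alignment, so treat the list as a best-effort transcription rather than
authoritative data. The names MINUS10_BOXES and consensus are
hypothetical, chosen for this example.

```python
from collections import Counter

# Best-effort reading of the boxed -10 hexamers for the 10 E. coli genes.
MINUS10_BOXES = {
    "A2": "TATAAT", "bio": "TAGACT", "his": "TAGGTT", "lac": "TATGTT",
    "lacI": "CATGAT", "leu": "TAACTC", "recA": "TATAAT", "trp": "TTAACT",
    "tRNA": "TATGAT", "X1": "TATAAT",
}


def consensus(seqs):
    """Return (consensus string, count of the winning base per position)."""
    # zip(*seqs) turns aligned sequences into one tuple per column.
    winners = [Counter(column).most_common(1)[0] for column in zip(*seqs)]
    bases = "".join(base for base, _ in winners)
    counts = [count for _, count in winners]
    return bases, counts


print(consensus(MINUS10_BOXES.values()))
```

With this reading of the table, the tally agrees with the -10 result
given in Answer a: TATAAT with counts 9, 9, 6, 5, 5, and 9.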




Table 8.2 Escherichia coli RNA Polymerase Sigma Subunits

Subunit  Molecular Weight (kD)  -35 Consensus Sequence  -10 Consensus Sequence  Function
σ²⁸      28                     TAAA                    GCCGATAA                Flagellar synthesis and chemotaxis
σ³²      32                     CTTGAA                  CCCCATTA                Heat shock genes
σ⁵⁴      54                     CTGGPyAPyPu             TTGCA                   Nitrogen metabolism
σ⁷⁰      70                     TTGACA                  TATAAT                  Housekeeping genes


Bacterial promoters often differ from the consensus se- 
quence by one or more nucleotides, and some are different 
at several nucleotides. Since considerable DNA-sequence 
variation occurs among promoters, it is reasonable to ask 
how RNA polymerase is able to recognize promoters and 
reliably initiate RNA synthesis. For an answer, we turn to 
the sigma subunits that confer promoter recognition and 
chain-initiation ability on RNA polymerase. 

Four alternative sigma subunits identified in E. coli 
are named according to their molecular weight (Table 8.2). 
Each alternative sigma subunit leads to recognition of a 
different set of —10 and —35 consensus sequences by the 
holoenzyme. These different consensus sequence elements 
are found in promoters of different types of genes; thus, 
the sigma subunit that it becomes attached to determines 
the specific gene promoters a holoenzyme will recognize. 

The sigma subunit σ⁷⁰ is the most common in bacteria.
It recognizes promoters of “housekeeping genes,” the
genes whose protein products are continuously needed
by cells. Because of the constant need for their products,
housekeeping genes are continuously expressed.
Subunits σ⁵⁴ and σ³² recognize promoters of genes
involved in nitrogen metabolism and genes expressed in
response to environmental stress such as heat shock,
respectively, and are utilized when the action of these
genes is required. The fourth sigma subunit, σ²⁸,
recognizes promoters for genes required for bacterial
chemotaxis (chemical sensing and motility).

The specificity of each type of sigma subunit for 
different promoter consensus sequences produces RNA 
polymerase holoenzymes that have different DNA- 
binding specificities. Microbial geneticists estimate that 
each E. coli cell contains about 3000 RNA polymerase 
holoenzymes at any given time and that each of the four 
kinds of sigma subunits is represented to a differing de- 
gree among them. Because sigma subunits readily attach 
and detach from core enzymes in response to changes in 
environmental conditions, the organism is able to change 
its transcription patterns to adjust to different conditions. 
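
The idea that each sigma subunit directs the holoenzyme to its own
pair of consensus elements can be sketched as a lookup in Python. The
consensus pairs come from Table 8.2, with Py (pyrimidine) written as
[CT] and Pu (purine) as [AG]. The dictionary keys, the function name,
and the simplification of requiring exact matches with any spacing
between the two elements are assumptions made for this example.

```python
import re

# (-35 element, -10 element, gene class) per sigma subunit, from Table 8.2.
SIGMA_CONSENSUS = {
    "sigma70": ("TTGACA", "TATAAT", "housekeeping genes"),
    "sigma54": ("CTGG[CT]A[CT][AG]", "TTGCA", "nitrogen metabolism"),
    "sigma32": ("CTTGAA", "CCCCATTA", "heat shock genes"),
    "sigma28": ("TAAA", "GCCGATAA", "flagellar synthesis and chemotaxis"),
}


def matching_sigmas(coding_strand):
    """List sigma subunits whose -35 element is followed by its -10 element.

    Exact matching and unrestricted spacing are simplifications; real
    promoters often differ from the consensus at one or more positions.
    """
    seq = coding_strand.upper()
    found = []
    for name, (minus35, minus10, _gene_class) in SIGMA_CONSENSUS.items():
        if re.search(f"{minus35}[ACGT]*?{minus10}", seq):
            found.append(name)
    return found
```

A promoter matching none of the entries returns an empty list, which in
this simplified picture means no holoenzyme would recognize it.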


Transcription Elongation and Termination 


Upon reaching the +1 nucleotide, the holoenzyme begins
RNA synthesis by using the template strand to direct
RNA assembly. The holoenzyme remains intact until
the first 8 to 10 RNA nucleotides have been joined. At
that point, the sigma subunit dissociates from the core
enzyme, which continues its downstream progression
(see Foundation Figure 8.6). The sigma subunit itself
remains intact and can associate with another core enzyme
to transcribe another gene.

Downstream progression of the RNA polymerase
core is accompanied by DNA unwinding ahead of the
enzyme to maintain approximately 18 bp of unwound
DNA. As the RNA polymerase passes, progressing
at a rate of approximately 40 nucleotides per second, the
DNA double helix reforms in its wake. When transcription
of the gene is completed, the 5' end of the RNA
trails off the core enzyme.

The end product of transcription is a single-stranded 
RNA that is complementary and antiparallel to the tem- 
plate DNA strand. The transcript has the same 5’-to-3' 
polarity as the coding strand of DNA, the strand comple- 
mentary to the template strand. The coding strand and 
the newly formed transcript also have identical nucleotide 
sequences, except for the presence of uracil in the tran- 
script in place of thymine in the coding strand. For this 
reason, gene sequences are written in 5’-to-3’ orientation 
as single-stranded sequences based on the coding strand 
of DNA. This allows easy identification of the mRNA se- 
quence of a gene by simply substituting U for T. 
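
The relationship among the coding strand, the template strand, and the
transcript can be checked with a few lines of Python. This is a minimal
sketch; the function names are hypothetical, and the template strand is
assumed to be supplied 3' to 5', as it is drawn beneath the coding
strand.

```python
# Watson-Crick pairing for DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def mrna_from_coding(coding_5to3):
    """Transcript read 5' to 3': the coding strand with U in place of T."""
    return coding_5to3.upper().replace("T", "U")


def mrna_from_template(template_3to5):
    """Transcript read 5' to 3', built by pairing bases with the template.

    The template must be given 3' to 5' so that position-by-position
    complementation yields the RNA in its 5'-to-3' orientation.
    """
    return template_3to5.upper().translate(COMPLEMENT).replace("T", "U")


coding = "ATGGCATTC"      # 5'-ATGGCATTC-3'
template = "TACCGTAAG"    # 3'-TACCGTAAG-5'
print(mrna_from_coding(coding), mrna_from_template(template))
```

Both routes give the same transcript, which is why gene sequences can
be written from the coding strand alone.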

Gene transcription is not a one-time event, and shortly
after one round of transcription is initiated, a second round
begins with a new RNA polymerase-promoter interaction.
Following sigma subunit dissociation and core enzyme syn- 
thesis of 50 to 60 RNA nucleotides, a new holoenzyme can 
bind to the promoter and initiate a new round of transcrip- 
tion while the first core enzyme continues along the gene. 
In addition, if the transcript under construction is mRNA, 
the 5’ end is immediately available to begin translation. 
In contrast, transcripts that are functional RNAs, such as 
transfer and ribosomal RNA, must await the completion of 
transcription before undergoing the folding into secondary 
structures that readies them for cellular action. 


Transcription Termination Mechanisms 


Termination of transcription in bacterial cells is signaled
by a DNA termination sequence that usually contains a repeating
sequence producing distinctive 3' RNA sequences.
Termination sequences are downstream of the stop codon;
thus, they are transcribed after the coding region of the
mRNA and so are not translated. Two transcription termination
mechanisms occur in bacteria. The most common
is intrinsic termination, a mechanism dependent only 
on the occurrence of specialized repeat sequences in DNA 
that induce the formation in RNA of a secondary struc- 
ture leading to transcription termination. Less frequently, 
bacterial gene transcription terminates by rho-dependent 
termination, a mechanism characterized by a different 
terminator sequence and requiring the action of a special- 
ized protein called the rho protein. 


Intrinsic Termination Most bacterial transcription 
termination occurs exclusively as a consequence of termi- 
nation sequences encoded in DNA—that is, by intrinsic 
termination. Intrinsic termination sequences have two 
features. First, they are encoded by a DNA sequence 
containing an inverted repeat, a DNA sequence repeated 
in opposite directions but with the same 5’-to-3’ polarity. 
Figure 8.7 shows the inverted repeats (“repeat 1” and 
“repeat 2”) in a termination sequence, separated by a 
short spacer sequence that is not part of either repeat. 
The second feature of intrinsic termination sequences is 
a string of adenines on the template DNA strand that 
begins at the 5’ end of the repeat 2 region. Transcription 
of inverted repeats produces mRNA with complementary 
segments that are able to fold into a short double-stranded 
stem ending with a single-stranded loop. This secondary 
structure is a stem-loop structure, also known as a 
hairpin. A string of uracils complementary to the adenines 
on the template strand immediately follows the stem-loop 
structure at the 3’ end of the RNA. 

The formation of a stem-loop structure followed 
immediately by a poly-U sequence near the 3’ end of 
RNA causes the RNA polymerase to slow down and 
destabilize. In addition, the 3’ U-A region of the RNA- 
DNA duplex contains the least stable of the comple- 
mentary base pairs. Together, the instability created by 
RNA polymerase slowing and the U-A base pairs induces 
RNA polymerase to release the transcript and separate 
from the DNA. The behavior of RNA polymerase dur- 
ing intrinsic termination of transcription is like that of 
a bicycle rider at slow speed. Slow forward momentum 
creates instability and eventually the rider loses balance. 
In a similar way, RNA polymerase is destabilized as it 
slows while transcribing inverted repeat sequences, and 
it falls off DNA when the transcript is released where A-U 
base pairs form. 


Figure 8.7 Intrinsic termination of transcription is driven
by the presence of inverted repeat DNA sequences. Panel labels:
(1) Intrinsic termination sequences contain inverted
repeats separated by a spacer sequence and followed
by a polyadenine sequence. (3) Inverted repeat sequences
in the transcript fold into a complementary stem
separated by a single-stranded loop. (4) Hydrogen bonds
between A-U base pairs break, releasing the transcript and
terminating transcription.


Rho-Dependent Termination In contrast to the more
common intrinsic termination, certain bacterial genes
require the action of rho protein to bind to nascent mRNA
and catalyze separation of mRNA from RNA polymerase
to terminate transcription. Genes whose transcription
is rho-dependent have termination sequences that are
distinct from those in genes utilizing intrinsic termination.
Stem-loop structures often form as part of rho-dependent
termination, but rho-dependent terminator sequences do
not have a string of uracil residues. Instead, the sequences
contain a rho utilization site, or rut site, which is a
stretch of approximately 50 nucleotides that is rich in
cytosine and poor in guanine.
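
The description of a rut site as a stretch of roughly 50 nucleotides
that is rich in cytosine and poor in guanine suggests a simple sliding
window scan. The Python sketch below is only illustrative: the text
gives no numerical cutoffs, so the thresholds min_c and max_g are
hypothetical, as is the function name.

```python
def find_rut_candidates(rna, window=50, min_c=0.35, max_g=0.15):
    """Return (start index, C fraction, G fraction) for C-rich, G-poor windows.

    min_c and max_g are illustrative cutoffs, not measured values.
    """
    rna = rna.upper()
    hits = []
    for i in range(len(rna) - window + 1):
        segment = rna[i:i + window]
        c_frac = segment.count("C") / window
        g_frac = segment.count("G") / window
        if c_frac >= min_c and g_frac <= max_g:
            hits.append((i, round(c_frac, 2), round(g_frac, 2)))
    return hits
```

Overlapping windows around a true C-rich stretch will all be reported,
so in practice neighboring hits would be merged into one candidate
region.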

Rho protein is composed of six identical polypeptides
and has two functional domains, both of which are
utilized during the two-step process of transcription termination.
The first step is initiated when rho protein is
activated by an ATP molecule that binds to one functional
domain of rho. Activated rho protein utilizes its second
domain to bind to the rut site of the RNA transcript.
Using ATP-derived energy, rho then moves along the
mRNA in the 3' direction, eventually catching up to RNA
polymerase that has slowed near a terminator sequence.
As rho travels, it catalyzes the breakage of hydrogen
bonds between mRNA and the DNA template strand. The
bond breakage releases the transcript from the RNA polymerase
and induces the polymerase to release the DNA.


8.3 Archaeal and Eukaryotic Transcription Displays Structural Homology and Common Ancestry


Bacteria use a single RNA polymerase core enzyme and 
several alternative sigma subunits to transcribe all genes. 
Similarly, archaea have a single type of RNA polymerase. 
Eukaryotes, by contrast, each have multiple RNA poly- 
merases that are specialized for the transcription of differ- 
ent genes. The archaeal and eukaryotic RNA polymerases 
responsible for the transcription of most polypeptide- 
producing genes share a common structure that is diver- 
gent from the bacterial RNA polymerase. Transcription in 
archaea and eukaryotes progresses through the same four 
stages we described for bacteria: promoter recognition, 
transcription initiation, transcript elongation, and tran- 
scription termination. Several structural and functional 
factors make transcription more complex in archaea and 
eukaryotes. First, eukaryotic promoters and consensus se- 
quences are considerably more diverse than in E. coli, and 
eukaryotes have three different RNA polymerases that 
recognize different promoters, transcribe different genes, 
and produce different RNAs. Promoter consensus se- 
quences in archaea are considerably less complicated than 
those in eukaryotes, but they appear to be more diverse 
than bacterial promoter sequences. Second, the molecular 
apparatus assembled at promoters to initiate and elongate 
transcription is more complex in eukaryotes and in ar- 
chaea. Third, eukaryotic genes contain introns and exons, 
requiring extensive post-transcriptional processing of 
mRNA. Archaeal genes generally do not contain introns, 
although there is RNA splicing of archaeal pre-tRNAs in 
a similar manner to eukaryotic pre-tRNA splicing. We 
describe these details in a later section. Finally, eukaryotic 
DNA is permanently associated with a large amount of 
protein to form a compound known as chromatin. 

Chromatin plays a central role in regulating eukaryotic
transcription. Chromatin structure is both a permanent
and a dynamic feature of eukaryotic genomes.
Its state controls the accessibility of DNA to transcription,
either permitting or blocking RNA polymerase and
transcription factor access to promoters. In later chapters,
we discuss chromatin structure (Chapter 11) and explore
the functional role of chromatin in the regulation of gene
expression in eukaryotes (Chapter 15).


Eukaryotic and Archaeal RNA Polymerases 


Three different RNA polymerases transcribe distinct 
classes of RNA coded by eukaryotic genomes: RNA poly- 
merase I (RNA pol I) transcribes three ribosomal RNA 
genes, RNA polymerase II (RNA pol II) is responsible for 
transcribing messenger RNAs that encode polypeptides as 
well as for transcribing most small nuclear RNA genes, 
and RNA polymerase III (RNA pol III) transcribes all
transfer RNA genes as well as one small nuclear RNA 
gene and one ribosomal RNA gene. RNA pol II and RNA 
pol III are responsible for miRNA and siRNA synthesis. 

The RNA polymerases of members of all three do- 
mains of life share similarities of sequence and function. 
The E. coli RNA polymerase core enzyme has five subunits.
Each of these subunits has a homolog in the 10- to 13-subunit
(depending on the species) archaeal RNA polymerase
and in the 10- to 12-subunit (depending on the species)
eukaryotic RNA polymerase II (Table 8.3).

Despite differences in sizes and molecular complexity,
the RNA polymerases have a similar overall structure,
forming a characteristic shape reminiscent of DNA
polymerase (see Figure 7.23), with a “hand” composed of
protein “fingers” to help RNA polymerase grasp DNA, 
and a “palm” in which polymerization takes place. These 
similarities of RNA polymerase structure and function are 
a direct result of the shared evolutionary history of bacte- 
ria, archaea, and eukaryotes. 


Table 8.3 RNA Polymerase Composition

                      Bacteria           Archaea                   Eukarya
Species               Escherichia coli   Sulfolobus solfataricus   Saccharomyces cerevisiae (RNA pol II)
Number of subunits    5                  10                        12

Homologous proteins:
                      β′                 RpoA′/A″                  Rpb1
                      β                  RpoB                      Rpb2
                      αI                 RpoD                      Rpb3
                      ω                  RpoK                      Rpb6
                      αII                RpoL                      Rpb11

Additional proteins:
                                         RpoE, RpoF, RpoH,         Rpb4, Rpb5, Rpb7, Rpb8,
                                         RpoN, and RpoP            Rpb9, Rpb10, Rpb12
Consensus Sequences for Eukaryotic RNA Polymerase II Transcription


RNA polymerase II transcribes eukaryotic polypeptide- 
coding genes into mRNA. The promoters for these genes 
are numerous and highly diverse, with different overall 
lengths and differences in the number and type of consen- 
sus sequences prominent among the sources of promoter 
variation. Given these characteristics, it is reasonable to 
ask how RNA polymerases locate promoter DNA for dif- 
ferent genes. 


Three lines of investigation help researchers to identify
and characterize promoters of different polypeptide-coding
genes: (1) promoters are identified by determining which
DNA sequences are bound by proteins associated with
RNA pol II during transcription, (2) putative promoter sequences
from different genes are compared to evaluate their
similarities, and (3) mutations that alter gene transcription
are examined to identify how DNA base-pair changes affect
transcription. Research Technique 8.1 discusses the experimental
identification and analysis of promoters.


Research Technique 8.1

Band Shift Assay to Identify Promoters


PURPOSE The functional action of promoters in transcrip- 
tion depends on consensus DNA sequences that bind RNA 
polymerase and transcription factor proteins. To locate 
promoters, molecular biologists first scan DNA for potential 
promoter consensus sequences and then determine that the 
sequence binds transcriptionally active proteins. 


MATERIALS AND PROCEDURES Fragments of DNA con- 
taining suspected promoter consensus sequence are exam- 
ined by two experimental methods. The first, called band shift 
assay, verifies that the sequence of interest binds proteins. 
The second, called DNA footprint protection assay, identifies 
the exact location of the protein-binding sequence. 

In band shift assay, two identical samples of DNA frag- 
ments that contain suspected consensus sequence are 
analyzed. One DNA sample is a control to which no tran- 
scriptional proteins are added. The experimental DNA sam- 
ple, on the other hand, has transcriptional proteins added. 
Both the control and the experimental DNA samples are 
subjected to electrophoresis. 

DNA footprint protection also begins with two identical
samples of DNA fragments containing suspected consensus
sequences. All fragments are end-labeled with ³²P.
The experimental DNA is mixed with transcriptional proteins,
but the control sample is not. Both samples are exposed to
DNase I, which randomly cuts DNA that is not protected by
protein. The samples are run in separate lanes of an
electrophoresis gel, and each end-labeled fragment produced
is identified by autoradiography.


DESCRIPTION In the band shift assay result, notice that the 
electrophoretic mobility of experimental DNA is slower than 
that of control DNA. This is the anticipated result if the experi- 
mental sample contains consensus sequence that is bound 
by transcriptional proteins. The bound protein increases the 
molecular weight of the experimental sample and slows its 
migration relative to the same DNA without bound protein. 
In the DNA footprint protection assay, notice that the experimental
DNA lane contains a gap in which no DNA fragments
appear. The gap represents “footprint protection” for the portion
of the fragment that is protected from DNase I digestion
by bound transcriptional proteins. No such protection occurs
for the control fragment that is randomly cleaved.


CONCLUSION Evidence from these two methods consti- 
tutes necessary but not sufficient evidence that the DNA frag- 
ment contains a promoter. The final piece of evidence that a 
DNA fragment contains a promoter rests on mutational analy- 
sis that identifies functional changes caused by mutations of 
specific nucleotides of promoter consensus sequences (see 
Figure 8.11). 
The most common eukaryotic promoter consensus
sequence, the TATA box, is shown in Figure 8.8 as part
of a set of three consensus segments that were the first
eukaryotic promoter elements to be identified. A TATA
box, also known as a Goldberg-Hogness box, is located
approximately at position -25 relative to the beginning of
the transcriptional start site. Consisting of 6 bp with the
consensus sequence TATAAA, it is the most strongly conserved
promoter element in eukaryotes. The figure shows
two additional consensus sequence elements that are more
variable in their frequency in promoters. A 4-bp consensus
sequence identified as the CAAT box is most commonly
located near -80 when it is present in the promoter.
An upstream GC-rich region called the GC-rich box, with
a consensus sequence GGGCGG located at -90 or farther
upstream of the transcription start, has a frequency that is
less than that of CAAT box sequences.

Figure 8.8 Three eukaryotic promoter consensus
sequence elements. The TATA box and the CAAT box
are common; the presence of the upstream GC-rich
box is more variable. [Diagram: GC-rich box near -90, CAAT box
near -80, TATA box (TATAAA) near -25, and the transcription
start at +1.]

Comparison of eukaryotic promoters reveals a high 
degree of variability in the type, number, and location of 
consensus sequence elements (Figure 8.9). Some promot- 
ers contain all three of the consensus sequences identi- 
fied above, others contain one or two of these consensus 
elements, some contain none at all, and many contain 
other types of consensus sequence elements altogether. 
For example, the thymidine kinase gene contains TATA, 
CAAT, and GC-rich boxes along with an octamer (OCT) 
sequence, called an OCT box. The histone H2B gene con- 
tains two OCT boxes in addition to a TATA box and a pair 
of CAAT boxes. All of these consensus sequence elements 
play important roles in the binding of transcription factors, 
a group of transcriptional proteins described below. 
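
Locating these elements relative to the transcription start can be
sketched in Python. The three motifs are the consensus sequences named
above; the function name, the use of exact matching, and the convention
that tss_index is the string index of the +1 nucleotide are assumptions
made for this example. OCT boxes and other elements are omitted.

```python
import re

ELEMENTS = {"TATA box": "TATAAA", "CAAT box": "CAAT", "GC-rich box": "GGGCGG"}


def promoter_elements(coding_strand, tss_index):
    """Map each element to the positions (relative to +1) where it starts.

    Only the region upstream of +1 is searched, so every reported
    position is negative: -1 is the nucleotide just before +1.
    """
    upstream = coding_strand.upper()[:tss_index]
    found = {}
    for name, motif in ELEMENTS.items():
        # The lookahead lets overlapping occurrences be reported as well.
        found[name] = [m.start() - tss_index
                       for m in re.finditer(f"(?={motif})", upstream)]
    return found
```

An empty list for an element corresponds to a promoter that lacks it,
as many eukaryotic promoters do.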


Promoter Recognition 


RNA polymerase II recognizes and binds to promoter 
consensus sequences in eukaryotes with the aid of 
proteins called transcription factors (TF). The TF pro- 
teins bind to promoter regulatory sequences and influ- 
ence transcription initiation by interacting, directly or 
indirectly, with RNA polymerase. Transcription factors 
that influence mRNA transcription, and therefore in- 
teract with RNA pol II, are given the designation TFII. 
Individual TFII proteins also carry a letter designation, 
such as A, B, or C. 

Figure 8.9 Examples of eukaryotic promoter variability.

In most eukaryotic promoters, the TATA box is the
principal binding site for transcription factors during
promoter recognition. TFIID, a multisubunit protein
containing TATA-binding protein (TBP) and subunits of a
protein called TBP-associated factor (TAF), binds to the
TATA box region to form the initial committed complex
(Figure 8.10). Next, TFIIA, TFIIB, TFIIF, and RNA
polymerase II join the initial committed complex to form
the minimal initiation complex, which in turn is joined by
TFIIE and TFIIH to form the preinitiation complex (PIC).
The complete initiation complex contains six proteins that
are commonly identified as general transcription factors (GTFs).
Once assembled, the complete initiation complex directs
RNA polymerase II to the +1 nucleotide on the template
strand, where it begins the assembly of messenger RNA.

Figure 8.10 Six general transcription factor proteins bind
the promoter region to set the stage for eukaryotic transcription
by RNA polymerase II.

While most of the eukaryotic genes that have been 
examined have a TATA box and undergo TBP binding, 
there is evidence that some metazoan genes may use a re- 
lated factor called TLF (TBP-like factor). The complexity 
of TBP, TLF, and associated proteins is analogous to the 
different sigma factors in prokaryotic systems, thus allow- 
ing differential recognition of promoters in eukaryotes. 
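
The order of assembly described above can be written down as a short Python sketch. It is only an illustration: the list, the function name, and the stage labels used as keys are assumptions made for this example, and the sketch records nothing more than which components the text says join at each stage.

```python
# Assembly order as described in the text; the data structure and names are illustrative.
ASSEMBLY_STEPS = [
    ("initial committed complex", ["TFIID"]),
    ("minimal initiation complex", ["TFIIA", "TFIIB", "TFIIF", "RNA polymerase II"]),
    ("preinitiation complex (PIC)", ["TFIIE", "TFIIH"]),
]


def components_at(stage):
    """Return every component present once the named stage has formed."""
    present = []
    for name, added in ASSEMBLY_STEPS:
        present.extend(added)
        if name == stage:
            return present
    raise ValueError(f"unknown stage: {stage}")


pic = components_at("preinitiation complex (PIC)")
# Everything except the polymerase itself is a general transcription factor.
general_transcription_factors = [c for c in pic if c.startswith("TFII")]
print(general_transcription_factors)
print(len(general_transcription_factors))  # 6
```

Counting the TFII proteins in the finished complex gives six, which agrees with the six general transcription factors named in the text.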


Detecting Promoter Consensus Elements 


The diversity of eukaryotic promoters raises an important
question: How do researchers verify that a segment
of DNA is a functionally important component of a 
promoter? The research has two components; the first, 
outlined in Research Technique 8.1, is discovering the 
presence and location of DNA sequences that transcrip- 
tion factor proteins will bind to. The second component 
involves mutational analysis to confirm the functional- 
ity of the sequence. Researchers produce many different 
point mutations in the DNA sequence under study and 
then compare the level of transcription generated by each 
mutant promoter sequence with transcription generated 
by the wild-type sequence. 

Figure 8.11 shows a synopsis of promoter mutation 
analysis from an experiment performed by the molecular 
biologist Richard Myers and colleagues on a mammalian
β-globin gene promoter. These researchers produced
mutations of individual base pairs in TATA box, CAAT 
box, and GC-rich sequences, and of nucleotides between 
the consensus sequences, to identify the effect of each
individual mutation on the relative transcription level
of the gene. They found that most base-pair mutations 
in each of the three consensus regions significantly de- 
creased the transcription level of the gene and found two 
base substitutions in the CAAT box region that signifi- 
cantly increased transcription. In contrast, mutations out- 
side the consensus regions had nonsignificant effects on 
transcription level. Such results show the functional im- 
portance of specific DNA sequences in promoting tran- 
scription and confirm a functional role in transcription 
for TATA box, CAAT box, and GC-rich sequences. 


Enhancers and Silencers 


Promoters alone are often not sufficient to initiate tran- 
scription of eukaryotic genes, and other regulatory 
sequences are needed to drive transcription. This is par- 
ticularly the case for multicellular eukaryotes that have 
different numbers and patterns of expressed genes in 
different cells and tissues, and that change their patterns 
of gene expression as the organisms grow and develop. 
These tissue-specific or developmental types of transcrip- 
tional regulation are fully discussed in later chapters 
(Chapters 15 and 20), but here we highlight two cate- 
gories of DNA transcription-regulating sequences that 
lead to differential expression of genes. 

Enhancer sequences are one important group of 
DNA regulatory sequences that increase the level of tran- 
scription of specific genes. Enhancer sequences bind spe- 
cific proteins that interact with the proteins bound at 
gene promoters, and together promoters and enhancers 
drive transcription of certain genes. In many situations, 
enhancers are located upstream of the genes they regu- 
late; but enhancers can be located downstream as well. 
Some enhancers are relatively close to the genes they 
regulate, but others are thousands to tens of thousands of 
base pairs away from their target genes. Thus, important 
questions for molecular biologists are: What proteins are
bound to enhancers, and how do enhancer sequences
regulate transcription of the gene given their different
distances from the start of transcription?

Figure 8.12 Enhancers activate transcription in cooperation
with promoters. A protein bridge composed of transcriptional
proteins forms between enhancer and promoter sequences,
which may be separated by thousands of nucleotides.

The answers are that enhancers bind activator proteins
and associated coactivator proteins to form a protein
“bridge” that bends the DNA and links the complete initiation
complex at the promoter to the activator-coactivator
complex at the enhancer (Figure 8.12). The bend produced
in the DNA may contain dozens to thousands of base 
pairs. The action of enhancers and the proteins they bind 
dramatically increases the efficiency of RNA pol II in ini- 
tiating transcription, and as a result increases the level of 
transcription of genes regulated by enhancers. 

At the other end of the transcription-regulating spec- 
trum are silencer sequences, DNA elements that can act 
at a great distance to repress transcription of their target 
genes. Silencers bind transcription factors called repres- 
sor proteins, inducing bends in DNA that are similar to 
what is seen when activators and coactivators bind to 
enhancers—except with the consequence of reducing the 
transcription of targeted genes. Like enhancers, silenc- 
ers can be located upstream or downstream of a target 
gene and can reside up to several thousand base pairs 
away from it. Thus enhancers and silencers may operate 
by similar general mechanisms but with opposite effects 
on transcription. We discuss these and other eukaryotic 
regulatory DNA sequences in more detail in Chapter 15. 


RNA Polymerase I Promoters 


The genes for rRNA are transcribed by RNA polymerase I, 
utilizing a transcription initiation mechanism similar 
to that used by RNA pol II. RNA polymerase I is the 
most specialized eukaryotic RNA polymerase, as it tran- 
scribes a limited number of genes. It is recruited to up- 
stream promoter elements following the initial binding of 
transcription factors, and it transcribes ribosomal RNA 
genes found in the nucleolus (plural, nucleoli), a nuclear
organelle containing rRNA and multiple tandem copies of
the genes encoding rRNAs (tandem means “end to end”).
In Arabidopsis, for example, each nucleolus contains
about 700 copies of rRNA genes. Nucleoli play a key role
in the manufacture of ribosomes. At nucleoli, transcribed
ribosomal RNAs are packaged with proteins to form
the large and small ribosomal subunits.

Promoters recognized by RNA pol I contain two simi- 
lar functional sequences near the start of transcription. The 
first is the core element, stretching from -45 to +20 and
bridging the start of transcription, and the second is the
upstream control element, spanning nucleotides -100
to -150 (Figure 8.13). The core element is essential for
transcription initiation, and the upstream control element 
increases the level of gene transcription. Both of these ele- 
ments are rich in guanine and cytosine; DNA sequence 
comparisons show that all upstream control elements have 
the same base pairs at approximately 85 percent of nucleo- 
tide positions, and the same is true of all core elements. 
Two upstream binding factor 1 (UBF1) proteins bind the
upstream control element. A second protein complex,
known as sigma-like factor 1 (SL1) protein, binds the core
element. This complex recruits RNA pol I to the core
element, to initiate transcription of rRNA genes.

Figure 8.13 Promoter consensus sequences for transcription
initiation by RNA polymerase I. The core element initiates
transcription, and the upstream control element increases
transcription efficiency.
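
The coordinates of the two RNA pol I promoter elements can be kept in a small lookup table. The Python sketch below is an illustration only: the dictionary and function names are assumptions made for this example, and the coordinates are simply those quoted in the text (core element from -45 to +20, upstream control element from -150 to -100, all relative to the +1 start of transcription).

```python
# Coordinates relative to the +1 transcription start, as quoted in the text.
# The names below are hypothetical and used only for this illustration.
POL_I_PROMOTER_ELEMENTS = {
    "upstream control element": (-150, -100),
    "core element": (-45, 20),
}


def pol_i_element_at(position):
    """Return the RNA pol I promoter element covering `position`, or None."""
    if position == 0:
        raise ValueError("there is no position 0; numbering runs from -1 to +1")
    for name, (start, end) in POL_I_PROMOTER_ELEMENTS.items():
        if start <= position <= end:
            return name
    return None


print(pol_i_element_at(-120))  # upstream control element
print(pol_i_element_at(1))     # core element
print(pol_i_element_at(-70))   # None
```

Position -70 falls in the gap between the two elements, so the lookup returns no element for it.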


RNA Polymerase III Promoters 


The remaining eukaryotic RNA polymerase, RNA poly- 
merase III, is primarily responsible for transcription of 
tRNA genes. However, it also transcribes one rRNA and 
other RNA-encoding genes. Each of these genes has a 
promoter structure that differs significantly from the 
structure of promoters recognized by RNA pol I or RNA 
pol II. Small nuclear RNA genes have three upstream 
elements, whereas the genes for 5S ribosomal RNA and 
transfer RNA each contain two internal promoter ele- 
ments that are downstream of the start of transcription. 

The upstream elements of small nuclear RNA genes 
are a TATA box, a promoter-specific element (PSE), 
and an octamer (OCT) (Figure 8.14a). A small number of 
transcription factors (TFIIIs, in this case) bind to these
elements and recruit RNA polymerase III, which initi- 
ates transcription in a manner similar to that of the other 
polymerases. 

The genes for 5S ribosomal RNA and transfer RNA
have internal promoter elements called internal control
regions (ICRs); see Figure 8.14b and c. The ICRs are two
short DNA sequences, designated box A and box B in
some genes and box A and box C in other genes, located
downstream of the start of transcription, between
nucleotides +55 and +80 (Figure 8.15). To initiate transcription,
box B or box C is bound by TFIIIA, which facilitates the
subsequent binding of TFIIIC to box A. TFIIIB then binds
to the other transcription factors. In the final initiation
step, RNA polymerase III binds to the transcription
factor complex and overlaps the +1 nucleotide. With RNA
polymerase correctly positioned, transcription begins
approximately 55 bp upstream of the beginning of box A, at
the +1 nucleotide.

Figure 8.14 Promoter variation in genes transcribed by
RNA polymerase III. (a) snRNA gene: the OCT, PSE, and TATA
elements lie upstream of the +1 transcription start; snRNA
genes have promoters upstream of transcription start.
(b) 5S rRNA gene: the internal control region contains
box A and box C. (c) tRNA gene: the internal control region
contains box A and box B. 5S rRNA and tRNA genes have
internal promoters downstream of transcription start.


Termination in RNA Polymerase I
or III Transcription


Each of the eukaryotic RNA polymerases utilizes a dif- 
ferent mechanism to terminate transcription. Here we 
briefly describe termination in transcription by RNA pol I 
and RNA pol III, leaving termination of RNA pol II tran- 
scription for more extensive discussion in Section 8.4. 
Transcription by RNA polymerase III is terminated in a
manner reminiscent of E. coli transcription termination.
RNA pol III transcribes a terminator sequence that
creates a string of uracils in the transcript. The poly-U
string is similar to the string that occurs in bacterial
intrinsic termination (see Section 8.2). The RNA pol III
terminator sequence does not contain an inverted repeat,
however, so no stem-loop structure forms near the 3’ end
of RNA.

Figure 8.15 Promoter internal control regions for
transcription by RNA polymerase III.

Transcription by RNA pol I is terminated at a 17-bp 
consensus sequence that binds transcription-terminating 
factor I (TTFI). The binding site for TTFI is the DNA con- 
sensus sequence 


AGGTCGACCAG(A/T)(A/T)NTCG


In this sequence, adenine and thymine are equally likely to 
appear at two adjacent sites, as indicated by the diagonal 
lines; N signifies a location at which all four nucleotides 
are more or less equally frequent. A large rRNA precursor 
transcript is cleaved about 18 nucleotides upstream of the 
TTFI binding site, so the consensus sequence does not 
appear in the mature transcript. 
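
A consensus with variable positions of this kind translates naturally into a pattern search. The Python sketch below is an illustration under stated assumptions: it reads the consensus as eleven fixed nucleotides, two positions that may be A or T, one position that may be any nucleotide, and a final TCG (17 bp in total); the function name and the test sequence are invented for the example.

```python
import re

# 17-bp TTFI binding-site consensus as read from the text:
# AGGTCGACCAG, two positions that are A or T, any nucleotide (N), then TCG.
TTFI_CONSENSUS = re.compile(r"AGGTCGACCAG[AT][AT][ACGT]TCG")


def find_ttfi_sites(dna):
    """Return (0-based start, matched sequence) for each consensus match."""
    return [(m.start(), m.group()) for m in TTFI_CONSENSUS.finditer(dna.upper())]


# Hypothetical sequence, made up for this example.
example = "ccgtAGGTCGACCAGTACTCGttaa"
print(find_ttfi_sites(example))  # [(4, 'AGGTCGACCAGTACTCG')]
```

A real search would also need to scan the complementary strand, which this sketch leaves out.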


Archaeal Transcription 


The transcription machinery of archaea is distinct from 
that of bacteria and represents a simplified and ances- 
trally related version of the eukaryotic apparatus that is 
most similar to the RNA pol II holoenzyme. While bac- 
terial transcription utilizes different sigma subunits to 
alter core polymerase specificity for distinct promoters, 
eukaryotes use a group of general transcription factors 
to facilitate the recognition of promoter consensus 
sequences. In the case of the eukaryotic RNA polymerase 
II holoenzyme, six general transcription factors are re- 
cruited to the promoter. Archaeal transcription follows 
the eukaryotic model, using three proteins homologous 
to eukaryotic transcription factors to identify two pro- 
moter consensus regions. 

Studies examining archaeal promoters and tran- 
scription initiation in the thermophilic archaeal species 
Sulfolobus shibatae have identified a TATA-binding pro- 
tein (TBP, a subunit of TFIID) and transcription factor 
B (TFB), a homolog of eukaryotic TFIIB, as the only 
proteins required for interaction with RNA polymerase 
in the initiation of archaeal transcription (Figure 8.16). 
TBP binds to a TATA box in the archaeal promoter, and 
TFB binds a BRE box (TFB-recognition element) that is 
immediately upstream of the TATA box. With TBP and 
TFB bound to their promoter elements, RNA polymerase
is directed approximately 25 base pairs downstream to
the transcription start site. A third component, TFIIEα,
a homolog of the eukaryotic GTF TFIIE, is not always
required for transcription, but it enhances TATA box
binding, thereby stimulating transcription.

Figure 8.16 Archaea promoter consensus sequences. The
TATA box and BRE box sequences bind TBP and TFB along with
RNA polymerase to initiate transcription.


8.4 Post-Transcriptional Processing 
Modifies RNA Molecules 


Bacterial, archaeal, and eukaryotic transcripts differ in sev- 
eral ways. For example, eukaryotic transcripts are more 
stable than bacterial and archaeal transcripts. The half-life 
of a typical eukaryotic mRNA is measured in hours to 
days, whereas bacterial mRNAs have an average half-life 
measured in seconds to minutes. A second difference is the 
separation, in time and in location, between transcription 
and translation. Recall that in bacteria the lack of a nucleus 
leads to coupling of transcription and translation. Similarly, 
archaea lack a nucleus, leading to the possibility of syn- 
chrony between transcription and translation. In eukary- 
otic cells, on the other hand, transcription takes place in 
the nucleus, and translation occurs later at free ribosomes 
or at those attached to the rough endoplasmic reticulum in 
the cytoplasm. A third difference is the presence of introns 
in eukaryotic genes that are absent from most bacterial and 
archaeal genes. Each of these differences comes into play as 
we consider post-transcriptional modifications of mRNA 
in eukaryotic cells, which is the focus of this section. 

In discussing post-transcriptional processing, we 
highlight three processing steps that are coordinated 
during transcription to modify the initial eukaryotic 
gene mRNA transcript, called pre-mRNA, into mature 
mRNA, the fully processed mRNA that migrates out 
of the nucleus to the cytoplasm for translation. These 
modification steps are (1) 5’ capping, the addition of a 
modified nucleotide to the 5’ end of mRNA; (2) 3’ poly- 
adenylation, cleavage at the 3’ end of mRNA and addi- 
tion of a tail of multiple adenines to form the poly-A tail; 
and (3) intron splicing, RNA splicing to remove introns 
and ligate exons. We conclude the section with a discus- 
sion of the mechanisms directing alternative splicing and 
self-splicing RNAs. 


Capping 5’ mRNA 


After RNA pol II has synthesized the first 20 to 30 nu- 
cleotides of the mRNA transcript, a specialized enzyme, 
guanylyl transferase, adds a guanine to the 5’ end of the 
pre-mRNA, producing an unusual 5’-to-5’ bond that 
forms a triphosphate linkage. Additional enzymatic action 
then methylates the newly added guanine and may also 
methylate the next one or more nucleotides of the tran- 
script. This addition of guanine to the transcript and the 
subsequent methylation is known as 5’ capping. 

Guanylyl transferase initiates 5’ capping in three steps
depicted in Figure 8.17. Before capping, the terminal 5’
nucleotide of mRNA contains three phosphate groups,
labeled α, β, and γ in Figure 8.17. Guanylyl transferase first
removes the γ phosphate, leaving two phosphates on the 5’
terminal nucleotide (step 1). The guanine triphosphate
containing the guanine that is to be added loses two phosphates
(γ and β) to form a guanine monophosphate (step 2). Then,
guanylyl transferase joins the guanine monophosphate
to the mRNA terminal nucleotide to form the 5’-to-5’
triphosphate linkage (step 3). Methyl transferase enzyme then
adds a methyl (CH3) group to the 7-nitrogen of the new
guanine, forming 7-methylguanosine (m⁷G). Methyl
transferase may also add methyl groups to the 2’-OH of nearby
nucleotides of mRNA.

Figure 8.17 Capping the 5’ end of eukaryotic pre-mRNA.

The 5’ cap has several functions, including (1) pro- 
tecting mRNA from rapid degradation, (2) facilitating 
mRNA transport across the nuclear membrane, (3) facili- 
tating subsequent intron splicing, and (4) enhancing trans- 
lation efficiency by orienting the ribosome on mRNA. 


Polyadenylation of 3’ Pre-mRNA 


Termination of transcription by RNA pol II is not fully 
understood, but it appears likely to be tied to the pro- 
cessing and polyadenylation of the 3’ end of pre-mRNA. 
It is clear that the 3’ end of mRNA is not generated by 
transcriptional termination. Rather, the 3’ end of the pre- 
mRNA is created by enzymatic action that removes a seg- 
ment from the 3’ end of the transcript and replaces it with 
a string of adenine nucleotides, the poly-A tail. This step 
of pre-mRNA processing is thought to be associated with 
subsequent termination of transcription. 


Figure 8.18 illustrates these steps. Polyadenylation be- 
gins with the binding of a factor called cleavage and poly- 
adenylation specificity factor (CPSF) near a six-nucleotide 
mRNA sequence, AAUAAA, that is downstream of the stop 
codon and thus not part of the coding sequence of the 
gene. This six-nucleotide sequence is known as the poly- 
adenylation signal sequence. The binding of cleavage- 
stimulating factor (CStF) to a uracil-rich sequence several 
dozen nucleotides downstream of the polyadenylation 
signal sequence quickly follows, and the binding of two 
other cleavage factors, CFI and CFII, and polyadenylate
polymerase (PAP) enlarges the complex (step 1). The
pre-mRNA is then cleaved 15 to 30 nucleotides downstream
of the polyadenylation signal sequence (step 2). The cleavage
releases a transcript fragment bound by CFI, CFII, and
CStF, which is later degraded (step 3). The 3’ end of the cut
pre-mRNA then undergoes the enzymatic addition of 20
to 200 adenine nucleotides that form the 3’ poly-A tail
through the action of CPSF and PAP (step 4). After addition of
the first 10 adenines, molecules of poly-A-binding protein
II (PABII) join the elongating poly-A tail and increase the
rate of adenine addition (step 5). The 3’ poly-A tail has several
functions, including (1) facilitating transport of mature
mRNA across the nuclear membrane, (2) protecting
mRNA from degradation, and (3) enhancing translation 
by enabling ribosomal recognition of messenger RNA. 
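
The cleavage and tailing steps can be mimicked on a string of letters. The Python sketch below is a simplification under stated assumptions: the function name and its parameters are invented for this illustration, the first AAUAAA found is treated as the polyadenylation signal sequence, and the cleavage distance is measured from the end of that signal. The protein factors are ignored entirely.

```python
def polyadenylate(pre_mrna, cleavage_offset=20, tail_length=200):
    """Cleave a pre-mRNA downstream of the first AAUAAA and append a poly-A tail.

    cleavage_offset: nucleotides between the end of AAUAAA and the cut (text: 15 to 30).
    tail_length: number of adenines added (text: 20 to 200).
    Returns None if there is no signal or the transcript is too short to cleave.
    """
    signal = "AAUAAA"
    if not 15 <= cleavage_offset <= 30:
        raise ValueError("cleavage_offset should be between 15 and 30")
    if not 20 <= tail_length <= 200:
        raise ValueError("tail_length should be between 20 and 200")
    signal_start = pre_mrna.find(signal)
    if signal_start == -1:
        return None  # no polyadenylation signal sequence
    cut = signal_start + len(signal) + cleavage_offset
    if cut > len(pre_mrna):
        return None  # transcript ends before the cleavage site
    # Everything downstream of the cut is the released fragment that is degraded.
    return pre_mrna[:cut] + "A" * tail_length


# Hypothetical transcript 3' region, made up for this example.
pre = "GGCUAAUAAA" + "C" * 20 + "UUUUGUUU"
tailed = polyadenylate(pre, cleavage_offset=20, tail_length=25)
print(tailed)
```

In this example the uracil-rich stretch downstream of the cut is discarded and 25 adenines are appended to the cleaved end.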

Certain eukaryotic mRNA transcripts do not undergo 
polyadenylation. The most prominent of these are tran- 
scripts of genes producing histone proteins, which are key 
components of chromatin, the DNA-protein complex
that makes up eukaryotic chromosomes (see Chapter 11).
On these and other “tailless” mRNAs, the 3’ end contains
a short stem-loop structure reminiscent of the ones seen
in the intrinsic transcription termination mechanism of
bacteria. There may be an evolutionary connection
between bacterial transcription termination and stem-loop
formation on “tailless” eukaryotic mRNAs.

Figure 8.18 Polyadenylation of the 3’ end of eukaryotic pre-mRNA.


The Torpedo Model of Transcription 
Termination 


The connection between polyadenylation and transcription 
termination lies in the activity of a specialized RNase (an 
RNA-destroying enzyme) that attacks and digests the resid- 
ual RNA transcript attached to RNA pol II after 3’ transcript 
cleavage (Figure 8.19). Following polyadenylation and 3’
cleavage, the residual segment of the transcript still attached
to RNA pol II is not capped at its 5’ end. This end is attacked 
by the specialized RNase that rapidly digests the remaining 
transcript. The RNase is thought of as a “torpedo” aimed 
at the residual mRNA attached to RNA pol II. Studies have 
shown that the torpedo RNase is a highly processive enzyme,
meaning that it stays bound to its substrate through many
successive cleavage steps and so digests the transcript rapidly.
Once the RNase destroys the residual mRNA and catches up 
to RNA pol II, it triggers dissociation of the polymerase from 
template strand DNA to terminate transcription.

Figure 8.19 The torpedo model of eukaryotic transcription
termination. Eukaryotic transcription (1) leads to 3’ cleavage
near the poly-A signal sequence (2), which releases mature
mRNA. The torpedo RNase attacks the uncapped 5’ end of the
residual mRNA (3) and digests it (4), leading to the
dissociation of RNA polymerase II and the torpedo RNase (5).


Pre-mRNA Intron Splicing


The third step of pre-mRNA processing is intron splicing,
which consists of removing intron segments from
pre-mRNA and ligating the exons. Intron splicing requires
exquisite precision to remove all intron nucleotides
accurately without intruding on the exons, and without leaving
behind additional nucleotides, so that the mRNA sequence
encoded by the ligated exons will completely and faithfully
direct synthesis of the correct polypeptide. As an example
of the need for precision in intron removal, consider a
“precursor string” made up of exon-like blocks of
letters forming three-letter words interrupted by
unintelligible intron-like blocks of letters. If editing removes the
“introns” accurately, the “edited string” can be divided into
its three-letter words to form a “sentence.” If an error in
editing were to remove too many or too few letters, a
nonsense sentence would result.
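
The letter-string analogy can be run directly. In the Python sketch below, the precursor string and its two intron-like blocks are those of the textbook example; the function names are assumptions introduced for this illustration.

```python
precursor = "youmaynoxpghrcyeomtpwtipthepfxwubijrdlzmcolzotandsipthetea"
introns = ["xpghrcyeomtp", "fxwubijrdlzmcolz"]


def splice(precursor, introns):
    """Remove each intron-like block once, in order, joining the flanking blocks."""
    edited = precursor
    for intron in introns:
        edited = edited.replace(intron, "", 1)
    return edited


def read_in_threes(edited):
    """Divide the edited string into consecutive three-letter words."""
    return " ".join(edited[i:i + 3] for i in range(0, len(edited), 3))


edited = splice(precursor, introns)
print(edited)                  # youmaynowtipthepotandsipthetea
print(read_in_threes(edited))  # you may now tip the pot and sip the tea

# An editing error: one extra letter ("w") is removed with the first intron.
bad_edit = splice(precursor, ["xpghrcyeomtpw", "fxwubijrdlzmcolz"])
print(read_in_threes(bad_edit))  # you may not ipt hep ota nds ipt het ea
```

Removing a single extra letter shifts every later three-letter word, which is why the text stresses that introns must be removed exactly.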

The finding that introns interrupt the genetically 
informative segments of eukaryotic genes was a stun- 
ning discovery reported independently by the molec- 
ular biologists Richard Roberts and Phillip Sharp in 
1977. Nothing known about eukaryotic gene structure 
at the time suggested that most eukaryotic genes are 
subdivided into intron and exon elements. Roberts and 
Sharp shared the 1993 Nobel Prize in Physiology or 
Medicine for their codiscovery of “split genes” in the 
eukaryotic genome. 

Sharp’s research group discovered the split nature of 
eukaryotic genes by using a technique known as R-looping.
In this method, DNA encoding a gene is isolated, denatured
to single-stranded form, and then mixed with the mature
mRNA transcript from the gene. Regions of the gene that
encode sequences in mature mRNA will be complementary
to those sequences in the mRNA and will hybridize
with them to form a DNA-mRNA duplex. However, DNA
segments encoding introns will not find complementary se- 
quences in mature mRNA and will remain single-stranded, 
looping out from between the hybridized sequences. 

Figure 8.20 shows a map of the hexon gene studied in
R-looping experiments by Sharp and colleagues. The
experimental results, photographed by electron microscopy,
reveal four DNA-mRNA hybrid regions where exon
DNA sequence pairs with mature mRNA sequence. Three
single-stranded R-loop sequences are introns, which do
not pair with mRNA.

Figure 8.20 R-loop experimental analysis. (a) The hexon
gene contains four exons (1 to 4) and three introns (A to C).
(b) Electron micrographs show hybridization of mature
mRNA with exon sequences of denatured hexon DNA. Intron
sequences are not hybridized and remain single stranded.

Precursor string (intron-like blocks in brackets):
youmayno[xpghrcyeomtp]wtipthep[fxwubijrdlzmcolz]otandsipthetea
Edited string: youmaynowtipthepotandsipthetea
Sentence: you may now tip the pot and sip the tea


Splicing Signal Sequences


Eukaryotic pre-mRNA contains specific short sequences
that define the 5’ and 3’ junctions between introns and
their neighboring exons. In addition, there is a consensus
sequence near each intron end to assist in its accurate
identification. The 5’ splice site is located at the 5’
intron end, where it abuts an exon (Figure 8.21). This site
contains a consensus sequence with a nearly invariant GU
dinucleotide forming the 5’-most end of the intron. The
consensus sequence includes the last three nucleotides of
the adjoining exon, as well as the four or five nucleotides
that follow the GU in the intron. At the 3’ splice site on
the opposite end of the intron, a consensus sequence of
11 nucleotides contains a pyrimidine-rich region and a
nearly invariant AG dinucleotide at the 3’-most end of the
intron. The third consensus sequence, called the branch
site, is located 20 to 40 nucleotides upstream of the 3’
splice site. This consensus sequence is pyrimidine-rich
and contains an invariant adenine, called the branch
point adenine, near the 3’ end.
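
The three splicing signals can be checked on an intron sequence with a few lines of Python. This sketch is an illustration under stated assumptions: the function name and the example intron are invented, the 3’ splice site is taken to be the 3’ end of the intron, and any adenine lying 20 to 40 nucleotides upstream of that end is reported as a candidate branch point. Real branch-site recognition depends on the surrounding consensus, which is not modeled here.

```python
def check_intron(intron_rna):
    """Check the GU...AG rule and list candidate branch-point adenines.

    Positions are 0-based from the 5' end of the intron. A nucleotide at
    index i lies (length - i) nucleotides upstream of the 3' splice site.
    """
    intron = intron_rna.upper()
    n = len(intron)
    branch_candidates = [
        i for i, base in enumerate(intron)
        if base == "A" and 20 <= n - i <= 40
    ]
    return {
        "starts_with_GU": intron.startswith("GU"),
        "ends_with_AG": intron.endswith("AG"),
        "branch_candidates": branch_candidates,
    }


# Hypothetical pyrimidine-rich intron with a single internal adenine.
example_intron = "GU" + "CU" * 15 + "A" + "UC" * 11 + "U" + "AG"
result = check_intron(example_intron)
print(result)
```

In the example intron the only internal adenine lies 26 nucleotides upstream of the 3’ end, so it is the single candidate reported; the adenine of the terminal AG is too close to the splice site to qualify.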

Mutation analysis shows that these consensus se- 
quences are critical for accurate intron removal. Mutations 
altering nucleotides in any of the three consensus re- 
gions can produce abnormally spliced mature mRNA. 
The abnormal mRNAs—too short if exon sequence is 
mistakenly removed, too long if intron sequence is left be- 
hind, or altered in other ways that result in improper read- 
ing of mRNA sequence—produce proteins with incorrect 
sequences of amino acids (see Chapter 12). 

Introns are removed from pre-mRNA by an snRNA- 
protein complex called the spliceosome. The spliceosome 
is something like a molecular workbench to which pre- 
mRNA is attached while spliceosome subunit components 
cut and splice it in a four-step process that, first, cleaves
the 5’ splice site; second, forms a lariat intron structure
that binds the 5’ intron end to the branch point adenine; 
third, cleaves the 3’ splice site; and finally, ligates exons and 
releases the lariat intron to be degraded to its nucleotide 
components. An electron micrograph of a spliceosome in 
action is seen in the opener photo for this chapter. 

Figure 8.21 illustrates the steps of nuclear pre-mRNA 
splicing, beginning with the aggregation of five small nu- 
clear ribonucleoproteins (snRNPs; pronounced “snurps”) 
to form a spliceosome. The snRNPs are snRNA-protein 
subunits designated U1, U2, U4, U5, and U6. The spliceosome is a large
complex made up of multiple snRNPs, but its composi- 
tion is dynamic; it changes throughout the different stages 
of splicing when individual snRNPs come and go as par- 
ticular reaction steps are carried out. 


Coupling of Pre-mRNA Processing Steps 


Each intron-exon junction is subjected to the same spli- 
ceosome reactions, raising the question of whether there 
is a particular order in which introns are removed from 
pre-mRNA—or whether U1 and U2 search more or less 
randomly for 5’ splice-site and branch-site consensus se- 
quences, inducing spliceosome formation when they hap- 
pen to encounter an intron. The answer is that introns 
appear to be removed one by one, but not necessarily in 
order along the pre-mRNA. For example, a study of intron 
splicing of the chicken ovomucoid gene demonstrates
the successive steps of intron removal. The ovomucoid gene 
contains eight exons and seven introns. The pre-mRNA 
transcript is approximately 5.6 kb, and the mature mRNA is 
reduced to 1.1 kb. Northern blot analysis of ovomucoid pre- 
mRNAs at various stages of intron removal illustrates that 
each intron is removed separately, rather than all introns 
being removed at once. The order of intron removal does 
not precisely match their 5'-to-3’ order in pre-mRNA. 

The three steps of pre-mRNA processing are tightly 
coupled. In comprehensive models developed over the last 
decade or so, the carboxyl terminal domain (CTD) of RNA 
polymerase II plays an important role in this coupling by 
functioning as an assembly platform and regulator of pre- 
mRNA processing machinery. The CTD is located at the site 
of emergence of mRNA from the polymerase and contains 
multiple heptad (seven-member) repeats of amino acids that 
can be phosphorylated. Binding of processing proteins to the 
CTD allows the mRNA to be modified as it is transcribed. 

Current models propose that “gene expression ma- 
chines” consisting of RNA polymerase II and an array of pre- 
mRNA-processing proteins are responsible for the coupling 
of transcription and pre-mRNA processing. Foundation 
Figure 8.22 illustrates this gene expression machine model. 
The CTD of RNA polymerase II associates with multiple 
proteins that carry out capping (CAP), intron splicing (SF), 
and polyadenylation (pA) so that the processes of transcription
and pre-mRNA processing occur simultaneously. At
the initiation of transcription, phosphorylation (P) along
the CTD assists the binding of 5’-capping enzymes, which
carry out their capping function and then dissociate. During
transcription elongation, specific transcription elongation
factors bind the CTD and facilitate splicing-factor binding.
The CTD also contains the torpedo RNase responsible for
digestion of the residual transcript left attached to RNA
pol II by 3’ cleavage linked to polyadenylation. The torpedo
RNase is loaded onto the transcript from the CTD to
quickly trigger transcription termination.

Figure 8.21 Intron splicing in eukaryotic pre-mRNA.
Spliceosome assembly and intron removal. (1) snRNP U1
binds the 5’ splice site, and U2 binds the branch site, which
contains the branch point adenine 20 to 40 nucleotides
upstream of the 3’ splice site. (2) snRNPs U4, U5, and U6
bind to the complex and form the inactive spliceosome. A
lariat intron structure forms. (3) U4 dissociates to form the
active spliceosome, followed by 5’ cleavage and formation
of a 2’-5’ phosphodiester bond to stabilize the lariat intron.
(4) The 3’ end of the intron is cleaved, leaving a 5’
monophosphate at the 5’ exon end. (5) Cleavage frees the
lariat intron, which is degraded, and the exons are ligated.


Alternative Transcripts of Single Genes


Before the complete sequencing of the human genome in 
the early 2000s, estimates of the number of human genes 
varied, having been as high as 80,000 to 100,000 genes 
20 years or so earlier. A principal reason for this prediction 
was that human cells produce well over 100,000 distinct 
polypeptides. It came as something of a surprise, then, when 
gene annotation of the human genome revealed a total
content of approximately 22,800 genes. The difference between
the number of genes and the number of polypeptides
is mirrored by similar findings in other eukaryotic genomes, 
especially those of mammals. It is common for large eukary- 
otic genomes to express more proteins than there are genes 
in the genomes. Three transcription-associated mecha- 
nisms can account for the ability of single DNA sequences 
to produce more than one polypeptide: (1) pre-mRNA 
can be spliced in alternative patterns in different types of 
cells; (2) alternative promoters can initiate transcription at 
distinct +1 start points in different cell types; and (3) alter- 
native locations of polyadenylation can produce different 
mature mRNAs. Collectively, these varied processes are 
identified as alternative pre-mRNA processing. 

Figure 8.23 Alternative splicing. (a) The calcitonin/calcitonin
gene-related peptide (CT/CGRP) gene is transcribed into either
calcitonin or CGRP. (b) Dscam pre-mRNA contains numerous
alternatives for exons 4, 6, 9, and 17. Combinatorial splicing
could generate as many as 38,016 different mature mRNAs.

Alternative intron splicing is the mechanism by
which post-transcriptional processing of identical
pre-mRNAs in different cells can lead to mature mRNAs
with different combinations of exons. These alternative
mature mRNAs produce different polypeptides. In other 
words, alternative splicing is a mechanism by which a 
single DNA sequence can produce more than one specific 
protein. Alternative splicing is common in mammals— 
approximately 70 percent of human genes are thought 
to undergo alternative splicing—but it is less common in 
other animals, and it is rare in plants. 

The products of the human calcitonin/calcitonin gene- 
related peptide (CT/CGRP) gene exemplify the process of 
alternative splicing (Figure 8.23a). The CT/CGRP gene pro- 
duces the same pre-mRNA transcript in many cells, includ- 
ing thyroid cells and neuronal cells. The transcript contains 
six exons and five introns and includes two alternative poly- 
adenylation sites, one in exon 4 and the other following exon 
6. In thyroid cells, CT/CGRP pre-mRNA is spliced to form 
mature mRNA containing exons 1 through 4, using the first 
poly-A site for polyadenylation. Translation produces calcitonin,
a hormone that helps regulate calcium. In neuronal
cells, the same pre-mRNA is spliced to form mature mRNA 
containing exons 1, 2, 3, 5, and 6. Polyadenylation takes 
place at the site that follows exon 6, since exon 4 is spliced 
out as though it were an intron. Translation in neuronal 
cells produces the hormone CGRP. 
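
The CT/CGRP example can be sketched in a few lines of Python. This is only an illustrative model, assuming that each mature mRNA can be represented as the list of exons it retains; the names `PRE_MRNA_EXONS`, `SPLICING_PATTERNS`, `mature_mrna`, and `skipped_exons` are hypothetical and chosen for this sketch.

```python
# Exons of the CT/CGRP pre-mRNA, numbered as in the text.
PRE_MRNA_EXONS = [1, 2, 3, 4, 5, 6]

# Exons retained in the mature mRNA of each cell type (from the text).
SPLICING_PATTERNS = {
    "thyroid": [1, 2, 3, 4],      # poly-A site in exon 4 is used
    "neuronal": [1, 2, 3, 5, 6],  # exon 4 is spliced out; poly-A site after exon 6
}

# Polypeptide translated from each mature mRNA (from the text).
PRODUCTS = {"thyroid": "calcitonin", "neuronal": "CGRP"}


def mature_mrna(cell_type):
    """Return the exons kept in the mature mRNA, in pre-mRNA order."""
    kept = SPLICING_PATTERNS[cell_type]
    return [exon for exon in PRE_MRNA_EXONS if exon in kept]


def skipped_exons(cell_type):
    """Return the exons of the pre-mRNA absent from the mature mRNA."""
    kept = SPLICING_PATTERNS[cell_type]
    return [exon for exon in PRE_MRNA_EXONS if exon not in kept]


if __name__ == "__main__":
    for cell_type in SPLICING_PATTERNS:
        print(cell_type, mature_mrna(cell_type), PRODUCTS[cell_type])
```

The same pre-mRNA exon list feeds both cell types; only the retained-exon pattern differs, which is the essential point of alternative splicing.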

One of the most complex patterns of alternative splic- 
ing occurs in the Drosophila Dscam gene, which produces a 
protein directing axon growth in Drosophila larvae. Mature 
mRNA from Dscam contains 24 exons, but as shown in 
Figure 8.23b, numerous alternative sequences can be used 
as exons 4, 6, 9, and 17. In total, more than 38,000 different 
alternative splicing arrangements of Dscam are possible, 
although not all are observed in the organism. 
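
The figure of 38,016 follows from multiplying the number of alternatives available at each variable exon. The short sketch below shows the arithmetic. The per-exon counts (12, 48, 33, and 2) are not stated in this section; they are the values commonly reported for Dscam and are used here as an assumption.

```python
from math import prod

# Assumed number of alternative sequences for each variable Dscam exon.
# These counts are not given in the text above.
DSCAM_ALTERNATIVES = {4: 12, 6: 48, 9: 33, 17: 2}


def count_splice_combinations(alternatives):
    """Multiply the choices at each variable exon (one choice per exon)."""
    return prod(alternatives.values())


if __name__ == "__main__":
    print(count_splice_combinations(DSCAM_ALTERNATIVES))  # 12 * 48 * 33 * 2
```

Because exactly one alternative is chosen at each of the four positions, the choices are independent and the counts multiply rather than add.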

The use of alternative promoters occurs when 
more than one sequence upstream of a gene can bind 
transcription factors and initiate transcription. Similarly, 
alternative polyadenylation is possible when genes con- 
tain more than one polyadenylation signal sequence that 
can activate 3’ pre-mRNA cleavage and polyadenylation. 
Alternative promoters and alternative polyadenylation 
are driven by the variable expression of transcriptional 
or polyadenylation proteins in a cell-type-specific man- 
ner. The variable expression of transcriptional and poly- 
adenylation proteins generates characteristic mature 
mRNAs from specific genes in particular cells. The result 
is that transcription of a single gene may lead to the production
of several different mature mRNAs in different
types of cells, and to their translation into distinct pro- 
teins in each of those cell types. 

A comprehensive example of a single gene for which
all three alternative mechanisms operate to produce
distinct polypeptides in different cells is the rat
α-tropomyosin (α-Tm) gene, which produces nine different
mature mRNAs and, correspondingly, nine different
tropomyosin proteins. Figure 8.24a
shows a map of α-Tm. The gene contains 14 exons, including
alternatives for exons 1, 2, 6, and 9. The gene has two
promoters (identified as P1 and P2) as well as five alternative
polyadenylation sites (identified as A1 to A5). The
nine distinct mature mRNAs from α-Tm are produced
in muscle cells (two forms), brain cells (three forms), and
fibroblast cells (four forms); see Figure 8.24b. Each different
mature mRNA illustrates a unique pattern of promoter se- 
lection, intron splicing, and choice of polyadenylation site. 
All mature mRNAs, and their corresponding tropomyosin 
proteins, contain the genetic information of exons 3, 4, 5, 7, 
and 8; however, they may contain distinct information in 
the alternative exons that depends largely on the cell-type- 
specific selection of promoter and polyadenylation site. 

In striated muscle cells, for example, promoter P1 and
polyadenylation site A1 are used. The mature mRNA includes
the alternative exons 1a, 2b, 6b, 9a, and 9b. In contrast,
tropomyosin in smooth muscle cells utilizes promoter P1
and polyadenylation site A5, and its mature mRNA contains
exons 1a, 2a, 6b, and 9d. Brain cells produce three different
tropomyosin proteins, each of which is translated from
a differentially spliced pre-mRNA; these mRNAs also utilize different
polyadenylation sites. In addition, two forms of the brain cell
tropomyosin proteins are translated from mRNAs that utilize
promoter P1 and one from an mRNA utilizing P2. Among
the four different tropomyosin proteins produced in fibroblasts,
the mRNAs all use polyadenylation site A5, but they
differ in selection of P1 versus P2, and alternative splicing
occurs as well. Genetic Analysis 8.2 guides you through analysis
of the results of alternative mRNA processing.


Control of Alternative Splicing 


We have seen that specific RNA sequences at 5’ and 3’ 
splice sites are crucial to accurate pre-mRNA splicing and 
that alternative splicing is widespread in many genomes, 
with some genes having a large number of alternative pro- 
tein products from different splicing patterns of pre-mRNA. 

Obviously, alternative splicing is carefully controlled 
in cells, but what mechanisms are involved in that control? 
The answer appears to be specific sequences in exons and 
in introns that bind splicing proteins to either enhance or 
suppress splicing at nearby splice sites. The sequences are 
identified as exonic or intronic splicing enhancers (ESE 
or ISE) and exonic or intronic splicing silencers (ESS or 
ISS). ESE and ISE sequences, for example, attract proteins
rich in serine and arginine (one-letter abbreviations S and
R, respectively) called SR proteins (Figure 8.25). SR proteins
direct spliceosome activity to nearby splice sites. These 
proteins are the products of a large and diverse family of 
genes, and differential gene transcription in cells is key to 
SR-protein control of different splicing patterns. 

ESS and ISS sequences seem to work in a manner 
similar to that of splice enhancer sequences, attract- 
ing splice repressor proteins that prevent splicing using 
nearby splice sites. Current evidence indicates that these 
splice repressor proteins are members of a diverse group 
of heterogeneous nuclear ribonucleoproteins (hnRNPs). 
Binding of hnRNPs to ESS or ISS sequences blocks the ac- 
tion of the spliceosome at nearby splice sites. 
Figure 8.25 SR-protein recruitment to ESEs, directing
spliceosome components to nearby splice sites. Binding of SR 
protein to ISEs has a similar result. In contrast, protein binding 
to ESS and ISS elements blocks nearby spliceosome binding. 


Intron Self-Splicing 


In addition to introns that are spliced by spliceosomes, 
certain other RNAs can contain introns that self-catalyze 
their own removal. Three categories of self-splicing in- 
trons, designated group I, group II, and group III introns, 
have been identified. The molecular biologist Thomas 
Cech and his colleagues discovered group I introns in 
1981, when they observed that a 413-nucleotide intron in the
precursor rRNA of the protozoan Tetrahymena
could splice itself without the presence of any protein.
Following up on this initial observation, Cech and others 
have shown that group I introns are large, self-splicing 
ribozymes (catalytically active RNAs) that catalyze their 
own excision from certain mRNAs and also from tRNA 
and rRNA precursors in bacteria, simple eukaryotes, and 
plants. Intron self-splicing takes place by way of two 
transesterification reactions (Figure 8.26) that excise

Figure 8.26 Self-splicing of group I introns.

PROBLEM The JLB-1 gene, expressed in several human organs, contains seven exons (1 to 7) and six
introns (A to F). Three oligonucleotide probes (I to III), hybridizing to exons 2, 4, and 7, respectively, are
indicated by asterisks below the gene map:

[Gene map not reproduced.]

Mature mRNA is isolated from three tissues expressing the JLB-1 gene and examined
by northern blotting using the three oligonucleotide probes indicated
above. The probes bind to complementary sequences in mRNA. Northern
blot patterns of hybridization between each probe and mRNA isolated
from blood, liver, and kidney cells are shown. For each northern blot:

[Northern blots for Probe I, Probe II, and Probe III, with lanes for blood, liver, and kidney, not reproduced.]

a. Explain the meaning of the hybridization result.

b. Identify the biological process or processes accounting for the
observed patterns of hybridization in the northern blots.

BREAK IT DOWN: Molecular probes bind
only to their target sequences. A band appears in
the northern blot only if the exon target of a probe
is present in the mRNA (p. 289; see also p. 349).

BREAK IT DOWN: Differences in the results for
different tissues indicate the presence of alternative
transcripts of the gene (p. 289; see also p. 349).

Solution Strategies and Solution Steps


Evaluate

Strategy 1. Identify the topic this problem addresses
and the nature of the required answer.

Step 1. This problem concerns the production of mature mRNAs from a single human
gene expressed in different organs. The answer requires identification of the
specific mechanisms responsible for the data obtained from each organ.

Strategy 2. Identify the critical information provided
in the problem.

Step 2. The problem gives gene structure, the binding location of each of three
molecular probes hybridizing the gene, and the results of three northern blot
analyses of mature mRNA from different organs.


Deduce

Strategy 3. Identify the regions of JLB-1 that are
anticipated to be part of the pre-mRNA.

Step 3. Pre-mRNA from this gene is anticipated to include all intron and exon
sequences.

Strategy 4. Identify the regions expected to be found
in mature mRNA.

Step 4. Exon segments are expected in mature mRNA, along with modification at the
5’ mRNA end (capping) and the 3’ end (poly-A tailing).


Solve

Strategy 5. Determine the hybridization pattern
of molecular probes in each tissue.

TIP: Hybridization of a probe occurs when the
probe finds its target sequence. The absence of
hybridization indicates that the target sequence
for a probe is not present.

Strategy 6. Interpret the hybridization patterns in
each tissue and identify the process or
processes that reasonably account for the
observed patterns.

TIP: Alternative promoters, alternative polyadenylation sites,
and alternative splicing are three mechanisms that lead eukaryotic
genomes to generate distinct proteins from the same gene.


For more practice, see Problems 2, 3, and 8.


Answer a

5. Blood: Probes I and II hybridize, but probe III does not. This result indicates that
exons 2 and 4 are present in the mature mRNA of blood, but exon 7 is not.
Liver: Probe I fails to hybridize to mRNA from liver, indicating that exon 2 is
missing from liver mRNA. Probes II and III hybridize to liver mRNA, indicating that
exons 4 and 7 are included in the mature transcript.
Kidney: Probe II does not hybridize to kidney mRNA, indicating that exon 4 is
missing from it. Probes I and III find hybridization targets, indicating that exons
2 and 7 are present in the transcript.
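
The reasoning in Answer a can be expressed as a small lookup: each probe reports on one exon, so a missing band means a missing exon. The sketch below assumes the hybridization results described in Answer a; the names `PROBE_TARGETS`, `BLOT_RESULTS`, and `missing_exons` are hypothetical and used only for illustration.

```python
# Exon targeted by each probe (from the problem statement).
PROBE_TARGETS = {"I": 2, "II": 4, "III": 7}

# True means the probe produced a band with mRNA from that tissue
# (as described in Answer a).
BLOT_RESULTS = {
    "blood": {"I": True, "II": True, "III": False},
    "liver": {"I": False, "II": True, "III": True},
    "kidney": {"I": True, "II": False, "III": True},
}


def missing_exons(tissue):
    """Return the probed exons that are absent from a tissue's mature mRNA."""
    results = BLOT_RESULTS[tissue]
    return sorted(
        PROBE_TARGETS[probe] for probe, band in results.items() if not band
    )


if __name__ == "__main__":
    for tissue in BLOT_RESULTS:
        print(tissue, "lacks exon(s)", missing_exons(tissue))
```

Note that the probes report only on exons 2, 4, and 7; the blots say nothing about exons 1, 3, 5, and 6.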


Answer b 

6. Blood: The absence of exon 7 is most likely due either to the use of an alternative
polyadenylation site that generates 3’ cleavage of pre-mRNA ahead of exon 7 or
to differential splicing that removes exon 7 from pre-mRNA during intron splicing.
Liver: The absence of exon 2 is most likely due either to use of an alternative
promoter that initiates transcription at a point past exon 2 or to differential splicing
of liver pre-mRNA.
Kidney: The absence of exon 4 is most likely the result of differential splicing of
kidney pre-mRNA.


the intron and allow exons to ligate. Cech and Sidney
Altman shared the 1989 Nobel Prize in Chemistry
for their contributions to the discovery and
description of the catalytic properties of RNA.

Group II introns, which are also self-splicing ribo- 
zymes, are found in mRNA, tRNA, and rRNA of fungi, 
plants, protists, and bacteria. Group II introns form highly 
complex secondary structures containing many stem-loop 
arrangements. Their self-splicing takes place in a lariat-like 
manner utilizing a branch point nucleotide that in many 
cases is adenine. It is thought that nuclear pre-mRNA splic- 
ing may have evolved from group II self-splicing introns. 

Group III introns and group II introns are similar 
in having elaborate secondary structures and lariat-like 
splicing structures that utilize a branch point nucleotide. 
Group III introns are much shorter than group II introns, 
however, and their secondary structures are different
from those of group II introns. 


Ribosomal RNA Processing 


In bacteria, archaea, and eukaryotes, rRNAs are tran- 
scribed as large precursor molecules that are cleaved 
into smaller RNA molecules by removal and discarding 
of spacer sequences intervening between the sequences 
of the different RNAs. The E. coli genome, for example, 
contains seven copies of an rRNA gene. Each gene copy is 
transcribed into a single 30S precursor RNA that is pro- 
cessed by the removal of intervening sequences to yield 
5S, 16S, and 23S rRNAs, along with several tRNA mol- 
ecules (Figure 8.27a). All seven gene copies produce the 
same three rRNAs, but each gene generates a different
set of tRNAs. There is evidence that archaea use a similar
process to produce some rRNA molecules.

Figure 8.27 The processing of ribosomal and transfer RNA.
(a) A large transcript is cleaved to produce rRNA and tRNA in
E. coli: (1) transcription produces a 30S pre-RNA transcript
containing 16S rRNA, tRNA, 23S rRNA, 5S rRNA, and tRNA sequences;
(2) RNA cleavage releases the rRNAs and tRNAs. (b) Human rRNA
genes are part of a 40-kb repeating sequence that produces three
rRNAs. Each repeat consists of a 13-kb rRNA transcriptional unit
(ETS, 18S, ITS1, 5.8S, ITS2, 28S) and an intergenic spacer of
about 27 kb: (1) transcription synthesizes a 45S pre-rRNA
transcript; (2) pre-rRNA cleavage produces three rRNAs.

Eukaryotic genomes have hundreds of rRNA
genes clustered in regions of repeated genes on various
chromosomes. Each gene produces a 45S precursor
rRNA that contains an external transcribed spacer
(ETS) and two internal transcribed spacers (ITS1 and
ITS2) that are removed by processing. The transcript is
processed in multiple steps to yield three rRNA molecules
with sedimentation coefficients of 5.8S, 18S, and 28S (Figure 8.27b). Eukaryotic
genomes differ somewhat in the steps that process the 
45S pre-rRNA transcript. In general, however, the 45S 
transcript is cleaved to a 41S intermediate from which 
the 18S transcript is then removed, followed by cleavage 
that produces the 28S and 5.8S transcripts. The 5.8S and
28S products pair with one another and become part of 
the same ribosomal subunit. After processing, the result- 
ing rRNAs fold into complex secondary structures and 
are joined by proteins to form ribosomal subunits. Some 
chemical modifications of rRNA, particularly methylation
of selected nucleotide bases, occur after completion of 
transcription. 


Transfer RNA Processing 


The production of tRNA, whether in bacteria, archaea, 
or eukaryotes, also requires post-transcriptional pro- 
cessing. Each type of tRNA has distinctive nucleotides 
and a specific pattern of folding, but all tRNAs have 
similar structures and functions (Figure 8.28). Some bac- 
terial transfer RNA molecules are produced simultane- 
ously with rRNAs, as described above (see Figure 8.27a). 
Other tRNAs are transcribed as part of a large pre-tRNA 
transcript that is then cleaved to yield multiple tRNA 
molecules. In eukaryotes, tRNA genes occur in clusters 
on specific chromosomes. Each eukaryotic tRNA gene 
is individually transcribed by RNA polymerase III, and a 
single pre-tRNA is produced from each gene. 
Figure 8.28 Transfer RNA structure. Each tRNA has a distinctive structure. The tRNA carrying
alanine is illustrated in two dimensions (a) and three dimensions (b).
The number of different tRNAs produced depends on
the type of organism. In bacteria, the exact number of different
tRNAs varies, but it is usually substantially fewer than
61, the number of amino acid-specifying codons found in mRNA.
Each species must have at least 20 different tRNAs, one for
each amino acid, but most produce at least 30 to 40 different
tRNAs. The low number of different tRNAs (compared to 
number of codons) results from a phenomenon called third- 
base wobble, a relaxation of the “rules” of complementary 
base pairing at the third base of codons (see Chapter 9). 
Although third-base wobble plays a role in reducing the 
number of distinct tRNA genes needed in eukaryotic ge- 
nomes, eukaryotes nevertheless produce a larger number of 
different tRNAs than bacteria do. Some eukaryotic genomes 
contain a full complement of 61 different tRNA genes, one 
corresponding to each codon of the genetic code. 
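
The figure of 61 comes from simple counting: three bases per codon and four possible bases give 64 codons, of which three are stop codons. A minimal sketch of that arithmetic follows; the variable names are illustrative only.

```python
from itertools import product

BASES = "UCAG"
STOP_CODONS = {"UAA", "UAG", "UGA"}  # the three stop codons of the standard code

# Every possible triplet of the four RNA bases.
ALL_CODONS = ["".join(triplet) for triplet in product(BASES, repeat=3)]

# Codons that specify an amino acid and therefore need a tRNA to read them.
SENSE_CODONS = [codon for codon in ALL_CODONS if codon not in STOP_CODONS]

if __name__ == "__main__":
    print(len(ALL_CODONS), "codons in total")
    print(len(SENSE_CODONS), "codons specify amino acids")
```

A genome with fewer than 61 distinct tRNAs therefore relies on some tRNAs reading more than one codon, which is what third-base wobble permits.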

Bacterial tRNAs require processing before they are 
ready to assume their functional role of transporting amino 
acids to the ribosome. The precise processing events differ 
somewhat among tRNAs, but several features are com- 
mon. First, many tRNAs are cleaved from large precursor 
tRNA transcripts to produce several individual tRNA mol- 
ecules. Second, nucleotides are trimmed off the 5’ and 3’ 
ends of tRNA transcripts to prepare the mature molecule. 
Third, certain individual nucleotides in different tRNAs 
are chemically modified to produce a distinctive molecule. 
Fourth, tRNAs fold into a precise three-dimensional struc- 
ture that includes four double-stranded stems, three of 
which are capped by single-stranded loops; each stem and 
loop constitutes an “arm” of the tRNA molecule. Fifth, 
tRNAs undergo post-transcriptional addition of bases. The 
most common addition is three nucleotides, CCA, at the 3’ 
end of the molecule. This region is the binding site for the 
amino acid the tRNA molecule transports to the ribosome. 
Figure 8.28 shows tRNA^Ala, which carries alanine. The
CCA terminus is indicated, along with chemically modi- 
fied nucleotides in each arm that are characteristic of this 
tRNA. Both a two-dimensional and a three-dimensional 
representation are shown. 
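
Two of the processing steps listed above, end trimming and CCA addition, can be illustrated on a toy sequence. This is a deliberately simplified sketch: the function name `process_pre_trna`, the leader and trailer lengths, and the example sequence are all hypothetical, and the chemical modification and folding steps are not modeled.

```python
def process_pre_trna(pre_trna, leader_len, trailer_len):
    """Trim the 5' leader and 3' trailer, then add CCA to the 3' end."""
    if leader_len < 0 or trailer_len < 0:
        raise ValueError("lengths must be non-negative")
    if leader_len + trailer_len > len(pre_trna):
        raise ValueError("cannot trim more nucleotides than the transcript has")
    end = len(pre_trna) - trailer_len
    trimmed = pre_trna[leader_len:end]
    return trimmed + "CCA"


if __name__ == "__main__":
    # Toy precursor: 3-nt leader, 4-nt body, 4-nt trailer (made-up sequence).
    print(process_pre_trna("AAAGGGCUUUU", leader_len=3, trailer_len=4))
```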

Eukaryotic and archaeal tRNAs undergo processing 
modifications similar to those of bacterial tRNAs. In addi- 
tion, however, eukaryotic pre-tRNAs may contain small in- 
trons that are removed during processing. For example, an 
intron 14 nucleotides in length is removed from the precur- 
sor molecule by a specialized nuclease enzyme that cleaves 
the 5’ and 3’ splice sites of tRNA introns. The cleaved 
tRNA then refolds to form the anticodon stem, and the en- 
zyme RNA ligase joins the 5’ and 3’ ends of the tRNA. 


Post-Transcriptional RNA Editing 


A firmly established tenet in the central dogma of biol- 
ogy is the role of DNA as the repository and purveyor of 
genetic information. Notwithstanding the modifications 
made to precursor RNA transcripts after transcription, a 
fundamental principle of biology is that DNA dictates the 


sequence of mRNA nucleotides and controls the order 
of amino acids in proteins. And yet, in the mid-1980s, a 
phenomenon called RNA editing was uncovered that is 
responsible for post-transcriptional modifications that 
change the genetic information carried by mRNA. 

Two kinds of RNA editing occur. In one kind of 
RNA editing, uracils are inserted into edited mRNA with 
the assistance of a specialized RNA called guide RNA 
(gRNA). A guide RNA, transcribed from a separate RNA- 
encoding gene, contains a sequence complementary to 
the region of mRNA that it edits. With the aid of a protein 
complex, a portion of guide RNA pairs with comple- 
mentary nucleotides of pre-edited mRNA and acts as a 
template to direct the insertion (and occasionally the de- 
letion) of uracil (Figure 8.29). Guide RNA releases edited 
mRNA after editing is complete. The protein translated 
from edited mRNA may differ from the protein produced 
from unedited transcript. 

The second kind of RNA editing is by base substitu- 
tion, and frequently consists of the replacement of cyto- 
sine with uracil (C-to-U editing) in mRNA by removal of 
the amino group from cytosines. We describe the details 
of this process, known as deamination, in Section 12.3 
and here simply examine the consequences of the event. 
This type of RNA editing has been identified in mammals, 
most land plants, and several single-celled eukaryotes. 
Figure 8.29 Guide RNA (gRNA) directs RNA editing.
The consequence of C-to-U RNA editing is demonstrated
by the protein products of the mammalian
apolipoprotein B gene (Figure 8.30). An identical gene
containing 29 exons is found in all mammalian cells, and
the same mRNA is transcribed in all tissues. Part of this
messenger RNA sequence includes codon number 2153
that has the sequence CAA and is translated as glutamine
in liver apolipoprotein B, a protein consisting of 4563
amino acids. In intestinal cells, however, RNA editing
changes the cytosine in codon 2153 to a uracil, converting
the codon to UAA. This C-to-U change produced by RNA
editing creates a “stop” codon that halts translation after
the assembly of the first 2152 amino acids of intestinal
apolipoprotein B.

Figure 8.30 (annotations) RNA editing changes C to U in codon
2153, creating a new stop codon. Translation stops after
synthesizing a protein containing 2152 amino acids.
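
The effect of the edit can be reproduced on a toy transcript. The sketch below is an illustration only: the five-codon sequence is invented, and the functions `edit_c_to_u` and `translated_length` are hypothetical names. It shows how changing the first base of a CAA codon to U creates UAA and shortens the product, just as editing codon 2153 leaves a protein of 2152 amino acids.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def edit_c_to_u(mrna, codon_number):
    """Change the first base of the given codon (1-based) from C to U."""
    index = (codon_number - 1) * 3
    if mrna[index] != "C":
        raise ValueError("first base of the chosen codon is not C")
    return mrna[:index] + "U" + mrna[index + 1:]


def translated_length(mrna):
    """Count codons read before the first stop codon (or the end)."""
    count = 0
    for i in range(0, len(mrna) - 2, 3):
        if mrna[i:i + 3] in STOP_CODONS:
            break
        count += 1
    return count


if __name__ == "__main__":
    unedited = "AUGGCUCAAGGUUGA"  # AUG GCU CAA GGU UGA (made-up sequence)
    edited = edit_c_to_u(unedited, codon_number=3)
    print(unedited, translated_length(unedited))
    print(edited, translated_length(edited))
```

In the toy example the edited codon is number 3, so translation stops after 2 amino acids; the general rule is that editing codon n to a stop codon yields a product of n - 1 amino acids.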


Sexy Splicing: Alternative mRNA Splicing and Sex Determination in Drosophila 


The number of X chromosomes in the nuclei of Drosophila 
embryos is critical in sex determination, but the X/autosome 
(X/A) ratio proposed by Calvin Bridges (X/A = 1.0 in females 
and X/A = 0.5 in males) as the underlying cause is not the entire
story (see Section 3.4). In fact, the process involves differential
gene expression and pre-mRNA splicing. The molecular
basis of Drosophila sex determination depends on a series of
steps that begins with the transcription activation of the sex-lethal
(Sxl) gene, includes alternative splicing of the pre-mRNA
transcript of the transformer (Tra) gene, and culminates with
one of two alternative splicing variants of the pre-mRNA
transcripts of the double-sex (Dsx) gene. The Dsx protein directs
further transcription activation and repression, leading to
female or to male development.
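
The series of steps in this case study behaves like a chain of on/off decisions, each depending on the one before. The sketch below is a simplified model under stated assumptions: the function name `drosophila_sex` is hypothetical, the threshold of 1.0 for Sxl activation is a simplification that ignores intermediate X/A ratios, and the SisA, SisB, Deadpan, and Tra-2 proteins are not modeled separately.

```python
def drosophila_sex(x_chromosomes, autosome_sets=2):
    """Follow the Sxl -> Tra -> Dsx cascade for a given X/A ratio."""
    xa_ratio = x_chromosomes / autosome_sets

    # Assumption: Sxl is transcribed only when X/A is at least 1.0.
    sxl_protein = xa_ratio >= 1.0

    # Sxl protein directs the Tra pre-mRNA splice that yields functional Tra.
    tra_functional = sxl_protein

    # Functional Tra directs the female-specific splice of Dsx pre-mRNA;
    # without it, the alternative (male) variant is produced.
    dsx_form = "female-specific" if tra_functional else "male-specific"

    sex = "female" if dsx_form == "female-specific" else "male"
    return {
        "xa_ratio": xa_ratio,
        "sxl_protein": sxl_protein,
        "tra_functional": tra_functional,
        "dsx_form": dsx_form,
        "sex": sex,
    }


if __name__ == "__main__":
    print(drosophila_sex(2))  # two X chromosomes, two autosome sets
    print(drosophila_sex(1))  # one X chromosome, two autosome sets
```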


The X/A ratio in fly embryos initially influences the tran- 
scription and translation of two X-linked activator proteins 
called SisA and SisB, and an autosomal gene producing a 
transcription repressor protein called Deadpan (Figure 8.31). 
Since the genes producing SisA and SisB are X-linked, early fe- 
male embryos produce twice as much of each activator as do 
early male embryos, and the ratio of SisA + SisB to Deadpan 
differs between female and male embryos. In early female 
embryos, the ratio of SisA + SisB protein to Deadpan protein 
leads to transcription of the Sxl gene and to the production of
Sxl protein. Sxl transcription is repressed in male embryos and 
no Sxl protein is produced. 

Sxl protein is a splicing regulator that operates on the
pre-mRNA transcript of the Tra gene. In female embryos,
Tra pre-mRNA is spliced to produce a functional Tra protein.
In male embryos, the absence of Sxl protein leads to
alternative Tra pre-mRNA splicing that does not produce
functional Tra protein. The Tra protein is also a splicing
regulator; it operates on the pre-mRNA of Dsx along with a second
protein known as Tra-2. In female embryos, Tra protein
and Tra-2 protein splice Dsx pre-mRNA in one alternative
variant, which when translated produces female-specific
Dsx protein. Female-specific Dsx activates transcription of
female-specific genes and represses transcription of
male-specific genes to produce female flies. Tra protein is absent
in male embryos, and Dsx pre-mRNA is spliced in
another alternative variant. Dsx protein in male embryos
represses female-specific genes and allows transcription
of unrepressed male-specific genes, leading to male sex
development.

Figure 8.31 The X/A ratio determines gene transcription and transcript splicing pattern
to determine sex in fruit flies.


For activities, animations, and review quizzes, go to the Study Area.


8.1 RNA Transcripts Carry the Messages of Genes


RNA molecules are synthesized by RNA polymerases using
as building blocks the RNA nucleotides A, G, C, and U to
form single-stranded sequences complementary to DNA
template strands.

Messenger RNA is the transcript that undergoes translation
to produce proteins. Five other major forms of functional
RNA are transcribed, and may undergo modification, but are
not translated.


8.2 Bacterial Transcription Is a Four-Stage Process


Transcription has four stages: promoter recognition, chain
initiation, chain elongation, and chain termination.

A single RNA polymerase transcribes all bacterial genes.
This polymerase is a holoenzyme composed of a five-subunit
core enzyme and a sigma subunit that aids the
recognition of different forms of bacterial promoters.

Bacterial promoters have two consensus sequence regions
located upstream of the transcription start at approximately
-10 and -35.

The core enzyme of bacterial RNA polymerase carries
out RNA synthesis following chain initiation by the
holoenzyme.

Transcription of most bacterial genes terminates
by an intrinsic mechanism that depends only on DNA
terminator sequences. Certain bacterial genes have
a rho-dependent mechanism of transcription
termination.


8.3 Archaeal and Eukaryotic Transcription 
Displays Structural Homology and 
Common Ancestry 


Eukaryotic cells contain three types of RNA polymerases
that transcribe mRNA and the various classes of
functional RNA.

RNA polymerase II transcribes mRNA by interaction with
numerous transcription factors that lead the enzyme to recognize
promoters controlling transcription of polypeptide-coding
genes.


Promoters recognized by RNA polymerase II have a TATA
box and additional regulatory elements that bind transcription
factors and RNA pol II during transcription initiation.

Transcription shows similarities among all three domains of
life due to the sharing of a common ancestor and the essential
nature of transcription.

Archaeal transcription is a simplified version of 
eukaryotic transcription and is dissimilar from bacterial 
transcription. 

Three archaeal transcription proteins, TBP, TFB, and less
often TFIIEα, share homology with bacterial and eukaryotic
proteins and initiate transcription by interacting with RNA
polymerase.

Eukaryotic promoter regulatory elements are recognized by 
their consensus sequences. 

Tissue-specific and developmental modifications in 
transcription are regulated by enhancer and silencer 
sequences. 

RNA polymerase I uses exclusive transcription factors to 
recognize upstream consensus sequences of ribosomal RNA 
genes. 

RNA polymerase III recognizes promoter consensus 
sequences that are upstream and downstream of the start of 
transcription. 
8.4 Post-Transcriptional Processing Modifies
RNA Molecules


5’ capping of eukaryotic messenger RNA adds a methylated 
guanine through the action of guanylyl transferase shortly 
after transcription is initiated. 

Polyadenylation at the 3’ end of eukaryotic messenger RNA 
is signaled by an AAUAAA sequence and is accomplished by a 
complex of enzymes. 

Intron splicing is controlled by cellular proteins that identify 
introns and exons and form spliceosome complexes that 
remove introns and ligate exons. 

Consensus sequences at the 5’ splice site, the 3’ splice site,
and the branch point serve as guides during intron splicing.

Alternative splicing is regulated by cell-type-specific variation
of proteins that identify introns and exons.

Some RNA molecules have catalytic activity and are able to 
self-splice introns without the aid of proteins. 

Ribosomal and transfer RNA molecules are generated by 
cleavage of large precursor molecules transcribed in bacte- 
rial, archaeal, and eukaryotic genomes. 

RNA editing is a post-transcriptional altering of nucleotide 
sequence, causing the transcripts to differ from the corre- 
sponding template DNA sequence. 


KEYWORDS 


3’ polyadenylation (3’ poly-A tailing) 
(p. 285) 

3’ splice site (p. 289) 

5’ capping (p. 285) 

5' splice site (p. 288) 

-35 consensus sequence (p. 273) 

alternative pre-mRNA processing 
(alternative intron splicing, promoter, 
polyadenylation) (pp. 292, 293) 

branch point adenine (p. 289) 

CAAT box (p. 280) 

closed promoter complex (p. 273) 

coding region (p. 272) 

coding strand (nontemplate strand) (p. 271) 

consensus sequence (p. 273) 

core element (p. 283) 

downstream (p. 272) 

enhancer sequence (p. 282) 

exonic and intronic splicing enhancers
(ESEs and ISEs) (p. 294)

exonic and intronic splicing silencers 
(ESS and ISS) (p. 294) 

functional RNA (tRNA, rRNA, snRNA, 
miRNA, siRNA, ribozyme) (p. 271) 

GC-rich box (p. 280)

general transcription factors (GTFs) (p. 282) 

guide RNA (gRNA) (p. 298) 

initial committed complex (p. 281) 


initiation complex (p. 282) 

internal control region (ICR) (p. 284) 

internal promoter element (p. 284) 

intrinsic termination (p. 277) 

intron self-splicing (p. 294) 

intron splicing (p. 285) 

intronic splicing enhancer, suppressor 
(ISE, ISS) (p. 294) 

inverted repeat (p. 277) 

lariat intron structure (p. 289) 

mature mRNA (p. 285) 

messenger RNA (mRNA) (p. 270) 

micro RNA (miRNA) (p. 271) 

minimal initiation complex (p. 282) 

nucleolus (nucleoli) (p. 283) 

open promoter complex (p. 273) 

polyadenylation signal sequence (p. 286) 

precursor mRNA (pre-mRNA) (p. 285) 

preinitiation complex (PIC) (p. 282) 

Pribnow box (-10 consensus sequence) 
(p. 273) 

promoter (p. 271) 

promoter-specific element (PSE) (p. 284) 

rho-dependent termination (rho protein) 
(p. 277) 

rho utilization site (rut site) (p. 277) 

ribonucleotide (A,U,G,C) (p. 269) 

ribose (p. 269) 


ribosomal RNA (rRNA) (p. 271) 

ribozymes (p. 271) 

RNA editing (p. 298) 

RNA polymerase core (p. 272) 

RNA polymerase (p. 269) 

RNA polymerase I, II, III (RNA pol I, 
II, III) (p. 278) 

sigma (σ) subunit (alternative sigma 
subunit) (p. 272) 

silencer sequence (p. 283) 

small interfering RNA (siRNA) (p. 271) 

small nuclear RNA (snRNA) (p. 271) 

spliceosome (p. 289) 

stem-loop (hairpin structure) (p. 277) 

TATA box (Goldberg-Hogness box) 
(p. 280) 

TATA-binding protein (TBP) (p. 281) 

TBP-associated factor (TAF) (p. 281) 

template strand (p. 271) 

termination region (p. 272) 

transcription factors (TF) (p. 281) 

transcription-terminating factor I (TTFI) 
(p. 285) 

transfer RNA (tRNA) (p. 271) 

upstream (p. 271) 

upstream control element (p. 283) 

uracil (U) (p. 269) 


302 CHAPTER 8 Molecular Biology of Transcription and RNA Processing 


PROBLEMS 
Visit MasteringGenetics™ for instructor-assigned tutorials and problems. 

Chapter Concepts 
For answers to selected even-numbered problems, see Appendix: Answers. 

1. Based on discussion in this chapter, 
a. What is a gene? 
b. Why are genes for rRNA and tRNA considered to be 
genes even though they do not produce polypeptides? 

2. In one to two sentences each, describe the three processes 
that commonly modify eukaryotic pre-mRNA. 

3. Answer these questions concerning promoters. 
a. What role do promoters play in transcription? 
b. What is the common structure of a bacterial promoter 
with respect to consensus sequences? 
c. What consensus sequences are detected in the mammalian 
β-globin gene promoter? 
d. Eukaryotic promoters are more variable than bacterial 
promoters. Explain why. 
e. What is the meaning of the term alternative promoter? 
How does the use of alternative promoters affect 
transcription? 

4. The diagram below shows a DNA duplex. The template 
strand is identified, as is the location of the +1 nucleotide. 

[Diagram: DNA duplex with the +1 nucleotide marked; the template 
strand is drawn 5’ to 3’ and the coding strand 3’ to 5’, left to right] 

a. Assume this region contains a gene transcribed in a bacterium. 
Identify the location of promoter consensus sequences 
and of the transcription termination sequence. 
b. Assume this region contains a gene transcribed to form 
mRNA in a eukaryote. Identify the location of the most 
common promoter consensus sequences. 
c. If this region is a eukaryotic gene transcribed by RNA 
polymerase III, where are the promoter consensus 
sequences located? 

5. The following is a portion of an mRNA sequence: 

3’-AUCGUCAUGCAGA-5’ 

a. During transcription, was the adenine at the left-hand 
side of the sequence the first or the last nucleotide 
added to the portion of mRNA shown? Explain how 
you know. 
b. Write out the sequence and polarity of the DNA duplex 
that encodes this mRNA segment. Label the template 
and coding DNA strands. 
c. Identify the direction in which the promoter region for 
this gene will be located. 

6. Compare and contrast the properties of DNA polymerase 
and RNA polymerase, listing at least three similarities and 
at least three differences between the molecules. 

7. The DNA sequences shown below are from the promoter 
regions of six bacterial genes. In each case, the last nucleotide 
in the sequence (highlighted in blue) is the +1 nucleotide 
that initiates transcription. 

Gene 1 TTCCGGCTCGTATGTTGTGTGG A 
Gene 2 CGTCATTTGATATGATGCGCCCCG 
Gene 3 CCACTCCCCCTCATACTCAGCACIA 
Gene 4 TTTATTGCAGCTTATAATGGTTAC A 
Gene 5 TGCTTCTGACTATAATAGACAGG G 
Gene 6 AAGTAAAGACGCTACGATGTACCACEE 

a. Examine these sequences and identify the Pribnow box 
sequence at approximately -10 for each promoter. 
b. Determine the consensus sequence for the Pribnow box 
from these sequences. 

8. Bacterial and eukaryotic gene transcripts can differ in 
the transcripts themselves, in whether the transcripts 
are modified before translation, and in how the transcripts 
are modified. For each of these three areas of 
contrast, describe what the differences are and why the 
differences exist. 

9. Describe the two types of transcription termination found 
in bacterial genes. How does transcription termination 
differ for eukaryotic genes? 

10. What is the role of enhancer sequences in transcription of 
eukaryotic genes? Speculate about why enhancers are not 
part of transcription of bacterial genes. 

11. Describe the difference between intron sequences and 
spacer sequences, such as the spacer sequence depicted in 
Figure 8.27b. 

12. Draw a bacterial promoter and label its consensus sequences. 
How does this promoter differ from a eukaryotic 
promoter transcribed by RNA polymerase II? By RNA 
polymerase I? By RNA polymerase III? 

13. How do SR proteins help guide pre-mRNA intron 
splicing? What is meant by the term alternative splicing, 
and how does variation in SR protein production play 
a role? 

14. Three genes identified in the diagram as A, B, and C are 
transcribed from a region of DNA. The 5’-to-3’ transcription 
of genes A and C elongates mRNA in the right-to-left 
direction, and transcription of gene B elongates mRNA in 
the left-to-right direction. For each gene, identify the coding 
strand as the upper strand or the lower strand in the diagram. 

[Diagram: DNA duplex with genes A, B, and C marked from left to right; 
the upper strand runs 5’ to 3’ and the lower strand runs 3’ to 5’] 

Application and Integration 


15. The eukaryotic gene Gen-100 contains four introns labeled 
A to D. Imagine that Gen-100 has been isolated and its 
DNA has been denatured and mixed with polyadenylated 
mRNA from the gene. 


a. Illustrate the R-loop structure that would be seen with 
electron microscopy. 

b. Label the introns. 

c. Are intron regions single stranded or double stranded? 
Why? 


16. The segment of the bacterial TrpA gene involved in 
intrinsic termination of transcription is shown below. 


3’-TGGGTCGGGGCGGATTACTGCCCCGAAAAAAAACTTG-5’ 
5’-ACCCAGCCCCGCCTAATGACGGGGCTTTTTTTTGAAC-3’ 


a. Draw the mRNA structure that forms during transcrip- 
tion of this segment of the TrpA gene. 

b. Label the template and coding DNA strands. 

c. Explain how a sequence of this type leads to intrinsic 
termination of transcription. 


17. A 2-kb fragment of E. coli DNA contains the complete 
sequence of a gene for which transcription is terminated 
by the rho protein. The fragment contains the complete 
promoter sequence as well as the terminator region of the 
gene. The cloned fragment is examined by band shift assay 
(see Research Technique 8.1). Each lane of a single electro- 
phoresis gel contains the 2-kb cloned fragment under the 
following conditions: 


Lane 1: 2-kb fragment alone 
Lane 2: 2-kb fragment plus the core enzyme 


Lane 3: 2-kb fragment plus the RNA polymerase 
holoenzyme 


Lane 4: 2-kb fragment plus rho protein 


a. Diagram the relative positions expected for the DNA 
fragments in this gel retardation analysis. 

b. Explain the relative positions of bands in lanes 1 and 3. 

c. Explain the relative positions of bands in lanes 1 and 4. 


18. A 3.5-kb segment of DNA containing the complete sequence 
of a mouse gene is available. The DNA segment 
contains the promoter sequence and extends beyond the 
polyadenylation site of the gene. The DNA is studied by 
band shift assay (see Research Technique 8.1), and the fol- 
lowing gel bands are observed. 


Lane: 1 2 3 4 5 


Match these conditions to a specific lane of the gel. 

a. 3.5-kb fragment plus TFIIB and TFIID 

b. 3.5-kb fragment plus TFIIB, TFIID, TFIIF, and RNA 
polymerase II 
c. 3.5-kb fragment alone 
d. 3.5-kb fragment plus RNA polymerase II 
e. 3.5-kb fragment plus TFIIB 


19. A 1.0-kb DNA fragment from the 5’ end of the mouse gene 
described in the previous problem is examined by DNA 
footprint protection analysis (see Research Technique 8.1). 
Two samples are end-labeled with ³²P, and one of the two 
is mixed with TFIIB, TFIID, and RNA polymerase II. The 
DNA exposed to these proteins is run in the right-hand 
lane of the gel shown below and the control DNA is run in 
the left-hand. Both DNA samples are treated with DNase I 
before running the samples on the electrophoresis gel. 
a. What length of DNA is bound by the transcriptional 
proteins? Explain how the gel results support this 
interpretation. 

b. Draw a diagram of this DNA fragment bound by the 
transcriptional proteins, showing the approximate posi- 
tion of proteins along the fragment. Use the illustration 
style seen in Research Technique 8.1 as a model. 

c. Explain the role of DNase I. 


20. Wild-type E. coli grow best at 37°C but can grow efficiently 
up to 42°C. An E. coli strain has a mutation of the sigma 
subunit that results in an RNA polymerase holoenzyme 
that is stable and transcribes at wild-type levels at 37°C. 
The mutant holoenzyme is progressively destabilized as 
the temperature is raised, and it completely denatures and 
ceases to carry out transcription at 42°C. Relative to wild- 
type growth, characterize the ability of the mutant strain to 
carry out transcription at 
a. 37°C b. 40°C c. 42°C 
d. What term best characterizes the type of mutation 
exhibited by the mutant bacterial strain? (Hint: The 
term was used in Chapter 4 to describe the Himalayan 
allele of the mammalian C gene.) 


21. A mutant strain of Salmonella bacteria carries a mutation 
of the rho protein that has full activity at 37°C but is com- 
pletely inactivated when the mutant strain is grown at 40°C. 
a. Speculate about the kind of differences you would expect 
to see if you compared a broad spectrum of mRNAs from 
the mutant strain grown at 37°C and the same spectrum 
of mRNAs from the strain when grown at 40°C. 

b. Are all mRNAs affected by the rho protein mutation in 
the same way? Why or why not? 


22. The human β-globin wild-type allele and a certain mutant 
allele are identical in sequence except for a single base-pair 
substitution that changes one nucleotide at the end of in- 
tron 2. The wild-type and mutant sequences of the affected 
portion of pre-mRNA are 


             Intron 2      Exon 3 
wild type 5’-CCUCCCACAG CUCCUG-3’ 
mutant    5’-CCUCCCACUG CUCCUG-3’ 


a. Speculate about the way in which this base substitution 
causes mutation of β-globin protein. 

b. This is one example of how DNA sequence change oc- 
curring somewhere other than in an exon can produce 
mutation. List other kinds of DNA sequence changes 
occurring outside exons that can produce mutation. In 
each case, characterize the kind of change you would 
expect to see in mutant mRNA or mutant protein. 


23. Microbiologists describe the processes of transcription and 
translation as “coupled” in bacteria. This term indicates 
that a bacterial mRNA can be undergoing transcription at 
the same moment it is also undergoing translation. 


a. How is coupling of transcription and translation 
possible in bacteria? 

b. Is coupling of transcription and translation possible in 
single-celled eukaryotes such as yeast? Why or why not? 


24. A full-length eukaryotic gene is inserted into a bacterial 
chromosome. The gene contains a complete promoter sequence 
and a functional polyadenylation sequence, and it 
has wild-type nucleotides throughout the transcribed 
region. However, the gene fails to produce a functional 
protein. 

a. List at least three possible reasons why this eukaryotic 
gene is not expressed in bacteria. 

b. What changes would you recommend to permit expres- 
sion of this eukaryotic gene in a bacterial cell? 


25. The accompanying illustration shows a portion of a gene 
undergoing transcription. The template and coding strands 
for the gene are labeled, and a segment of DNA sequence is 
given. For this gene segment: 

a. Superimpose a drawing of RNA polymerase as it nears 
the end of transcription of the DNA sequence. 

b. Indicate the direction in which RNA polymerase moves 
as it transcribes this gene. 

c. Write the polarity and sequence of the RNA transcript 
from the DNA sequence given. 

d. Identify the direction in which the promoter for this 
gene is located. 
26. DNA footprint protection (described in Research Technique 
8.1) is a method that determines whether proteins bind to 
a specific sample of DNA and thus protect part of the DNA 
from random enzymatic cleavage by DNase I. A 400-bp seg- 
ment of cloned DNA is thought to contain a promoter. The 
cloned DNA is analyzed by DNA footprinting to determine 
if it has the capacity to act as a promoter sequence. The gel 
shown below has two lanes, each containing the cloned 400- 
bp DNA fragment treated with DNase I to randomly cleave 
unprotected DNA. Lane 1 is cloned DNA that was mixed 
with RNA polymerase II and several TFII transcription fac- 
tors before exposure to DNase I. Lane 2 contains cloned 
DNA that was exposed only to DNase I. RNA pol II and 
TFIIs were not mixed with DNA before adding DNase I. 
a. Explain why this gel provides evidence that the cloned 
DNA may act as a promoter sequence. 

b. Approximately what length is the DNA region protected 
by RNA pol II and TFIIs? 

c. What additional genetic experiments would you sug- 
gest to verify that this region of cloned DNA contains a 
functional promoter? 


27. Suppose you have a 1-kb segment of cloned DNA that is 
suspected to contain a eukaryotic promoter including a 
TATA box, a CAAT box, and an upstream GC-rich sequence. 
The clone also contains a gene whose transcript 
is readily detectable. Your laboratory supervisor asks you 
to outline an experiment that will (1) determine if eukaryotic 
transcription factors (TF) bind to the fragment and, 
if so, (2) identify where on the fragment the transcription 
factors bind. All necessary reagents, equipment, and experimental 
know-how are available in the laboratory. Your 
assignment is to propose techniques to be used to address 
the two items your supervisor has listed and to describe 
the kind of results that would indicate binding of TF to the 
DNA and the location of the binding. (Hint: The techniques 
and general results are discussed in this chapter.) 


The Molecular Biology 
of Translation 


Ribosomes use codon sequences of messenger RNA to direct the assembly 
of polypeptides during translation. This rendering of a ribosome engaged 
in translation is based on recent crystal structure analysis and accurately 
shows the large subunit (top) and small subunit (bottom), the track of 
mRNA through the small subunit, the spaces for E, P, and A sites into which 
tRNAs fit, and the egress of the polypeptide through the large subunit. 


Long before the discovery that DNA is the hereditary 
molecule, biologists had established the relationship 
between genes and proteins. In 1902, Archibald Garrod was 
the first to explicitly draw this connection when he pro- 
posed that the human hereditary disorder alkaptonuria was 
caused by an inherited defect in the enzyme homogentisic 
acid oxidase (see Section 4.3 and Figure 4.17b). As Garrod 
and other biologists expanded their exploration of the gene- 
protein connection, they found evidence that hereditary 
variation was closely tied to variations in proteins. Principal 


CHAPTER OUTLINE 


9.1 Polypeptides Are Composed 
of Amino Acid Chains That Are 
Assembled at Ribosomes 

9.2 Translation Occurs in Three 
Phases 

9.3 Translation Is Fast and Efficient 

9.4 The Genetic Code Translates 
Messenger RNA into Polypeptide 

9.5 Experiments Deciphered the 
Genetic Code 

9.6 Translation Is Followed by 

Polypeptide Folding, Processing, 

and Protein Sorting 


ESSENTIAL IDEAS 


Translation is the cellular process of polypeptide 
production carried out by ribosomes under the 
direction of mRNA. 


Ribosomes assemble on mRNA and initiate 
translation at the start codon. 

Polypeptide elongation and termination are 
similar in bacteria and eukaryotes. 

Transfer RNA molecules carry amino acids to 
ribosomes, which assemble polypeptides with 
the aid of ribosomal proteins. 

A virtually universal genetic code comprising 
64 mRNA codons directs polypeptide assembly. 


Polypeptides undergo posttranslational folding 
and processing, and in eukaryotes are sorted into 
vesicles for transport to cellular destinations or 
for secretion. 
among the biologists who developed this connection 
were George Beadle and Edward Tatum, whose 
research established the “one gene-one enzyme” 
hypothesis (Chapter 5). 

This chapter discusses translation, the 
mechanism by which the messenger RNA (mRNA) 
transcripts of genes are used to assemble amino 
acids into polypeptide strings that form proteins. 
Translation is carried out by ribosomes that bring 
together mRNA transcripts and transfer RNA (tRNA) 
molecules that carry amino acids and facilitate the 
assembly of polypeptides, strings of amino acids. 

Polypeptides form the enzymes (catalytic pro- 
teins), structural proteins, transport proteins, signal- 
ing proteins, hormones, and other components that 
are assembled into cell structures and that perform 
biological activities in cells. Your body is composed 
of trillions of cells that collectively express and utilize 
tens of thousands of different polypeptides, all syn- 
thesized by translation. 

The story of how polypeptides are produced 
by translation, and the story of how scientists 
came to understand the process, offers intrigu- 
ing insight into the design of molecular genetic 
experiments. In this chapter, we describe some 
of these experiments and examine the molecular 
biology of translation. We look at the homology of 
proteins that are active in translation in organisms 
from the three domains of life and describe how 
this and other features of translation are evidence 
of a single origin of life and of the evolutionary 
relationships between bacteria, archaea, and eu- 
karyotes. In the final chapter section, we discuss 
posttranslational processes that are instrumen- 
tal in producing functional proteins and guiding 
them to their appropriate destinations in cells. The 
chapter concludes with a case study describing the 
action of commonly used antibiotics that interfere 
with bacterial translation. 


9.1 Polypeptides Are Composed 
of Amino Acid Chains That Are 
Assembled at Ribosomes 


Twenty different amino acids are the basic building blocks 
of polypeptides. All amino acids have features in common 
and features that are distinct. The distinctive features impart 
specific characteristics that allow the amino acid to partici- 
pate in certain chemical reactions or behave in a hydrophilic 
or hydrophobic manner. In part, the common features allow 
amino acids to be joined into polypeptides by covalent bond 
formation between adjacent amino acids in the chain. 


Amino Acid Structure 


The shared features of amino acids are a central carbon 
atom known as the α-carbon, an amino (NH2) group, 
and a carboxyl (COOH) group (Figure 9.1). Both the amino 
group and the carboxyl group are joined to the α-carbon. 
During polypeptide assembly, an enzyme in the ribosome 
catalyzes the formation of a peptide bond between the 
carboxyl group of one amino acid and the amino group of 
the next amino acid in the chain. Each amino acid added in 
this way becomes a new monomer in the growing polymer 
that is the elongating polypeptide. The term polypeptide 
identifies a string of amino acids that are joined by peptide 
bonds. Each protein has a unique sequence of amino acids, 
may be composed of one or more polypeptide chains, and 
generally has a characteristic three-dimensional structure. 

The distinctive portion of each amino acid is its 
side chain, known as an R-group, that is joined to the 
α-carbon. The R-groups range in complexity from a single 
hydrogen atom to ringed structures that in themselves 
contain multiple carbon atoms. Each R-group imparts 
specific characteristics as shown in Table 9.1. Ten of the 
amino acids have nonpolar R-groups, meaning that they 
have no charged atoms that can participate in formation 
of hydrogen bonds with other amino acids. Five other 
amino acids have polar R-groups that can carry partial 
Figure 9.1 Peptide bond formation. The carboxyl group of one amino acid reacts with the amino 
group of a second amino acid to form a covalent peptide bond that joins amino acids in a polypeptide. 
Table 9.1 Amino Acids Grouped by Their Side Chain Properties 


Nonpolar side chains: Have no charged or electronegative 
atoms at pH 7.0 to form hydrogen bonds. 

Alanine (Ala or A)        Methionine (Met or M) 
Cysteine (Cys or C)       Phenylalanine (Phe or F) 
Glycine (Gly or G)        Proline (Pro or P) 
Isoleucine (Ile or I)     Tryptophan (Trp or W) 
Leucine (Leu or L)        Valine (Val or V) 


Polar side chains: Have partial charges at pH 7.0 and can 
form hydrogen bonds. 

Asparagine (Asn or N)     Threonine (Thr or T) 
Glutamine (Gln or Q)      Tyrosine (Tyr or Y) 
Serine (Ser or S) 


Electrically charged side chains: At pH 7.0, can form 
hydrogen and ionic bonds. 

Basic Side Chains         Acidic Side Chains 
Arginine (Arg or R)       Aspartate (Asp or D) 
Histidine (His or H)      Glutamate (Glu or E) 
Lysine (Lys or K) 


charges and can participate in hydrogen bond formation 
with other amino acids. The five remaining amino acids 
have electrically charged R-groups: Three are basic and 
two are acidic. Electrically charged R-groups allow these 
amino acids to form ionic bonds and hydrogen bonds. 
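
The grouping in Table 9.1 can be written down as a small lookup. The sketch below is only an illustration of that table, using one-letter codes; the names `SIDE_CHAIN_CLASSES`, `side_chain_class`, and `class_counts` are hypothetical and chosen for this example.

```python
# Side-chain classes as listed in Table 9.1, keyed by one-letter code.
# All names here are illustrative, not part of any library.
SIDE_CHAIN_CLASSES = {
    "nonpolar": set("ACGILMFPWV"),
    "polar": set("NQSTY"),
    "basic": set("RHK"),
    "acidic": set("DE"),
}


def side_chain_class(one_letter: str) -> str:
    """Return the Table 9.1 class for a one-letter amino acid code."""
    code = one_letter.upper()
    for name, members in SIDE_CHAIN_CLASSES.items():
        if code in members:
            return name
    raise ValueError(f"unknown amino acid code: {one_letter!r}")


def class_counts() -> dict:
    """Count how many amino acids fall into each class."""
    return {name: len(members) for name, members in SIDE_CHAIN_CLASSES.items()}
```

Calling `class_counts()` reproduces the tally given in the text: ten nonpolar, five polar, three basic, and two acidic amino acids.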


Polypeptide and Transcript Structure 


Polypeptide assembly is orchestrated by ribosomes, which 
are ribonucleoprotein “machines” containing multiple 
molecules of ribosomal RNA (rRNA) and dozens of pro- 
teins. Ribosomes of all organisms are composed of two 
subunits that assemble into a ribosome as translation 
begins. Ribosomes bind mRNA and provide an environ- 
ment for complementary base pairing between mRNA 
codon sequences and the anticodon sequences of tRNA. 
(In Chapter 1 and Figure 1.11, we review these basic me- 
chanical features of translation.) Figure 9.2 encapsulates 
the essential elements of translation. Ribosomes translate 
mRNA in the 5’-to-3’ direction, beginning with the 
start codon and ending with a stop codon. At each trip- 
let codon, complementary base pairing between mRNA 
and tRNA determines which amino acid is added to the 
nascent (growing) polypeptide. The start codon and stop 
codon define the boundaries of the translated segment of 
mRNA. The resulting polypeptides have an N-terminal 
(amino-terminal) end corresponding to the 5’ end of 
mRNA and a C-terminal (carboxyl-terminal) end that 
corresponds to the 3’ end of mRNA (Figure 9.3). 
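
The reading of an mRNA described above can be sketched in a few lines of Python. This is a simplified illustration, not a model of the ribosome: it assumes the first AUG in the sequence is the start codon, it uses the standard genetic code, and the names `CODON_TABLE` and `translate` are hypothetical.

```python
# Standard genetic code, built from the conventional U, C, A, G ordering.
# "*" marks the three stop codons.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(mrna: str) -> str:
    """Translate an mRNA written 5'-to-3'.

    Returns one-letter amino acid codes from the N-terminal end to the
    C-terminal end. Simplifying assumption: the first AUG is the start codon.
    """
    mrna = mrna.upper()
    start = mrna.find("AUG")
    if start == -1:
        return ""
    peptide = []
    # Step through the message one triplet codon at a time.
    for pos in range(start, len(mrna) - 2, 3):
        residue = CODON_TABLE[mrna[pos:pos + 3]]
        if residue == "*":  # a stop codon ends the translated segment
            break
        peptide.append(residue)
    return "".join(peptide)
```

For example, `translate("GGAUGGCUUAAUU")` reads AUG, GCU, and then the stop codon UAA, giving the two-residue peptide `"MA"`. The first residue corresponds to the 5’ side of the message and the last to the 3’ side.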

Figure 9.2 Translation overview. 

Figure 9.3 Alignment of DNA, mRNA, and polypeptide. 

Figure 9.3 identifies two segments of the mRNA transcript 
that do not undergo translation. Between the 5’ end 
of mRNA and the start codon is a segment known as the 
5’ untranslated region, abbreviated 5’ UTR. The region 
between the stop codon and the 3’ end of the molecule is the 
3’ untranslated region, or 3’ UTR. The 5’ UTR contains 
sequences that help initiate translation and the 3’ UTR contains 
sequences associated with transcription termination in 
almost all bacterial and eukaryotic mRNAs. By comparison, 
relatively little is known about the roles of archaeal 5’ and 3’ 
UTRs. Many archaeal mRNAs have a 5’ UTR that functions 
similarly to those of bacteria and eukaryotes. However, a 
substantial proportion of archaeal mRNAs (some studies 
suggest 50% or more of them) do not have a 5’ UTR. These 
so-called “leaderless” mRNAs are still efficiently translated, 
but the details of the mechanism remain unclear. It has been 
proposed that archaeal leaderless mRNAs could perhaps be 
a relic of an ancestral mode of translation. 
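
The three segments of a transcript named above can be separated with a short function. This is a minimal sketch under two stated assumptions: the first AUG is taken as the start codon, and the first in-frame stop codon ends the translated segment. The name `split_transcript` is hypothetical.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def split_transcript(mrna: str):
    """Split an mRNA (written 5'-to-3') into its three segments.

    Returns (5' UTR, translated segment including start and stop codons,
    3' UTR). Assumes the first AUG is the start codon.
    """
    mrna = mrna.upper()
    start = mrna.find("AUG")
    if start == -1:
        raise ValueError("no start codon found")
    # Scan codon by codon, in frame with the start codon, for a stop codon.
    for pos in range(start + 3, len(mrna) - 2, 3):
        if mrna[pos:pos + 3] in STOP_CODONS:
            end = pos + 3
            return mrna[:start], mrna[start:end], mrna[end:]
    raise ValueError("no in-frame stop codon found")
```

With the input `"GGAUGGCUUAAUU"` the function returns `("GG", "AUGGCUUAA", "UU")`. A leaderless message such as `"AUGGCUUAA"` returns an empty 5’ UTR, which is the situation described for many archaeal mRNAs.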

Polypeptides have four levels of organization that 
each describe an aspect of their underlying structure 
(Table 9.2). The polypeptide primary structure is the 
sequence of amino acids contained in the polypeptide. 
The order of amino acids and the length of a polypeptide 
(the number of amino acids it contains) are effectively 
limitless. There are billions of possible amino acid se- 
quence options even among short polypeptides of 20 
amino acids or less. The specific order of amino acids is, 
however, critical to the proper function of a polypeptide. 
The R-groups of amino acids affect the solubility and re- 
activity of amino acids, and therefore they affect the func- 
tional properties of the polypeptide. 
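
The size of the sequence space mentioned above follows from simple counting: with 20 amino acids available at each position, a polypeptide of n amino acids has 20^n possible sequences. The sketch below only works through that arithmetic; the function names are hypothetical.

```python
def sequences_of_length(n: int, alphabet_size: int = 20) -> int:
    """Number of distinct sequences of exactly n amino acids."""
    return alphabet_size ** n


def sequences_up_to(n: int, alphabet_size: int = 20) -> int:
    """Number of distinct sequences of 1 to n amino acids."""
    return sum(alphabet_size ** k for k in range(1, n + 1))
```

A dipeptide already has 400 possible sequences, a chain of 7 amino acids passes one billion (20^7 = 1,280,000,000), and a chain of 20 amino acids has 20^20, or roughly 1.05 × 10^26, possible sequences.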

Polypeptide secondary structure is generated by hydrogen 
bonds that form between groups along the peptide-bonded 
backbone (see Table 9.2). Forming these bonds requires 
the polypeptide to bend or twist into one of two possible 
structures. An α-helix (alpha helix) is a twisted coil of 
amino acids stabilized by hydrogen bonds between backbone 
groups of nearby amino acids. A β-pleated sheet (beta-pleated 
sheet) forms when a polypeptide folds back on itself so that 
adjacent segments lie side by side and are hydrogen bonded 
to one another. The primary structure is critical to determining 
which, if either, of these secondary structures 
forms in a polypeptide. 

A polypeptide’s tertiary structure is the result of a vari- 
ety of interactions involving the R-groups. Interactions such 
as hydrogen bonding, covalent bonding, ionic interactions, 
and hydrophobic interactions produce the overall shape of 
the protein. Tertiary structure is dependent on primary and 
secondary structure, and it should come as no surprise that 
protein shapes vary widely. These shapes form the binding, 
interaction, and catalytic domains that are responsible for 
the protein’s action in the body. The tertiary structure of 
a protein may change in response to the presence of other 
chemical substances, including other protein molecules. For 
example, an enzyme may have a catalytically active tertiary 
structure under some circumstances and have an alterna- 
tive, nonactive tertiary structure under others. 

Primary, secondary, and tertiary structures describe 
different levels of organization of individual polypeptides. 
But some proteins contain two or more polypeptides, an 
organization described as quaternary structure. Proteins 
that have a quaternary structure contain distinct polypep- 
tides that each have their own primary, secondary, and 
tertiary structures. Such proteins are often described as 
multimers. The individual polypeptides of a multimer may 
be identical or may be different. For example, a protein 
composed of four identical polypeptides can be called a 
homotetramer, and a four-polypeptide protein that con- 
tains two or more different polypeptides can be identified 
as a heterotetramer. Table 9.2 summarizes these four levels 
of polypeptide structure and illustrates the red blood cell 
protein hemoglobin, a heterotetramer, as an example of 
a protein with a quaternary structure. Hemoglobin and a 
specific variant of one of the polypeptides in this heterotetramer 
are the focus of discussion in Chapter 10. 


Table 9.2 Polypeptide Structure 

Primary 
Description: The sequence of amino acids in a polypeptide. 
Stabilized by: Peptide bonds. 

Secondary 
Description: Formation of α-helices and β-pleated sheets in a 
polypeptide (thus, depends on primary structures). 
Stabilized by: Hydrogen bonding between groups along the 
peptide-bonded backbone. 
Example in hemoglobin: One α-helix. 

Tertiary 
Description: Overall three-dimensional shape of a polypeptide 
(includes contribution from secondary structures). 
Stabilized by: Bonds and other interactions between R-groups, or 
between R-groups and the peptide-bonded backbone. 
Example in hemoglobin: One of hemoglobin’s subunits. 

Quaternary 
Description: Shape produced by combinations of polypeptides 
(each with its own tertiary structure). 
Stabilized by: Bonds and other interactions between R-groups, and 
between peptide backbones of different polypeptides. 
Example in hemoglobin: Hemoglobin consists of four polypeptide subunits. 
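
The naming of multimers follows a simple rule that can be sketched in code: the prefix homo- or hetero- records whether the polypeptides are all identical, and the stem records how many there are. The function below is an illustration only. The name `describe_multimer` is hypothetical, and the chain labels "alpha" and "beta" used in the example are an assumption drawn from the usual description of adult hemoglobin, which this section does not spell out.

```python
# Stems for the number of polypeptide chains (illustrative, up to four).
PREFIXES = {1: "mono", 2: "di", 3: "tri", 4: "tetra"}


def describe_multimer(chains) -> str:
    """Name a protein from a list of labels, one label per polypeptide."""
    n = len(chains)
    if n not in PREFIXES:
        raise ValueError("this sketch only names proteins of 1 to 4 chains")
    if n == 1:
        return "monomer"
    # Identical labels mean identical polypeptides.
    kind = "homo" if len(set(chains)) == 1 else "hetero"
    return f"{kind}{PREFIXES[n]}mer"
```

For instance, `describe_multimer(["alpha", "alpha", "beta", "beta"])` returns `"heterotetramer"`, while four identical labels return `"homotetramer"`.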


Ribosome Structures 


The specific molecules composing bacterial, archaeal, and 
eukaryotic ribosomes differ, but the overall structures and 
functions of the ribosomes are similar, reflecting the funda- 
mental nature of the translation process in all forms of life. In 
all three domains, ribosomes perform three essential tasks: 


1. Bind messenger RNA and identify the start codon 
where translation begins. 


2. Facilitate the complementary base pairing of mRNA 
codons and tRNA anticodons that determines amino 
acid order in the polypeptide. 


3. Catalyze peptide bond formation between amino ac- 
ids during polypeptide formation. 


Differences in ribosomal composition between bac- 
teria, archaea, and eukaryotes include the number and 
sequence of rRNA molecules and the number and type of 
ribosomal proteins. Although the archaeal and bacterial 
ribosomes are similar in size, and somewhat smaller than 
the eukaryotic ribosomes, most of the archaeal ribosomal 
proteins (and the tRNAs and protein factors involved in 
translation) display homology to their eukaryotic counterparts. 
In all three domains, ribosomes share key structural 
features and consist of two main subunits, called 
the large ribosomal subunit and the small ribosomal subunit. 
By convention, subunit size is measured in Svedberg 
units (S), which describe the velocity of sedimentation 
when a particle is subjected to a centrifugal force. The unit 
is named in honor of Theodor Svedberg, a 1926 Nobel 
Laureate in Chemistry and inventor of the ultracentrifuge. 
Higher S values indicate faster sedimentation rates and larger 
molecules. Svedberg units are not additive when ribosomal 
subunits are combined because sedimentation is a composite 
property that is affected by multiple molecular factors, 
including size, shape, and hydration state. 

The ribosomes of E. coli are the most thoroughly 
studied bacterial ribosomes and serve as a model for gen- 
eral ribosome structure (Figure 9.4a). The small subunit 
of bacterial ribosomes has a Svedberg value of 30S. It 
contains 21 proteins and a single 16S rRNA composed of 
1541 nucleotides. The large subunit of the bacterial ribo- 
some is a 50S particle composed of 32 proteins, a small 5S 
rRNA containing 120 nucleotides, and a large 23S rRNA 
containing 2904 nucleotides. When fully assembled, the 
intact bacterial ribosome has a Svedberg value of 70S. 

Both the large and small subunits contribute to the 
formation of three regions that play important functional 
roles during translation: the peptidyl site, or P site; the 
aminoacyl site, or A site; and the exit site, or E site. The 
P site holds a tRNA to which the nascent polypeptide is 
attached. The A site binds a new tRNA molecule carrying 
the next amino acid to be added to the polypeptide. 
The E site provides an avenue of egress for tRNAs as they 
leave the ribosome after their amino acid has been added 
to the polypeptide chain. In addition, there is a channel 
in the large subunit through which the nascent 
polypeptide is extruded from the ribosome (see Figure 9.2). 
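
The roles of the three sites can be pictured with a toy model that tracks which tRNA occupies each site as amino acids are added. This is a deliberately simplified sketch, not a description of the molecular mechanism. The class names `TRNA` and `Ribosome` are hypothetical, and the model assumes, beyond what this passage states, that the first tRNA is placed directly in the P site and that the growing chain is passed to the tRNA in the A site before the tRNAs shift.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TRNA:
    amino_acid: str
    chain: List[str] = field(default_factory=list)  # polypeptide it carries


@dataclass
class Ribosome:
    a_site: Optional[TRNA] = None
    p_site: Optional[TRNA] = None
    e_site: Optional[TRNA] = None

    def add_amino_acid(self, amino_acid: str) -> None:
        if self.p_site is None:
            # Assumption: the first tRNA starts out in the P site.
            self.p_site = TRNA(amino_acid, [amino_acid])
            return
        # A new tRNA carrying the next amino acid binds the A site.
        self.a_site = TRNA(amino_acid)
        # Peptide bond formation: the chain is joined to the new amino acid.
        self.a_site.chain = self.p_site.chain + [amino_acid]
        self.p_site.chain = []
        # The tRNAs shift: P-site tRNA to the E site, A-site tRNA to the P site.
        self.e_site, self.p_site, self.a_site = self.p_site, self.a_site, None
        # The emptied tRNA leaves the ribosome through the E site.
        self.e_site = None


def synthesize(residues: str) -> List[str]:
    """Add residues one at a time and return the chain held in the P site."""
    ribosome = Ribosome()
    for residue in residues:
        ribosome.add_amino_acid(residue)
    return ribosome.p_site.chain if ribosome.p_site else []
```

After each call to `add_amino_acid`, the P site again holds the tRNA attached to the nascent polypeptide and the A site is free for the next tRNA, which mirrors the division of labor described in the text.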
Figure 9.4 Ribosomes of bacteria, archaea, and eukaryotes. (a) The best-studied bacterial 
ribosome is that of E. coli, and the best-described archaeal ribosome is that of Haloarcula marismortui. 
(b) The best-studied eukaryotic ribosomes are mammalian. 

Among eukaryotes, mammalian ribosomes are the 
most fully characterized (Figure 9.4b). The small 40S ri- 
bosomal subunit contains approximately 35 proteins and 
a single 18S rRNA composed of 1874 nucleotides. The 
large mammalian ribosomal subunit has a Svedberg value 
of 60S and contains 45 to 50 proteins, along with three 
molecules of rRNA. The rRNA molecules have values 
of 5S (120 nucleotides), 5.8S (160 nucleotides), and 28S 
(4718 nucleotides). The intact mammalian ribosome has 
a Svedberg value of 80S. Like the bacterial ribosome, the 
intact mammalian ribosome possesses a P site, an A site, 
an E site, and a channel for polypeptide egress. 

The ribosomes of archaeal species have not been
studied nearly as fully as those of bacteria and eukaryotes,
but some information is available. The first atomic
crystal structure of the large ribosomal subunit of an
archaeon was that of Haloarcula marismortui. This
structure included a 23S and a 5S rRNA and 27 pro- 
teins. Follow-up analysis of the small subunit structure 
revealed a 16S rRNA and 19 proteins. This is the basis 
for the conclusion that archaeal ribosomes have an 
overall size and structure similar to that of the 70S bac- 
terial ribosome. As we discuss later, however, archaeal 
tRNAs and translation proteins are similar to those in 
eukaryotes. 

The proteins contained in ribosomal subunits can
be separated from one another by a specialized type of
electrophoresis called two-dimensional gel electrophoresis.
The 21 proteins that are part of the small ribosomal
subunit in E. coli and the 32 proteins found in the large
ribosomal subunit are efficiently separated by this method.
Research Technique 9.1 describes how two-dimensional 


Research Technique 9.1 


Two-Dimensional Gel Electrophoresis and 
the Identification of Ribosomal Proteins 


PURPOSE All ribosomes are composed of two subunits that 
are each a complex mixture of rRNA and dozens of proteins. 
One approach to determining the number of proteins con- 
tained in each ribosomal subunit uses a method of electro- 
phoresis known as two-dimensional gel electrophoresis to 
separate the proteins by their charge in the first dimension and 
then by their mass in the second dimension. Two-dimensional 
gel electrophoresis produces a distinctive “protein fingerprint” 
that distributes each ribosomal protein to a different location 
in the two-dimensional gel. 


MATERIALS AND PROCEDURES Ribosomes are isolated 
from cells, the subunits are separated, and the subunits are 
treated to dissociate the proteins they contain. The mixture 
containing liberated ribosomal proteins is then separated 
in the first dimension by a version of gel electrophoresis 
known as isoelectric focusing. In this procedure, proteins are 
separated exclusively by their charge. In contrast to conven- 
tional gel electrophoresis, which uses a buffered solution to 
maintain constant pH throughout the gel, isoelectric focus- 
ing gels contain a pH gradient. A protein’s pH environment 
affects its charge, and every protein has a pH—called the 
isoelectric point—at which it has neutral charge and cannot 
move in an electrical field. In isoelectric focusing, proteins 
migrate through the pH gradient to their isoelectric point, 
where they stop. 

Once isoelectric focusing is complete, protein separa- 
tion takes place in the second dimension, which uses SDS 
(sodium dodecyl sulfate) gel electrophoresis. SDS is a strong 
anionic detergent that denatures proteins by disrupting 
the interactions that keep them folded. Denatured proteins 
migrate through the gel at a rate determined by their mass, 
that is, by the number of amino acids they contain. In the 
SDS gel dimension of two-dimensional gel electrophoresis, 
each protein has a unique starting point corresponding to
its isoelectric point. Proteins with large mass (more amino
acids) migrate a short distance in the second dimension,
whereas proteins with small mass (fewer amino acids) migrate
a greater distance.

[Figure: two-dimensional gels of E. coli ribosomal proteins (gel S and gel L). First dimension: charge (isoelectric focusing). Second dimension: mass (SDS gel electrophoresis).]


DESCRIPTION A pair of two-dimensional electrophoresis 
gels, one containing proteins of the small subunit of the E. coli 
ribosome (gel S) and the other containing proteins of the 
large subunit (gel L), reveal protein spots (the protein finger- 
print) corresponding to the positions of proteins that make up 
each ribosomal subunit. Each spot identifies the location of a 
unique protein that differs from the other proteins in the gel 
by a combination of charge and mass. The proteins in gel S 
are identified as S1 to S21, and in gel L as L1 to L32. 


CONCLUSION Two-dimensional gel electrophoresis identi- 
fies 21 proteins in the small subunit of the E. coli ribosome and 
32 proteins in the large ribosomal subunit. Each protein ob- 
tained by two-dimensional electrophoresis can be subjected 
to additional biochemical examination to specifically identify 
the protein and investigate its role in translation. 


gel electrophoresis is used to characterize the proteins 
found in E. coli ribosomal subunits. 


A Three-Dimensional View of the Ribosome 


Ribosomes are so small—a mere 25 nanometers (nm) in 
diameter—that almost 10,000 of them can fit in the same 
space as the period at the end of this sentence. No one 
has ever “seen” a ribosome, but powerful molecular imaging
techniques can resolve the three-dimensional configuration
of ribosomes and ribosomal subunits, at levels
of resolution that are measured in angstroms (Å). These
structural analyses have clarified how ribosomal subunits 
fit together, and have produced a detailed understanding 
of ribosomal interactions with mRNA and tRNA. 
Structural analysis of ribosomes and other molec- 
ular complexes in cells is made possible by a tech- 
nique known as cryo-electron microscopy (cryo-EM), 
pioneered by Robert Glaeser in the 1970s and perfected 
by Jacques Dubochet in the 1980s. Cryo-EM uses liquid
nitrogen or liquid ethane, at temperatures near
−200°C, to instantaneously freeze macromolecules and
thus preserve them in their native state. A frozen macromolecule
is then placed on a microcaliper and scanned
from various angles by electron beams that collect 
data analyzed by specialized software to create a three- 
dimensional picture of molecular structure. Cryo-EM 
creates exquisitely precise three-dimensional images of 
ribosome structure—much like CAT-scan imaging of 
the human body—revealing atomic-level details of ribo- 
some structure (Figure 9.5). These images have identified 
the location and dimensions of the E, A, and P sites, for 
example, and have clarified the mechanical activities of
ribosomes during translation. Structural studies of the
ribosome were recognized with the 2009 Nobel Prize in
Chemistry, awarded to Ada Yonath, Thomas Steitz, and
Venki Ramakrishnan.


9.2 Translation Occurs in Three Phases 


Translation occurs in three phases: initiation, elonga- 
tion, and termination. The three phases are generally 
similar in bacteria, archaea, and eukaryotes, and yet they 
differ in several ways, particularly during translation ini- 
tiation, where distinct mechanisms are used to identify 
the start codon. 


Translation Initiation 


Translation initiation in all organisms begins when the 
small ribosomal subunit binds near the 5’ end of mRNA 
and identifies the start codon sequence. In the next stage, 
the initiator tRNA, the tRNA carrying the first amino acid 
of the polypeptide, binds to the start codon. In the final 
stage of initiation, the large subunit joins the small subunit 
to form an intact ribosome, and translation begins. During 
these stages, initiation factor proteins help control ribosome 
formation and binding of the initiator tRNA, and guano- 
sine triphosphate (GTP) provides energy. The tRNAs used 
during translation each carry a specific amino acid and are 
identified as charged tRNAs. In contrast, a tRNA without 
an amino acid is uncharged. Specialized enzymes discussed 
in a later section are responsible for recognizing different 
tRNAs and charging each one with the correct amino acid.

Figure 9.5 Three-dimensional computer interpretations of cryo-EM-generated data depict ribosome structure.

Starting translation at the authentic (correct) start codon
is essential for translation of the correct polypeptide.
Errant translation starting at the wrong codon, or even at
the wrong nucleotide of the start codon, may produce an 
abnormal polypeptide and result in a nonfunctional pro- 
tein. Thus, critical questions for biologists studying transla- 
tion initiation were these: How does the ribosome locate 
the authentic start codon? And if more than one AUG (start 
codon) sequence occurs near the 5’ end of the mRNA, how 
is the authentic start codon identified? Bacteria and eu- 
karyotes use different mechanisms to identify the authentic 
start codon. 


Figure 9.6 Initiation of bacterial translation. ① Formation of preinitiation complex. ② Formation of 30S initiation complex. ③ Ribosome assembly.


Bacterial Translation Initiation In E. coli, six critical
molecular components come together to initiate the translation
process: (1) mRNA, (2) the small ribosomal subunit,
(3) the large ribosomal subunit, (4) the initiator tRNA,
(5) three essential initiation factor proteins, and (6) GTP.

For most of translation initiation in bacteria, the 30S 
ribosomal subunit is affiliated with an initiation factor 
(IF) protein called IF3, which facilitates binding between 
the mRNA and the 30S subunit. IF3 also prevents the 
30S subunit from binding to the 50S subunit (Figure 9.6). 


[Figure 9.6 panel text]
mRNA      5′ ...AGGAGG...AUG... 3′   (Shine-Dalgarno sequence, then the start codon of the polypeptide-coding sequence)
16S rRNA  3′ ...UCCUCC... 5′
① The small subunit-IF3 complex binds near the 5′ end of mRNA at translation initiation and searches for the Shine-Dalgarno sequence. The Shine-Dalgarno sequence of mRNA base-pairs with the 16S rRNA in the small subunit to position the start codon (AUG) at the P site. IF3 temporarily prevents attachment of the large subunit.
② Charged tRNA^fMet (the initiator tRNA), IF1, and IF2 join in the formation of the initiation complex; GTP provides energy.
③ The large subunit joins the initiation complex; IFs dissociate. The next charged tRNA enters the A site.


The small subunit—IF3 complex binds near the 5’ end of 
mRNA, searching for the AUG sequence that serves as 
the start codon. The preinitiation complex forms when 
the authentic start codon sequence is identified by base 
pairing that occurs between the 16S rRNA in the 30S ribo- 
some and a short mRNA sequence located a few nucleo- 
tides upstream of the start codon in the 5’ UTR of mRNA 
(Figure 9.6, ①). John Shine and Lynn Dalgarno identified
the location and sequence of this region in 1974, and it is 
named the Shine-Dalgarno sequence in recognition of 
their work. 

The Shine-Dalgarno sequence is a purine-rich se- 
quence of about six nucleotides located three to nine 
nucleotides upstream of the start codon. A complementary
pyrimidine-rich segment containing the sequence
UCCUCC is found near the 3′ end of 16S rRNA, and it
pairs with the Shine-Dalgarno sequence to position the
mRNA on the 30S subunit (see Figure 9.6). The Shine-Dalgarno
sequence is another example of a consensus
sequence. Like the consensus sequences we describe for
promoters (Chapter 8), the Shine-Dalgarno sequence has
a characteristic nucleotide composition and a precise
position relative to the start codon, but its exact nucleo- 
tide sequence varies slightly from one mRNA to another 
(Figure 9.7). 
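
The idea of a consensus match at a constrained distance from the start codon can be sketched in a few lines of Python. This is a simplified illustration, not a model of how the 16S rRNA actually finds its partner: the function names are hypothetical, the score is a plain count of positions matching AGGAGG, and the allowed gap of three to nine nucleotides is taken from the description above.

```python
SD_CONSENSUS = "AGGAGG"  # Shine-Dalgarno consensus given in the text


def count_matches(window, consensus=SD_CONSENSUS):
    """Number of positions at which the window agrees with the consensus."""
    return sum(1 for a, b in zip(window, consensus) if a == b)


def best_sd_match(mrna, start_index, min_gap=3, max_gap=9):
    """Find the most consensus-like 6-nucleotide window upstream of a start codon.

    `start_index` is the 0-based index of the A of AUG. `gap` is the number
    of nucleotides between the end of the window and the start codon.
    Returns (matches, window_position, gap), or None if no window fits.
    """
    k = len(SD_CONSENSUS)
    best = None
    for gap in range(min_gap, max_gap + 1):
        pos = start_index - gap - k
        if pos < 0:
            break  # ran off the 5' end of the sequence
        score = count_matches(mrna[pos:pos + k])
        if best is None or score > best[0]:
            best = (score, pos, gap)
    return best


if __name__ == "__main__":
    # Two of the sequences listed in Figure 9.7; the start codon begins at index 18.
    lacZ = "UUCACACAGGAAACAGCUAUGACCAUGAUU"
    cro = "AUGUACUAAGGAGGUUGUAUGGAACAACGC"
    for name, seq in (("lacZ", lacZ), ("lambda cro", cro)):
        print(name, best_sd_match(seq, 18))
```

The lambda cro sequence contains a perfect AGGAGG, whereas lacZ matches the consensus only partially, which illustrates how the exact sequence varies from one mRNA to another.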

In the next step of translation initiation (Figure 9.6, ②),
the initiator tRNA binds to the start codon at what will
be part of the P site after ribosome assembly. The amino
acid on the initiator tRNA is a modified methionine
called N-formylmethionine (fMet); thus, the charged
initiator tRNA is abbreviated tRNA^fMet. This tRNA has a
3′-UAC-5′ anticodon sequence that is a complementary
mate to the start codon sequence. An initiation factor (IF)
protein designated IF2 and a molecule of GTP are bound
at the P site to facilitate binding of tRNA^fMet. Initiation
factor 1 (IF1) also joins the complex to forestall attachment
of the 50S subunit. At this point, the 30S initiation
complex, consisting of mRNA bound to the 30S subunit,
tRNA^fMet located at the start codon, three initiation factors,
and a molecule of GTP, has been formed.

Figure 9.7 The Shine-Dalgarno consensus binding sequence.
The AUG start codon sequence (orange) is near the Shine-Dalgarno
region (gold), which binds to the 3′ end of 16S rRNA.

Gene                       mRNA sequence, 5′ to 3′ (start codon set off by spaces)
E. coli araB               UUUGGAUGGAGUGAAACG AUG GCGAUUGCA
E. coli lacI               CAAUUCAGGGUGGUGAAU AUG AAACCAGUA
E. coli lacZ               UUCACACAGGAAACAGCU AUG ACCAUGAUU
E. coli thrA               GGUAACCAGGUAACAAGG AUG CGAGUGUUG
E. coli trpA               AGCACGAGGGGAAAUCUG AUG GAACGCUAC
E. coli trpB               AUAUGAAGGAAAGGAACA AUG ACAACAUUA
λ phage cro                AUGUACUAAGGAGGUUGU AUG GAACAACGC
R17 phage A protein        UCCUAGGAGGUUUGACCU AUG CGAGCUUUU
Qβ phage A replicase       UAACUAAGGAUGAAAUGC AUG UCUAAGACA
φX174 phage A protein      AAUCUUGGAGGCUUUUUU AUG GUUCGUUCU
E. coli RNA polymerase β   AGCGAGCUGAGGAACCCU AUG GUUUACUCC
Consensus sequence         AGGAGG

In the final step of initiation (Figure 9.6, ③), the 50S
subunit joins the 30S subunit to form the intact ribosome.
The energy for the union of the two subunits is derived
from hydrolysis of GTP to GDP (guanosine diphosphate).
The dissociation of IF1, IF2, and IF3 accompanies the
joining of subunits that creates the 70S initiation complex.
This complex is a fully active ribosome with a P
site, an A site, an E site, and a channel for exit of the
polypeptide. The first tRNA (tRNA^fMet) is already paired
with mRNA at the P site, and the open A site contains the
second codon and is awaiting the next charged tRNA.


Eukaryotic Translation Initiation The eukaryotic 40S
ribosomal subunit complexes with three eukaryotic
initiation factor (eIF) proteins, eIF1, eIF1A, and eIF3, to
form the preinitiation complex (Figure 9.8, ①). In step ②, the
preinitiation complex joins with the initiator tRNA and eIF5.

The initiation complex is formed by binding of
the mRNA. This initiates the process called scanning
(Figure 9.8, ③), in which the small ribosomal subunit
moves along the 5’ UTR in search of the start codon. 
About 90% of eukaryotic mRNAs use the first AUG en- 
countered by the initiation complex as the start codon, 
but the remaining 10% use the second or, in some cases, 
the third AUG as the start codon. The initiation complex 
is able to accurately locate the authentic start codon be- 
cause the codon is embedded in a consensus sequence 
that reads 


5′-ACCAUGG-3′


(the start codon itself is shown in bold). This consensus 
sequence is called the Kozak sequence after Marilyn 
Kozak, who discovered it in 1978. 
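
Scanning for a start codon in a favorable context can be illustrated with a toy model. The sketch below is an assumption-laden simplification rather than a description of the real mechanism: it walks the mRNA 5′ to 3′, scores each AUG by how many of the four flanking consensus positions (−3, −2, −1, and +4) match ACCAUGG, and accepts the first AUG that reaches a threshold. The threshold `min_score` and all function names are hypothetical.

```python
# Flanking positions of the Kozak consensus 5'-ACCAUGG-3', keyed by offset
# from the A of AUG (offset 0 is nucleotide +1, so offset 3 is nucleotide +4).
KOZAK_FLANKS = {-3: "A", -2: "C", -1: "C", 3: "G"}


def find_augs(mrna):
    """Indices of every AUG in the mRNA, in 5'-to-3' order."""
    return [i for i in range(len(mrna) - 2) if mrna[i:i + 3] == "AUG"]


def kozak_score(mrna, aug_index):
    """Count flanking positions (0 to 4) that match the consensus."""
    score = 0
    for offset, base in KOZAK_FLANKS.items():
        i = aug_index + offset
        if 0 <= i < len(mrna) and mrna[i] == base:
            score += 1
    return score


def scan_for_start(mrna, min_score=3):
    """Return the index of the first AUG whose context meets the threshold.

    `min_score` is an arbitrary illustrative cutoff, not a measured value.
    Returns None if no AUG qualifies.
    """
    for i in find_augs(mrna):
        if kozak_score(mrna, i) >= min_score:
            return i
    return None


if __name__ == "__main__":
    # Made-up sequence: a poorly flanked AUG followed by one in full consensus.
    mrna = "GGUUAUGCCGACCAUGGCU"
    for i in find_augs(mrna):
        print("AUG at", i, "context score", kozak_score(mrna, i))
    print("chosen start:", scan_for_start(mrna))
```

In this made-up sequence the first AUG has no consensus flanks and is skipped, which mirrors the minority of mRNAs described above that use the second or third AUG.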

Locating the start codon leads to recruitment of the
60S subunit to the complex, using energy derived from
GTP hydrolysis. This final step (④) in the formation of
the 80S ribosome is accompanied by joining of the two
subunits and dissociation of the eIF proteins. In the 80S
ribosome, the initiator tRNA^Met is located at the P site;
the A site is vacant, awaiting arrival of the second tRNA
(Genetic Analysis 9.1).


Archaeal Translation Initiation and Its Implications for 
Evolution Archaeal ribosome subunits are composed of 
rRNAs that are more similar in size to those of bacteria 
than of eukaryotes. However, the ribosomal RNAs 
that make up the central structure of the subunits are 
distinct in each domain. Indeed, the archaeal domain
was only discovered after Carl Woese sequenced and 
compared rRNAs from many organisms and found that 
their sequences clustered into the three domains of life 
depicted in Figure 1.3. 

Despite the similarity in size of archaeal and bacterial
ribosomes, the process of translation initiation
in archaea is decidedly eukaryote-like. One example
of this similarity is the archaeal use of methionine as
the common first amino acid of polypeptide chains.
This is like eukaryotes and unlike bacteria, which use
N-formylmethionine. A second aspect of archaeal
translation initiation concerns the presence of Shine-Dalgarno
sequences. These are relatively common in
archaeal species that either do not produce leaderless
mRNAs or produce very few. In contrast, in archaeal
species that produce a high proportion of leaderless
mRNAs, Shine-Dalgarno sequences are not as common,
although they have been detected.

More significantly from an evolutionary perspective,
Table 9.3 lists archaeal translation initiation factor proteins
and identifies their homologies to eukaryotic and bacterial
proteins. Recall from our discussion in Section 1.4
that amino acid or nucleic acid sequences (proteins, DNA,
or RNA) that are homologous have a common ancestral
origin. As a consequence, proteins that have greater degrees
of homology have more recent common ancestral history
than do proteins with lower levels of homology. If proteins
do not share a common ancestral history, they will not
reveal homology.

Figure 9.8 Initiation of eukaryotic translation. Panel text: Formation of initiation complex: an initiator tRNA with eIF5 binds to form the initiation complex. Ribosome assembly and translation initiation: the large subunit attaches to form the 80S ribosome that begins translation.

Based on the homologous protein information in
Table 9.3, it is clear that translation initiation in archaea
is more complex than in bacteria and that known
archaeal initiation factor proteins (aIFs) are homologous
in structure and function to eIFs. This comparison
of critical translational proteins also indicates striking
similarity of translation initiation across the three domains
of life. Translation in all forms of life has a common
origin. Evolution has acted to conserve the key protein
components of translation, with each domain acquiring
its own specific features of translation.

Table 9.3 Translation Initiation Factor Homologs

Function                             Bacterial Homolog (a)      Archaeal Homolog (b)   Eukaryotic Homolog (c)
mRNA binding; start codon fidelity   IF3 (in some phyla only)   aIF1                   eIF1
mRNA binding                         IF1                        aIF1A                  eIF1A/eIF4
tRNA P site binding                  IF2                        aIF2/5                 eIF5
tRNA^Met binding                     No homolog                 aIF3                   eIF3

(a) The absence of a homologous protein is identified as “No homolog.”
(b) Archaeal proteins are identified by the letter a.
(c) Eukaryotic proteins are identified by the letter e.

The archaea have multiple mechanisms of mRNA-ribosome
interaction at translation initiation. This is most
apparent at the 5′ mRNA end, where certain archaeal species
have a large percentage of mRNAs (more than 50% in
some studies) that appear not to have a 5′ UTR.
Those mRNAs lacking a 5′ UTR are said to be leaderless
mRNAs and are apparently missing all or most of
the translation-initiating segments, including the Shine-Dalgarno
sequence in some cases. The mechanism through
which leaderless mRNA translation is initiated is not yet
known. Archaeal species producing mRNAs with 5′ UTRs
typically have Shine-Dalgarno sequences to aid translation
initiation.

In vitro translation experiments (translation in a test
tube using ribosomes and translationally active proteins)
that test the ability of bacterial and eukaryotic ribosomes
and translational proteins to translate leaderless mRNAs
from archaea find that translation works efficiently in
both in vitro systems. Leaderless mRNAs are very rare in
bacteria or in eukaryotes, yet they are efficiently translated
in vitro. This finding does not suggest a translational
mechanism, but it has led to speculation that the leaderless
mRNA state may be ancestral to the state featuring 5′ UTRs.
In other words, it is possible that the last universal common
ancestor (LUCA) of bacteria, archaea, and eukaryotes
produced leaderless mRNAs and that the mRNAs with
5′ UTRs are a more recent development. In this context,
archaeal translation may be something of a relic reminiscent
of the situation in the LUCA.


Polypeptide Elongation


Elongation, the second phase of translation, begins with
the recruitment of elongation factor (EF) proteins into
the initiation complex. Elongation factors facilitate three
steps of polypeptide synthesis:


1. Recruitment of charged tRNAs to the A site


2. Formation of a peptide bond between sequential
amino acids


3. Translocation of the ribosome in the 3′ direction
along mRNA

GTP cleavage provides the energy for each step of 
elongation in bacteria, archaea, and eukaryotes (Foundation 
Figure 9.9). Moreover, the steps in the elongation process 
are the same in all three types of organisms: although the 
elongation factors differ, the ribosomal P, A, and E sites of 
all three organisms serve nearly identical functions. The 
rates of elongation are also similar; bacteria add about 
20 new amino acids per second to a nascent polypeptide 
chain, and eukaryotes elongate the polypeptide at a rate of 
15 amino acids per second. The elongation rate in archaea 
has not been established. Lastly, numerous studies indicate 
high fidelity of translation in all organisms. An error rate 
of approximately one amino acid in each 10,000 added to 
polypeptides is estimated for bacteria. 
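
The rates and error estimate quoted above lend themselves to a quick back-of-the-envelope calculation. The sketch below uses only the figures from the text (20 and 15 amino acids per second, and one error per 10,000 amino acids in bacteria); the function names and the 300-residue example are illustrative, and the error-free fraction assumes that errors at different positions are independent, which is a simplifying assumption.

```python
BACTERIAL_RATE = 20       # amino acids added per second (from the text)
EUKARYOTIC_RATE = 15      # amino acids added per second (from the text)
ERROR_RATE = 1 / 10_000   # estimated misincorporations per amino acid, bacteria


def elongation_time(n_residues, rate):
    """Seconds needed to add n_residues at a constant elongation rate."""
    return n_residues / rate


def expected_errors(n_residues, error_rate=ERROR_RATE):
    """Mean number of misincorporated amino acids per polypeptide."""
    return n_residues * error_rate


def fraction_error_free(n_residues, error_rate=ERROR_RATE):
    """Fraction of polypeptides with no errors, assuming independent errors."""
    return (1 - error_rate) ** n_residues


if __name__ == "__main__":
    n = 300  # hypothetical polypeptide length
    print("bacterial time (s): ", elongation_time(n, BACTERIAL_RATE))
    print("eukaryotic time (s):", elongation_time(n, EUKARYOTIC_RATE))
    print("expected errors:    ", expected_errors(n))
    print("error-free fraction:", round(fraction_error_free(n), 4))
```

For a polypeptide of a few hundred amino acids, elongation takes on the order of tens of seconds, and the large majority of copies are expected to be free of errors under these assumptions.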


Polypeptide Elongation in Bacteria Different elongation
factor proteins (EFs) and other ribosomal proteins
carry out elongation in a series of steps depicted in
Foundation Figure 9.9. The figure, while specifically
describing translation in bacteria, is generally accurate
for all organisms. The energy required for these steps is
generated by hydrolysis, the cleavage of one phosphate
group from a guanosine triphosphate (GTP) molecule.
Hydrolysis releases energy and converts
nucleotide triphosphates to nucleotide diphosphates
(i.e., GTP → GDP). In step ①, a charged tRNA is
bound by the elongation factor EF-Tu and GTP. In
step ②, the tRNA with the correct anticodon
sequence enters the A site. In step ③, the tRNA pairs with
the mRNA codon and hydrolysis of GTP releases
EF-Tu-GDP from the tRNA. In step ④, the enzyme peptidyl
transferase catalyzes peptide bond formation between
the amino acid at the P site and the newly recruited
amino acid at the A site. This elongates the polypeptide
and transfers the polypeptide to the tRNA at the A site.
The tRNA at the P site departs the ribosome through

Foundation Figure 9.9 (panel text)
1. Elongation factor protein EF-Tu and GTP attach to a charged tRNA.
2. Charged tRNA-codon pairing at A site: many charged tRNAs enter the A site, but only the one with the correct anticodon sequence pairs with the codon.
3. GTP hydrolysis: GTP is hydrolyzed to GDP and EF-Tu-GDP is released.
4. Peptide bond formation: peptidyl transferase catalyzes the formation of a peptide bond between the amino acids in the P and A sites. The peptide chain moves to the A site.
5. Translocation: elongation factor protein G (EF-G) translocates the ribosome; the uncharged tRNA is released to the E site and a new tRNA is recruited to the A site.
6. A site open for charged tRNA: the open A site is ready to recruit the correct charged tRNA.


GENETIC ANALYSIS


PROBLEM In an investigation designed to identify the consensus sequence containing the AUG codon
that initiates translation of eukaryotic mRNA, Marilyn Kozak (1986) compared the amounts of protein
produced from 10 mutant mRNA molecules having different single-base substitutions flanking the AUG.
Protein production was gauged by the optical density (OD) of protein bands in electrophoretic gels.
Higher OD values indicated more protein produced. In the two tables shown, AUG, the start codon, is
highlighted and its adenine (A) is labeled the +1 nucleotide of the translated region. Kozak examined
six single-base mutants at nucleotide −3 and +4. These are identified by number (1 to 6) in Table A.
She also examined four single-base mutants of positions −2 and −1. These are numbered 7 to 10 in
Table B. The OD for protein production by each mutant was measured and is given below the mutant in
the table. Use the OD values to determine answers to the problem questions.

BREAK IT DOWN: The Kozak consensus sequence, 5′-ACCAUGG-3′, includes the AUG start codon sequence and several
surrounding mRNA nucleotides and is critical to translation initiation.


Table A Six Position −3 and +4 Mutants (mutants 1 to 6)
OD: 0.7, 2.6, 0.9, 0.9, 3.1, 5.0

Table B Four Position −2 and −1 Mutants (mutants 7 to 10)
OD: 2.3, 1.8, 1.9, 2.0

[Tables A and B: nucleotide columns for positions −3 to +4 of each mutant not shown.]

BREAK IT DOWN: Efficient translation of mRNA produces more protein and is
indicated by higher OD values for mutants possessing that capability (p. 313).

a. Looking just at the nucleotides in positions −3 and +4 for the six mutants in Table A, decide which
nucleotides give the highest level of protein production.

b. Describe the impact of each nucleotide (A, T, C, and G) in the −3 position.

c. Looking just at nucleotides at position −2 and −1 for the four mutants in Table B, decide which
nucleotides give the highest level of protein production.

d. Why did Kozak use only A in the −3 position to test the effects of nucleotides at positions −2 and −1?

e. Putting together data from both Table A and Table B, give the sequence of the mRNA region from
−3 to +4 that produces the highest level of translation.

Solution Strategies and Solution Steps


Evaluate
1. Strategy: Identify the topic this problem addresses and the nature of the requested answer.
   Step: This problem involves examination and interpretation of the effects that sequence differences surrounding the mRNA start codon have on translation. The answer requires identifying the effects of base substitutions on translation and identifying the mRNA sequence corresponding to the highest translation level.
2. Strategy: Identify the critical information given in the problem.
   Step: Two tables provide mRNA sequence for different sequence variants. For each variant, an OD value describes the approximate level of protein produced by translation of the sequence. Higher OD values correspond to more protein production.
   TIP: Notice that AUG is the start codon sequence in all mutants tested. As a consequence, differences in OD result from differences among the surrounding nucleotides.

Deduce
3. Strategy: Identify the constant and variable nucleotides displayed in Table A.
   Step: In Table A, the nucleotide C is constant at positions −1 and −2, and position +3 is always G. Nucleotide variability is limited to positions −3 and +4.
4. Strategy: Identify the constant and variable nucleotides shown in Table B.
   Step: In Table B, only the nucleotide at the −2 position varies; all other nucleotides are constant.

Solve
Answer a
5. Strategy: Specify the nucleotides in the −3 and +4 positions (Table A) that give the highest OD.
   Step: In Table A, the presence of A in position −3 and G in position +4 produces the highest OD value. At the +4 position, G produces two high OD values and two low ODs, and T produces one high and one low OD.
Answer b
6. Strategy: Assess how each nucleotide in the −3 position affects OD.
   Step: At position −3, A produces the highest and the third-highest OD values; G produces the second-highest and the lowest OD; T and C produce the same low OD value.
Answer c
7. Strategy: Evaluate how nucleotide differences at the −1 and −2 positions (Table B) affect OD.
   Step: In Table B, a C in position −2 and an A in position −1 produce the highest OD. Considering only the variable position −2, C produces higher OD values than does G.
Answer d
8. Strategy: Explain the decision to base Table B evaluations only on sequences with A in the −3 position.
   Step: Adenine is selected as the nucleotide in position −3 for Table B evaluations based on the high average OD value for this nucleotide in comparison to other nucleotides. The average OD for A in the −3 position is (5.0 + 2.6)/2 = 3.8 versus the next-highest average of (3.1 + 0.7)/2 = 1.9 for G in the −3 position.
Answer e
9. Strategy: Identify the start codon consensus sequence that results in the highest level of translation.
   Step: Data from the two tables combined identify the sequence ACCAUGG (start codon in bold) as the most efficient consensus sequence for the start codon. For the nucleotide positions immediately surrounding the start codon, A is most efficient at −3, C is more efficient than G at −2, C is more efficient than A at −1, and G is more efficient than U at +4.


For more practice, see Problems 32, 33, and 34. Visit the Study Area to access study tools. MasteringGenetics™ 
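
The averaging used in Answer d of the Genetic Analysis can be checked with a few lines of Python. This is a sketch under stated assumptions: the variable names are hypothetical, and the assignment of OD values to each nucleotide at position −3 is inferred from the solution text (A gives 5.0 and 2.6, G gives 3.1 and 0.7, and T and C each give 0.9) rather than read from the original table.

```python
# OD values grouped by the nucleotide at position -3 (inferred from the
# solution text; the grouping is an assumption, not a copy of Table A).
od_by_minus3 = {
    "A": [5.0, 2.6],
    "G": [3.1, 0.7],
    "T": [0.9],
    "C": [0.9],
}


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


# Average OD for each nucleotide at position -3.
averages = {nt: mean(ods) for nt, ods in od_by_minus3.items()}

# Nucleotides ranked from highest to lowest average OD.
ranked = sorted(averages, key=averages.get, reverse=True)

if __name__ == "__main__":
    for nt in ranked:
        print(nt, round(averages[nt], 2))
```

Ranking by average rather than by the single best value is what justifies fixing A at position −3 when testing positions −2 and −1.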


the E site. In step @ elongation factor EF-G uses GTP Translation Termination 
hydrolysis to, EFs translocate the ribosome by moving 
it in the 3’ direction on mRNA. This translocation step 
is exactly one codon in length, that is, three nucleotides. 
Translocation moves the tRNA formerly at the A site to 


The elongation cycle continues until one of the three 
stop codons, UAG, UGA, or UAA, enters the A site of the 
ribosome. There are no tRNAs with anticodons comple- 


the P site, and opens the A site for binding by a charged mentary to stop codons, so the entry of a stop codon into 
tRNA with the correct anticodon sequence. In step the A site is a translation-terminating event. All organ- 


@ the next charged tRNA is ready to enter the A site. isms use release factors (RF) to bind a stop codon in the 
A site (Figure 9.10). The catalytic activity of RFs releases 


the polypeptide bound to tRNA at the P site. Polypeptide 
release causes ejection of the RF from the P site and leads 
to the separation of the ribosomal subunits. 

In bacteria, two release factors, RF1 and RF2, rec- 
ognize stop codons. RF1 recognizes UAG and UAA, and 


Elongation of Eukaryoticand Archaeal Polypeptides 
Evolution has acted to strongly conserve the basic 
biochemistry of polypeptide elongation in all three 
domains of life. The elongation factors that carry out 
polypeptide elongation in eukaryotes and archaea are 
shown in Table 9.4. All organisms use two elongation 


factors to carry out polypeptide elongation, and the 
illustration of polypeptide elongation in Figure 9.9 is an Table 9.4 Translation Elongation Factor Homologs 


equally accurate portrayal of the process in eukaryotes 


and archaea. Based on sequence comparisons, the $ Bacterial Archaeal Eukaryotic 
archaeal and eukaryotic elongation factor homologs are jonction mennee ee alee 
more alike than are archaeal and bacterial EFs. This Adjusts tRNA in EFT aEF1 eEF1 
sequence analysis supports the initial assessment of Carl A site 

Woese that eukaryotes and archaea are more closely wra EFG aEF2 REF? 
related to one another than either is to bacteria (see translocation 


Section 1.1). 
Figure 9.10 Termination of translation by release factor (eRF)
proteins. A similar process terminates bacterial and archaeal
translation. [Panel label: Release-factor recruitment. Release factors
are recruited when a stop codon occurs at the A site.]


RF2 recognizes UAA and UGA. A third bacterial release factor, RF3, is
active in recycling RF1. Eukaryotic and archaeal translation are
terminated by the action of a single release factor, identified as eRF1
in eukaryotes and aRF1 in archaea, that recognizes all three stop codons
in organisms of both of these domains. Eukaryotes have a second RF that,
like RF3 of bacteria, participates in recycling eRF1. The currently
available information on sequence and function of RFs suggests that
archaea and eukaryotes have RFs that are more like one another than
either is to bacterial RFs (Table 9.5).

Table 9.5 Translation Termination Factor Homologs

                         Bacterial     Archaeal     Eukaryotic
Function                 Homolog       Homolog      Homolog
Stop codon recognition   RF1 and RF2   aRF1         eRF1
Recycling RF1 and eRF1   RF3           No homolog   eRF3
Ribosome recycling       RRF           No homolog   No homolog


9.3 Translation Is Fast and Efficient 


With mRNA transcripts of hundreds to thousands of 
genes in cells, translation is an active and ongoing pro- 
cess that must efficiently initiate, elongate, and terminate 
polypeptide synthesis. In recent decades, research has 
uncovered several aspects of the translation machinery 
that help explain the speed, accuracy, and efficiency of 
polypeptide production. 


The Translational Complex 


Cell biologists estimate that each bacterial cell contains 
about 20,000 ribosomes, collectively constituting nearly 
one-quarter of the mass of the cell. The number of ribo- 
somes per eukaryotic cell is variable, but it too is in the tens 
of thousands. Given these numbers, it is not surprising that 
translation is almost never a matter of a solitary ribosome 
translating a single mRNA. Rather, electron micrographs
reveal structures called polyribosomes, busy translational
complexes containing multiple ribosomes that are each
actively translating the same mRNA (Figure 9.11). Each
ribosome in the polyribosome structure independently syn- 
thesizes a polypeptide, markedly increasing the efficiency of 
utilization of an mRNA. 

In bacteria, the coupling of transcription and translation 
(Chapter 8) allows ribosomes to engage in translation of the 
5’ region of mRNAs whose 3’ end is still under construction 
by RNA polymerase. This coupling is observed in Figure 9.11. 
Transcription occurs along DNA in the left-hand to right- 
hand direction. Translation of the mRNA transcripts begins 
before transcription is complete. In eukaryotes, however, 
transcription and translation are uncoupled. Transcription 
takes place in the nucleus, where pre-mRNA is processed to 
form mature mRNA. Translation occurs in the cytoplasm 
after release of mature mRNA. 
Figure 9.11 Polyribosomes. (a) Electron micrograph of a polyribosome
shows multiple ribosomes simultaneously translating a single mRNA
molecule. Ribosomes that are closest to the stop codon have the longest
polypeptides. (b) Artist rendition of the polyribosome electron
micrograph. Transcription and translation are coupled in bacteria, and
the translation direction is indicated.


Translation of Polycistronic mRNA 


Each polypeptide-producing gene in eukaryotes produces 
monocistronic mRNA, meaning mRNA that directs the 
synthesis of a single kind of polypeptide. The scanning 
model for translation described earlier for eukaryotes im- 
plies that a single start codon is identified in eukaryotic 
mRNA to initiate synthesis of one kind of polypeptide chain. 
In contrast, groups of bacterial and archaeal genes often
share a single promoter, and the resulting mRNA transcript
contains information that directs the synthesis of several different
polypeptides. These polycistronic mRNAs are produced as part
of operon systems that regulate the transcription of sets of 
bacterial genes functioning in the same metabolic pathway 
(a form of regulation we discuss in Chapter 15). 
Polycistronic mRNAs consist of multiple polypeptide-producing segments
(multiple cistrons) that each contain sequence information for
translation initiation. In the case of bacteria, and in all but the
leaderless mRNAs in archaea, the translation-initiating region contains
a Shine-Dalgarno sequence and start and stop codons. An intercistronic
spacer sequence that is not translated separates the cistrons of
polycistronic mRNA and contains the Shine-Dalgarno sequences
(Figure 9.12). Bacterial intercistronic spacers are variable in length:
Some are just a few nucleotides long, although most are 30 to 40
nucleotides long. If the intercistronic spacer is a few nucleotides in
length, it is short enough to be spanned by a ribosome. In such systems,
the ribosome remains intact after completing synthesis of one
polypeptide, and it translates the other genes encoded in the
polycistronic mRNA as well. On the other hand, for longer intercistronic
spacers, the initial ribosome dissociates and new translation initiation
must occur to translate the next polypeptide encoded by the
polycistronic mRNA.


9.4 The Genetic Code Translates 
Messenger RNA into Polypeptide 


Nucleic acids and amino acids are chemically very different 
compounds, and there is no direct mechanism by which 
mRNA could synthesize a polypeptide. Nevertheless, the 
genetic information carried in the nucleotide sequences 
of mRNA does provide a means by which the amino 
acid sequences of polypeptides can be specified. The 
“genetic code” is the name used to describe the correspon- 
dence between mRNA codon sequences and individual 
amino acids. 

Converting the sequence of mRNA into a polypeptide 
depends on transfer RNA (tRNA) to carry amino acids to 
the ribosome. At ribosomes, tRNA pairs with mRNA by 
complementary base pairing between mRNA codon nucle- 
otides and tRNA anticodon nucleotides. Once the correct 
tRNA is bound by a codon, it transfers its amino acid to the 
end of a growing polypeptide chain. Transfer RNA mol- 
ecules facilitate the translation of genetic information from 
one chemical language (nucleic acid) to another (amino 
acid). That is, tRNA is an adaptor molecule that interprets 
and then acts on the information carried in mRNA. 

Our review of translation and the genetic code 
in Chapter 1 depicts a triplet genetic code: Groups of 
three consecutive mRNA nucleotides form codons that 
each correspond to one amino acid. The genetic code 
contains 64 different codons, more than enough to en- 
code the 20 common amino acids used to construct poly- 
peptides. The greater number of codons than amino acids 


Figure 9.12 Polycistronic mRNA. A polycistronic mRNA is a transcript of
multiple genes and will produce a polypeptide from each gene. [Diagram:
genes A, B, and C on one polycistronic mRNA, each with its own
Shine-Dalgarno sequence, start codon, and stop codon, separated by
intercistronic spacers and yielding polypeptides A, B, and C.]
leads to redundancy of the genetic code, as evidenced by
the observation that single amino acids are specified by
from one to as many as six different codons. This redundancy
is explained by aspects of the base-pairing interactions
between tRNA anticodons and mRNA codons.


The Genetic Code Displays 
Third-Base Wobble 


The triplet genetic code is a biological example of 
Ockham’s razor, the principle that the simplest hypoth- 
esis is the most likely to be correct: During the late 1950s, 
arithmetic logic led many researchers to conclude that 
the genetic code was most likely triplet. This simple solution
to the question of how amino acid sequences could
be coded by nucleic acid sequences posits that a doublet
genetic code (two nucleotides per codon) could produce
just 16 (4²) combinations of codons, which is not enough
different combinations to specify 20 amino acids. On
the other hand, a quadruplet genetic code would generate
4⁴, or 256, different combinations of codons, far
too many for the needs of genomes. In contrast, a triplet
genetic code, yielding 4³, or 64, different codons, provides
enough variety to encode 20 amino acids with some, but
not excessive, redundancy (Figure 9.13 and genetic code
information inside the front cover of the book). Among
the 64 codons, 61 specify amino acids, and the remaining 


Figure 9.13 The genetic code. To read this circular table of 
the genetic code, start with the inner ring, which contains the 
nucleotide in the first position (5’ nucleotide) of a codon. The 
second-position nucleotide is in the second ring, and the third- 
position nucleotide is in the third ring. Three-letter and one- 
letter abbreviations for the corresponding amino acids occupy 
the outermost rings. 


3 are the stop codons that terminate translation. Only two 
amino acids, methionine (Met)—with the codon AUG— 
and tryptophan (Trp)—with the codon UGG—are encoded 
by single codons. The other 18 amino acids are specified 
by two to six codons. Codons that specify the same amino 
acid are called synonymous codons. 
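
The codon arithmetic above can be checked with a short sketch. It assumes the standard genetic code, written here as a 64-letter string in U, C, A, G order with `*` marking stop codons. The names `CODON_TABLE`, `codon_count`, and `degeneracy` are illustrative and do not come from the text.

```python
from itertools import product
from collections import Counter

BASES = "UCAG"
# Standard genetic code; codons run UUU, UUC, UUA, UUG, UCU, ... ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def codon_count(length):
    """Number of distinct codons for a code with `length` nucleotides per codon."""
    return len(BASES) ** length


# Separate sense codons from stop codons and count synonymous codons per amino acid.
sense = {codon: aa for codon, aa in CODON_TABLE.items() if aa != "*"}
stops = sorted(codon for codon, aa in CODON_TABLE.items() if aa == "*")
degeneracy = Counter(sense.values())
single_codon = sorted(aa for aa, n in degeneracy.items() if n == 1)

print([codon_count(n) for n in (2, 3, 4)])  # doublet, triplet, quadruplet
print(len(sense), stops, single_codon)
```

Printing `degeneracy` lists, for each one-letter amino acid code, how many synonymous codons specify it.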

Each transfer RNA molecule carries a particular amino 
acid to the ribosome, where complementary base pairing 
between each mRNA codon sequence and the correspond- 
ing anticodon sequence of a correct tRNA takes place. Note 
that this complementary base pairing requires antiparallel 
alignment of the mRNA and tRNA strands. Consider the 
codon sequence for aspartic acid (Asp), 5’-GAC-3’. Base-pairing
rules predict that the tRNA anticodon sequence is
3’-CUG-5’ (Figure 9.14). Asp is also specified by a synonymous
codon, 5’-GAU-3’, that pairs with tRNA carrying the
anticodon sequence 3’-CUA-5’. Transfer RNA molecules
with different anticodon sequences for the same amino 
acid are called isoaccepting tRNAs. 
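
The antiparallel pairing in the Asp example can be sketched in a few lines. This sketch applies strict Watson-Crick pairing only and ignores wobble. The function names are illustrative.

```python
# Strict complementary pairing between RNA bases.
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def anticodon_3_to_5(codon):
    """Anticodon written 3'->5', aligned base by base under a 5'->3' codon."""
    return "".join(PAIR[base] for base in codon)


def anticodon_5_to_3(codon):
    """The same anticodon written in the conventional 5'->3' direction."""
    return anticodon_3_to_5(codon)[::-1]


for codon in ("GAC", "GAU"):
    print(f"codon 5'-{codon}-3' pairs with anticodon 3'-{anticodon_3_to_5(codon)}-5'")
```

Reading the anticodon 5’ to 3’ reverses the letters, which is why the strand direction must always be stated alongside the sequence.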

Does the presence of synonymous codons and isoac- 
cepting tRNAs mean that a genome must provide 61 differ- 
ent tRNA genes and transcribe a tRNA molecule to match 
each codon? The answer is no. In fact, most genomes have 
30 to 50 different tRNA genes. How does a genome that 
encodes fewer than 61 different tRNA molecules recognize 
all 61 functional codons? The answer lies in relaxation of 
the strict complementary base-pairing rules at the third 
base of the codon. The mechanics of translation provide 
for flexibility in the pairing of the third base, the 3’-most 
nucleotide, of the codon. Third-base wobble is the name 
given to the mechanism that relaxes the requirement for 
complementary base pairing between the third base of a 
codon and the corresponding nucleotide of its anticodon. 

How does third-base wobble work? The answer is found 
in the chemical structures of nucleotides that hydrogen 
bond in base-pairing reactions. A careful look at synony- 
mous codons reveals a pattern to the chemical structure of 
the third bases in cases of wobble. With the exception of 
the AUA codon for isoleucine (Ile) and the UGG codon for 
tryptophan (Trp), synonymous codons can be grouped into 
pairs that have the same two nucleotides in the first and 
second positions and differ only at the third base, where 


Figure 9.14 Codon-anticodon pairing. A pair of isoaccepting aspartic
acid tRNAs illustrates complementary antiparallel base-pairing of codon
and anticodon sequences. [Diagram: two Asp tRNAs with anticodons written
3’ to 5’, paired with the mRNA codons 5’-GAU-3’ and 5’-GAC-3’.]
the synonymous codons either both carry a purine (A or G)
or both carry a pyrimidine (C or U). For example, consider 
the synonymous pairs of codons for histidine (His) and 
glutamine (Gln; see Figure 9.13). The first two bases of each 
of these codons are C and A. Both His codons have a pyrimi- 
dine at the third position, whereas the Gln codons have a 
purine in the third position. As you look at other pairs of 
synonymous codons in the genetic code information inside 
the book front cover, notice that they also differ only by car- 
rying the alternative purine or pyrimidine nucleotide at the 
third position. 

Amino acids specified by four synonymous codons, 
such as alanine (Ala), valine (Val), and glycine (Gly), display 
an analogous pattern: Each amino acid is represented by 
two pairs of synonymous codons, and the members of each 
pair differ in the third position only, by carrying the alter- 
nate purine or pyrimidine. The pattern continues in argi- 
nine (Arg), serine (Ser), and leucine (Leu), each of which is 
specified by six synonymous codons. These sets of codons 
each consist of three pairs, each pair having the same 
nucleotides in the first two positions and differing by hav- 
ing the alternate purine or pyrimidine in the third position. 
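
The purine/pyrimidine pattern can be tested against the whole code table. The sketch below assumes the standard genetic code and reports any first-two-base prefix whose pyrimidine-ending pair (U, C) or purine-ending pair (A, G) does not share one meaning. `pair_exceptions` is an illustrative name.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code; codons run UUU, UUC, UUA, UUG, UCU, ... ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def pair_exceptions(third_bases):
    """Prefixes whose codons ending in each of `third_bases` differ in meaning."""
    return sorted(
        "".join(prefix)
        for prefix in product(BASES, repeat=2)
        if len({CODON_TABLE["".join(prefix) + b] for b in third_bases}) > 1
    )


pyrimidine_exceptions = pair_exceptions("UC")  # codons ending in U or C
purine_exceptions = pair_exceptions("AG")      # codons ending in A or G
print(pyrimidine_exceptions, purine_exceptions)
```

The only purine-pair exceptions are the AU prefix (AUA is Ile, AUG is Met) and the UG prefix (UGA is a stop codon, UGG is Trp), matching the two exceptions named in the text.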

Third-base wobble occurs through flexible base pair- 
ing between the wobble nucleotide—that is, the 3’ nucleo- 
tide of a codon—and the 5’ nucleotide of an anticodon. 
At the wobble position, base pairing between the nucleo- 
tides of the codon and the anticodon need not be comple- 
mentary. They must, however, involve a purine and a 
pyrimidine. Third-base wobble pairings are summarized in 
Table 9.6. The wobble nucleotides in different anticodons 
include all the RNA nucleotides and also the modified 
nucleotide inosine (I). Inosine is structurally similar to G 
but lacks the amino group attached to guanine’s 2 carbon. 
Because of this difference, inosine base-pairs with either 
purines or pyrimidines. Figure 9.15 shows three examples 
of third-base wobble, in which three tRNA molecules col- 
lectively recognize seven different codons. 
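
The wobble rules of Table 9.6 can be written as a lookup table to show how one anticodon reads several codons. In this sketch the isoleucine anticodon 5’-IAU-3’ follows the description of Figure 9.15. The two serine anticodons (5’-GGA-3’ and 5’-UGA-3’) are assumptions chosen for illustration and are not taken from the figure.

```python
# 5' anticodon nucleotide -> codon 3' nucleotides it can pair with (Table 9.6).
WOBBLE = {"U": "AG", "C": "G", "A": "U", "G": "UC", "I": "UCA"}
# Strict pairing applies at the other two anticodon positions.
STRICT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def codons_recognized(anticodon_5_to_3):
    """List the codons (5'->3') that an anticodon (5'->3') can read."""
    wobble, middle, last = anticodon_5_to_3
    # Antiparallel pairing: the anticodon's 3' base pairs with the codon's 5' base.
    first_two = STRICT[last] + STRICT[middle]
    return sorted(first_two + third for third in WOBBLE[wobble])


for anticodon in ("IAU", "GGA", "UGA"):
    print(anticodon, codons_recognized(anticodon))
```

Together the three anticodons read seven codons, the same total the text gives for the three tRNAs of Figure 9.15.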


Charging tRNA Molecules 


Transfer RNA molecules are transcribed from tRNA 
genes. Recall the three-dimensional structure of tRNAs 
(see Figure 8.28) and the CCA terminus at the 3’ end of 
tRNA molecules as the site of attachment of an amino 


Table 9.6 Third-Base Wobble Pairing between Codon
and Anticodon Nucleotides

3’ Nucleotide of Codon    5’ Nucleotide of Anticodon
A or G                    U
G                         C
U                         A
U or C                    G
U, C, or A                I


Figure 9.15 Effect of wobble. Wobble base pairing reduces the number of
different tRNAs required during translation. In this example, two
different tRNAs, each carrying serine, each use wobble to recognize a
different pair of serine codons. A single isoleucine-carrying tRNA uses
wobble to recognize three isoleucine codons. [Diagram: amino acids,
anticodons, and mRNA codons, with the wobble position marked in each
pairing.]


acid. Each tRNA carries only one of the 20 amino acids, 
and correct charging of each tRNA is crucial for the integ- 
rity of the genetic code. 

The charging of tRNAs is catalyzed by enzymes called 
aminoacyl-tRNA synthetases or, more simply, tRNA 
synthetases. There are 20 different tRNA synthetases, one 
for each of the amino acids. To charge an uncharged tRNA, 
a tRNA synthetase catalyzes a two-step reaction that forms 
a bond between the carboxyl group of the amino acid and 
the 3’ hydroxyl group of adenine in the CCA terminus. 
Experimental analysis reveals that the recognition of isoac- 
cepting tRNAs by tRNA synthetase is a complex process 
that does not follow a single set of rules. Mutations in any 
of the four arms of tRNA, or in the anticodon sequence it- 
self, render a tRNA unrecognizable to its tRNA synthetase. 

Studies of structural interactions between tRNA syn- 
thetases and their tRNAs show tRNA synthetase to be a 
large molecule that contacts several parts of a tRNA as 
part of the recognition process. These contact points can 
include the anticodon sequence and the other arms and 
loops of the tRNA (Figure 9.16). Once in contact with 
tRNA synthetase, the tRNA acceptor stem fits into an ac- 
tive site of tRNA synthetase. The active site contains the 
amino acid that will be added to the tRNA acceptor stem 
and ATP that provides energy for amino acid attachment. 

Familiarize yourself with Figure 9.13 and the genetic 
code information inside the front cover by using them to 
decipher the mutations shown in Genetic Analysis 9.2. 


9.5 Experiments Deciphered the 
Genetic Code 


A remarkable set of experiments performed over less 
than 4 years in the early 1960s deciphered the genetic 
code and opened the way for biologists to understand 
Figure 9.16 Interaction of aminoacyl-tRNA synthetase with
tRNA. Aminoacyl-tRNA synthetase contacts multiple points on
tRNA. ATP and the 3’ acceptor stem of tRNA fit in a cleft that
also accommodates the amino acid.


the molecular processes that convert a messenger RNA 
nucleotide sequence into a polypeptide. At the time, 
biologists knew what the hereditary material was (DNA), 
and they knew what molecule conveyed the genetic mes- 
sage to ribosomes for translation (mRNA), but they did 
not know how the protein-coding information carried 
by messenger RNA was deciphered during the assembly 
of polypeptides. Several questions had to be answered 
about the structural nature of the genetic code before 
the code itself could be deciphered. The three most im- 
portant questions, listed here, are examined in the sec- 
tions below: 


1. Do neighboring codons overlap one another, or is 
each codon a separate sequence? 


2. How many nucleotides make up a messenger RNA 
codon? 


3. Is the polypeptide-coding information of messenger 
RNA continuous, or is coding information inter- 
rupted by gaps? 


No Overlap in the Genetic Code 
Consider the partial messenger RNA sequence 
... ACUAAG... 


If the genetic code is triplet and nonoverlapping (recall 
that a doublet code does not provide enough codons to 
specify 20 amino acids, and a quadruplet code provides 
far too many), this partial sequence produces two codons,
each specifying an amino acid:


codon 1 2 
ACU AAG 
amino acid 1 2 


In an overlapping triplet genetic code, on the other hand, 
these six nucleotides would spell out four complete codons 
and two partial codons. The sequence would fully encode 
four amino acids and contribute to the coding of two others: 


              ... ACUAAG ...
amino acid 1      ACU
           2       CUA
           3        UAA
           4         AAG
           5          AG.
           6           G..
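
The two ways of reading the six-nucleotide sequence can be reproduced with simple slicing. This is a minimal sketch with illustrative function names. Only complete triplets are returned, so the two partial codons of the overlapping model are omitted.

```python
def nonoverlapping_codons(seq):
    """Split a sequence into consecutive triplets that share no nucleotides."""
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]


def overlapping_codons(seq):
    """Read a new triplet starting at every nucleotide (overlapping code)."""
    return [seq[i:i + 3] for i in range(len(seq) - 2)]


sequence = "ACUAAG"
print(nonoverlapping_codons(sequence))
print(overlapping_codons(sequence))
```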


In 1957, based on his analysis of the available informa- 
tion on amino acid sequences of proteins, Sidney Brenner 
became convinced that an overlapping triplet genetic code 
was impossible because it was too restrictive. To test his 
hypothesis, Brenner examined the upstream neighbor of 
each AAG lysine in a large number of proteins and found 17 
different amino acids in that position. He concluded that 
an overlapping genetic code restricted evolutionary flex- 
ibility and was unsupported by biochemical observations. 

Conclusive evidence of a nonoverlapping genetic code
came from a 1960 study of single-nucleotide substitutions
induced by the mutation-producing compound nitrous
acid. Heinz Fraenkel-Conrat and his colleagues studied
the effect of nitrous acid on the coat protein of tobacco
mosaic virus (TMV). Nitrous acid causes mutations by
inducing single base-pair substitutions in DNA that lead to
mutant mRNA molecules with one nucleotide base change
compared to wild-type mRNA. A single base change in
mRNA would alter three consecutive codons if the genetic
code were overlapping, but just a single codon if the genetic
code were nonoverlapping (Figure 9.17a). Fraenkel-Conrat’s
mutation analysis revealed that only single amino
acid changes occurred as a result of mutation by nitrous
acid. This result is consistent with that predicted for a
nonoverlapping genetic code, and it is inconsistent with
the prediction for an overlapping genetic code.
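
The two predictions can be compared by counting how many triplets a single substitution alters under each model. The sketch uses the nine-nucleotide sequence from Figure 9.17. The replacement base (G) at the fifth nucleotide is a hypothetical choice for illustration.

```python
def changed_codons(wild, mutant, step):
    """Count differing triplets: step=3 reads nonoverlapping codons, step=1 overlapping."""
    starts = range(0, len(wild) - 2, step)
    return sum(wild[i:i + 3] != mutant[i:i + 3] for i in starts)


wild = "ACUCAGAUA"
# Hypothetical A-to-G substitution at the fifth nucleotide (index 4).
mutant = wild[:4] + "G" + wild[5:]

print("overlapping code:", changed_codons(wild, mutant, step=1), "codons changed")
print("nonoverlapping code:", changed_codons(wild, mutant, step=3), "codon changed")
```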


A Triplet Genetic Code 


Proof of a triplet genetic code came in 1961 when Francis
Crick, Leslie Barnett, Sidney Brenner, and R. J. Watts-Tobin
used the compound proflavin to create mutations in a gene
called rII in T4 bacteriophage. Proflavin causes mutations
by inserting or deleting single base pairs from DNA. Such a
change adds single nucleotides to, or removes them from,
Figure 9.17 Proof that the genetic code is nonoverlapping.
The sequence of the last 10 amino acids at the C-terminal end
of a TMV protein contained a single amino acid change following
the induction of base-substitution mutation. This result
conforms to the prediction of the nonoverlapping model of the
genetic code.

(a) An overlapping genetic code would change three consecutive
codons with each base mutation. The wild-type sequence ACUCAGAUA
reads as the overlapping codons ACU, CUC, UCA, CAG, AGA, GAU, and
AUA; a substitution at the fifth nucleotide alters codons 3, 4, and 5.

(b) A nonoverlapping genetic code would change one codon with
each base mutation. The wild-type sequence ACUCAGAUA reads as the
codons ACU, CAG, and AUA; the same substitution alters only codon 2.


mRNA, thus changing the reading frame of the mRNA. 
Reading frame refers to the specific codon sequence as de- 
termined by the point at which the grouping of nucleotides 
into triplets begins. The addition or deletion of nucleo- 
tides changes the reading frame and produces a mutation 
called a frameshift mutation. 

The following analogy illustrates the impact of frame- 
shift mutations. Single-letter additions or deletions garble 
the translated message by changing the reading frame: 


wild-type: YOUMAYNOWSIPTHETEA (“you may now sip the tea”)

mutant (addition): YOUMA C YNOWSIPTHETEA
(“you mac yno wsi pth ete a”)

mutant (deletion): YOUMAYNO [ ] SIPTHETEA
(“you may nos ipt het ea”)


Frameshift mutations can be reverted (i.e., the cor- 
rect reading frame can be restored) if a second mutation 
in a different location within the same gene restores the 
reading frame. This second mutation, called a reversion 
mutation, counteracts (“reverses”) the reading frame dis- 
ruption by inserting a nucleotide, if the initial mutation 
was a deletion, or by deleting a nucleotide, if the initial 
mutation was an insertion. For example, here is how the 
two frameshift mutations shown above might be reverted: 


mutant (addition): YOUMA C YNOWSIPTHETEA
(“you mac yno wsi pth ete a”)

reversion mutant (deletion): YOUMA C YNO [ ] SIPTHETEA
(“you mac yno sip the tea”)


mutant (deletion): YOUMAYNO [ ] SIPTHETEA
(“you may nos ipt het ea”)

reversion (addition): YOUMAYNOSIP R THETEA
(“you may nos ipr the tea”)
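
The sentence analogy can be replayed in code by grouping letters into threes after each edit. This is a minimal sketch. The helper names `read_frame`, `insert`, and `delete` are illustrative, and the positions edited are those of the examples above.

```python
def read_frame(letters):
    """Group letters into consecutive three-letter words from the first letter."""
    return " ".join(letters[i:i + 3] for i in range(0, len(letters), 3))


def insert(letters, index, letter):
    """Add one letter before position `index` (a +1 frameshift)."""
    return letters[:index] + letter + letters[index:]


def delete(letters, index):
    """Remove the letter at position `index` (a -1 frameshift)."""
    return letters[:index] + letters[index + 1:]


wild = "youmaynowsipthetea"
addition = insert(wild, 5, "c")                              # frameshift by addition
addition_reverted = delete(addition, addition.index("w"))    # second-site deletion
deletion = delete(wild, wild.index("w"))                     # frameshift by deletion
deletion_reverted = insert(deletion, deletion.index("t"), "r")  # second-site addition

for text in (wild, addition, addition_reverted, deletion, deletion_reverted):
    print(read_frame(text))
```

In both revertants the words between the two mutations stay garbled, and only the text downstream of the second mutation returns to the original frame.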


Crick and his colleagues analyzed numerous bacteriophage
proflavin-induced rII-gene mutants, designating
each addition mutant as a (+) and each deletion mutation
as a (−). They guessed that the first rII-gene mutant they
examined, a mutation designated FC 0, resulted from insertion
(“FC” stands for Francis Crick). Designating FC 0 as
a (+) mutation turned out to be a correct guess. Based on
their assumptions that (1) the genetic code is a nonoverlapping
triplet and (2) FC 0 is an insertion (+) mutation,
the data reported by Crick and colleagues supported the
notion that the genetic code is based on nucleotide triplets.

Data on several mutants are displayed in Table 9.7. Each
mutant is designated either (+) or (−). Any combination of
a (+) mutant and a (−) mutant generates a wild-type revertant.
In each case, the initial mutation causes a frameshift
mutation, and the reversion mutation restores the reading
frame. The triplet structure of the genetic code is demonstrated
by the observation that the reading frame is restored
by the presence of three (+) mutations or three (−) mutations.
For example, the total of three insertions restores the
reading frame in the following sentence after the position of
the third insertion:


triple mutant (addition):
YOUMA C YNOW T S L IPTHETEA
(“you mac yno wts lip the tea”)
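
The logic behind the (+) and (−) combinations reduces to arithmetic modulo 3. The sketch below assumes a nonoverlapping triplet code and single-base insertions or deletions. It tracks only the reading frame, not whether the codons between the mutations still encode a functional protein. The function name is illustrative.

```python
def restores_reading_frame(mutations):
    """True if a string of '+' (insertion) and '-' (deletion) marks nets a multiple of 3."""
    net_shift = mutations.count("+") - mutations.count("-")
    return net_shift % 3 == 0


for combo in ("+-", "++", "--", "+++", "---"):
    outcome = "frame restored" if restores_reading_frame(combo) else "frameshift remains"
    print(combo, outcome)
```

Under a doublet or quadruplet code the modulus would be 2 or 4, and three insertions would not restore the frame, which is why the triple-mutant result points to triplets.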


No Gaps in the Genetic Code 


In their 1961 research, Crick and colleagues also sug- 
gested that the genetic code is read as a continuous string 
of mRNA nucleotides uninterrupted by any kind of gap, 


Table 9.7 Phenotypes Resulting from Various Combinations of
Proflavin-Induced Base-Pair Insertion (+) and Deletion (−) Mutations
at the rII Locus of Bacteriophage T4

Combined Mutations      +/− Designations   Result
FC 0, FC 1              + −                Wild-type revertant
FC 0, FC 21             + −                Wild-type revertant
FC 40, FC 1             + −                Wild-type revertant
FC 58, FC 1             + −                Wild-type revertant
FC 0, FC 40, FC 58      + + +              Wild-type revertant
FC 1, FC 21, FC 23      − − −              Wild-type revertant
FC 0, FC 40             + +                rII mutant
[illegible]             [illegible]        rII mutant
FC 1, FC 21             − −                rII mutant
[illegible]             − −                rII mutant


GENETIC ANALYSIS 


PROBLEM A portion of an mRNA encoding C-terminal amino acids and the stop codon of a wild-type
polypeptide is

5’-...CAACUGCCUGACCCACACUUAUCACUAAGUAGCCUAGCAGUCUGA...-3’

The wild-type amino acid sequence encoded by this portion of mRNA contains the amino acid Asn encoded
by the codon 5’-AAC-3’. The remainder of the amino acids are encoded in the same reading frame.

N...Asn-Cys-Leu-Thr-His-Thr-Tyr-His-C

BREAK IT DOWN: The mRNA sequence is complementary to the DNA template strand and differs from the
DNA coding strand only by having uracil instead of thymine (p. 270).

The C-terminal ends of three independently obtained mutant proteins produced by this gene are as
follows.

Mutant 1: N...Asn-Cys-Leu-Thr-His-Thr-C

Mutant 2: N...Asn-Cys-Leu-Thr-His-Thr-Tyr-His-Lys-C

Mutant 3: N...Asn-Cys-Leu-Thr-His-Thr-Tyr-His-Tyr-Ser-Ser-Leu-Ala-Val-C

Identify the mutational events that produce each of the mutant proteins.

BREAK IT DOWN: Mutations occur at the level of DNA. Comparison of each mutant DNA and amino acid
sequences with the wild-type sequence will reveal how the DNA sequence is changed (p. 321).


Solution Strategies and Solution Steps

Evaluate

1. Strategy: Identify the topic this problem addresses and the nature of the requested answer.
   Step: This problem concerns evaluation of the C-terminal end of a wild-type protein sequence and
   the mRNA segment that encodes it and comparison of the wild-type protein to three mutant proteins
   to determine the alteration producing each mutant. The answers require the identification of
   specific mRNA sequence changes leading to each mutant protein.

2. Strategy: Identify the critical information given in the problem.
   Step: In this problem the C-terminal end of a wild-type protein and the mRNA sequence that
   encodes it are given. Also given are the C-terminal sequences of three mutant proteins encoded by
   mutant mRNA sequences derived by alteration of the wild-type sequence.

Deduce

3. Strategy: Use the genetic code to identify the codons corresponding to wild-type amino acids and
   to identify the stop codon.
   Step: Two codons, AAC and AAU, encode asparagine (Asn). If we skip the 5’-most nucleotide of the
   mRNA sequence and begin reading at the A in the second position, the first codon is AAC followed
   by UGC-CUG-ACC-CAC-ACU-UAU-CAC-UAA. These codons encode the wild-type amino acids, and UAA is the
   stop codon.

4. Strategy: Compare each mutant polypeptide to the wild type and determine which codon contains
   the mutation.
   TIP: Any of three stop codons (UAG, UGA, or UAA) terminates translation immediately after the
   codon specifying the amino acid at the C terminus of a polypeptide.
   Step: Mutant 1: The polypeptide sequence is truncated two amino acids short of the normal stop
   codon. The Tyr codon (UAU) appears to have changed to a stop codon.
   Mutant 2: The wild-type sequence is extended by the addition of lysine (Lys), indicating that
   mutation changed the stop codon to a codon specifying Lys and is now followed immediately by a
   new stop codon.
   Mutant 3: The wild-type sequence is extended by six amino acids. This suggests another mutation
   affected the stop codon.

Solve

5. Strategy: Identify the mutation and its consequence for translation in Mutant 1.
   Step: Two different base substitutions altering the tyrosine (Tyr) codon UAU to a stop codon
   could cause Mutant 1. The wild-type UAU codon was most likely altered by base substitution to
   form either a UAA or a UAG stop codon.

6. Strategy: Identify the mutation and its consequence in Mutant 2.
   Step: Lysine (Lys), which was added to the mutant polypeptide, is encoded by AAA or AAG. Deletion
   of the U from the wild-type stop codon would produce an AAG codon followed by UAG, a stop codon.

7. Strategy: Identify the mutation and its consequence in Mutant 3.
   TIP: Examine the wild-type nucleotide sequence at the place where mutation is expected to have
   occurred, and identify ways in which base substitution, insertion, or deletion could have had the
   observed effect on the amino acid sequence.
   Step: Tyrosine, specified by codons UAU and UAC, is found in place of the normal stop codon. This
   is followed by a serine codon (UCN or AGU/C), rather than the GUA (Val) that follows the
   “in-frame” stop codon in the wild type. A base-pair insertion that adds a U or a C into the third
   position of the normal UAA stop codon forms a UAU or a UAC tyrosine (Tyr) codon. The altered
   reading frame from that point would then read AGU (Ser), followed by AGC (Ser), CUA (Leu),
   GCA (Ala), GUC (Val), and UGA (stop).

For more practice, see Problems 5, 11, 16, and 29.
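
The deductions in this Genetic Analysis can be checked by translating the wild-type mRNA and three candidate mutants. The sketch below assumes the standard genetic code and one-letter amino acid symbols. Each mutant is one of the possibilities named in the solution: UAU to UAA for Mutant 1, deletion of the U of the stop codon for Mutant 2, and insertion of a U at the third position of the stop codon for Mutant 3. The variable names are illustrative.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code; codons run UUU, UUC, UUA, UUG, UCU, ... ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(mrna, start=1):
    """Translate from index `start` (the second nucleotide here) up to the first stop codon."""
    peptide = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        peptide.append(amino_acid)
    return "".join(peptide)


wild = "CAACUGCCUGACCCACACUUAUCACUAAGUAGCCUAGCAGUCUGA"
mutant1 = wild[:21] + "A" + wild[22:]  # substitution: Tyr codon UAU -> stop codon UAA
mutant2 = wild[:25] + wild[26:]        # deletion of the U of the UAA stop codon
mutant3 = wild[:27] + "U" + wild[27:]  # insertion of U at the third position of UAA

for name, seq in (("wild type", wild), ("mutant 1", mutant1),
                  ("mutant 2", mutant2), ("mutant 3", mutant3)):
    print(name, translate(seq))
```

Replacing the inserted U with C in `mutant3`, or the substituted A with G in `mutant1`, tests the alternative mutations that the solution also allows.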
space, or pause. If a gap or spacer were present between
mRNA codons, the mRNA transcript might be represented
as follows (x indicates the gap between codons):


YOUxMAYxNOWxSIPxTHExTEAx (“you 
may now sip the tea”) 


If the genetic code were structured in some such 
way, with each codon set off from its neighbors, inser- 
tion or deletion of a nucleotide would not cause the kind 
of frameshift mutation that Crick and colleagues had 
observed. Instead, insertion or deletion of nucleotides 
could be expected to alter the affected codon but not 
the identity of adjoining codons. For example, consider 
the following insertion mutation, where the separation 
between codons confines the alteration to a single word: 


YOUxMA T YxNOWxSIPxTHExTEAx (“you maty
now sip the tea”)


Deciphering the Genetic Code 


The genetic code was deciphered in a series of 
experiments performed between 1961 and 1965. This 
remarkable 4-year period in biology was highlighted 
by extensive collaborative and competitive international 
research that culminated in the assembly of a simple 
table containing the instructions shared by all organisms 
for translating mRNA nucleotide sequences into polypeptide
sequences. Deciphering the genetic code was a
milestone in establishing the mechanism of the central
dogma of biology (DNA → RNA → protein) and laying
the molecular foundation for modern genetic research.
This triumph of deductive reasoning was instantly rec- 
ognized for its profound significance, and it resulted in 
the awarding of a Nobel Prize in Physiology or Medicine 
to Har Gobind Khorana and Marshall Nirenberg in 1968. 

Once it had been established that the genetic code con- 
sists of triplets, researchers sprang to the task of establish- 
ing which triplets are associated with each amino acid in 
the process of translation. Nirenberg and Johann Heinrich 
Matthaei performed a simple experiment in 1961 that laid 
the groundwork for later experiments in deciphering the 
genetic code. Their experimental design was straightfor- 
ward: Construct synthetic strings of repeating nucleotides, 
and use an in vitro translation system to translate the 
sequence into a polypeptide. For example, Nirenberg and 
Matthaei synthesized an artificial mRNA containing only 
uracils, known as a poly(U). They devised an in vitro trans- 
lation system composed of the known cellular components 
of bacterial translation—ribosomes, charged transfer RNA 
molecules, and essential translational proteins. Regardless 
of where translation might begin along the poly(U) mRNA, 
the only possible codon it contained was UUU. The re- 
searchers were therefore hoping to determine which amino 
acid corresponds to the UUU codon. 

Twenty separate in vitro translations of poly(U) 
mRNA were carried out, each time using a pool of 19 
unlabeled amino acids and one amino acid labeled with 


radioactive carbon (¹⁴C). To determine which amino acid
is encoded by poly(U) mRNA, Nirenberg and Matthaei 
used a different radioactive amino acid in each transla- 
tion. They detected production of a highly radioactive 
polypeptide after conducting translation in a system con- 
taining radioactively labeled phenylalanine (Figure 9.18). 
The radioactive polypeptide was poly-phenylalanine 
(poly-Phe). Since the only possible triplet codon in the 
mRNA was UUU, Nirenberg and Matthaei reasoned that 
5'-UUU-3' codes for phenylalanine. They went on to 
construct poly(A), poly(C), and poly(G) synthetic mRNAs 
and identified 5'-AAA-3' as a codon for lysine (Lys),
5'-CCC-3' as a proline (Pro) codon, and 5'-GGG-3' as a
codon for glycine (Gly) (Table 9.8). 

Khorana adapted the experimental strategy of 
Nirenberg and Matthaei to synthesize mRNA molecules 
that contained di-, tri-, and tetranucleotide repeats. His 
construction of repeat-sequence mRNAs allowed him to 
define many additional codons (see Table 9.8). For ex- 
ample, Khorana used the dinucleotide repeat UC to form a 
synthetic mRNA with the sequence 


5'-UCUCUCUCUCUCUCUCUC-3'


This mRNA can be translated in either a reading frame 
that begins with uracil or a reading frame that begins 
with cytosine. In both cases, the reading frame produces 


Figure 9.18 Use of synthetic mRNAs to determine genetic
code possibilities. (a) Synthetic poly(U) mRNA is translated
in vitro in the presence of individual ¹⁴C-labeled amino acids.
A polypeptide consisting of phenylalanine is formed. (b) These
radioactivity counts demonstrate that only poly(U) synthetic
mRNA incorporates radioactive phenylalanine into a polypeptide.

(a) In vitro translation of synthetic mRNA: synthetic poly(U)
mRNA (5'-UUUUUUUUUUUUUUUUUUUUU-3') is added to an in vitro
translation system containing ¹⁴C-labeled amino acids, and the
radioactive polypeptides (N-Phe-Phe-Phe...-C) are analyzed.

(b) Incorporation of ¹⁴C-labeled phenylalanine into polypeptides

Synthetic mRNA    Radioactivity (counts/min)
None              44
Poly(U)           39,800
Poly(A)           50
Poly(C)           38


Table 9.8 Example Polypeptide Production from Synthetic mRNAs

Synthetic mRNA                             mRNA Sequence
Repeating nucleotides         Poly-U       UUUU...
                              Poly-C       CCCC...
Repeating dinucleotides       Poly-UC      UCUC...
                              Poly-AG      AGAG...
Repeating trinucleotides      Poly-UUC     UUCUUCUUC...
                              Poly-AAG     AAGAAGAAG...
Repeating tetranucleotides    Poly-UAUC    UAUCUAUC...
                              Poly-GUAA    GUAAGUAA...
Note: Data adapted from Khorana (1967).


alternating UCU-CUC codons. Khorana identified the 
amino acids of the resulting polypeptide and found it con- 
tained alternating serine (Ser) and leucine (Leu). 

When Khorana used mRNA containing trinucleotide 
repeats, most of these mRNAs produced three differ- 
ent polypeptides that each consisted of only one kind 
of amino acid. For example, the reading frame for poly-UUC
can begin with either of the uracils or with cytosine.
Messenger RNA is read as consecutive UUC codons if the 
first uracil initiates the reading frame, as UCU if the second 
uracil begins the reading frame, or as CUU if cytosine is 
at the start of the reading frame. Although the different 
reading frames each produced a polypeptide containing 
one amino acid, Khorana was again unsure which codon 
specified which amino acid. 
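
The reading-frame logic behind the repeat-sequence experiments can be sketched in a few lines of Python. This is only an illustration: the function name `translate_frames` and the one-letter amino acid output are choices made here, and the codon table is the standard genetic code, not a reconstruction of the original data.

```python
from itertools import product

# Standard genetic code, one-letter amino acids, '*' = stop.
# Codons are generated with U, C, A, G at each position, in that order.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate_frames(repeat_unit, codons_per_frame=4):
    """Translate a repeating synthetic mRNA in each of its three reading frames."""
    # Build a message long enough to read codons_per_frame codons from any offset.
    mrna = repeat_unit * (3 * codons_per_frame + 3)
    peptides = []
    for offset in range(3):
        codons = [mrna[i:i + 3] for i in range(offset, offset + 3 * codons_per_frame, 3)]
        peptides.append("".join(CODON_TABLE[c] for c in codons))
    return peptides


for unit in ("U", "UC", "UUC", "UAUC"):
    print(unit, translate_frames(unit))
```

A single-nucleotide repeat gives the same homopolymer in every frame, a dinucleotide repeat gives two alternating amino acids, and a trinucleotide repeat gives a different homopolymer in each frame. That is why the trinucleotide results alone could not say which codon specified which amino acid.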

Nirenberg and Philip Leder contributed the final 
piece of the genetic code puzzle in 1964 when they devised 
an experiment to resolve the ambiguities of codon identity 
remaining from Khorana’s experiments. They synthesized 
many different mini-mRNAs that were each just three 
nucleotides in length (Figure 9.19). The tiny mRNAs were 
added individually to in vitro translation systems containing
ribosomes, along with 19 unlabeled amino acids and one
¹⁴C-labeled amino acid, all attached to different transfer
RNA molecules. The mRNA formed a complex with the
ribosome and the tRNA charged with the corresponding
amino acid. Each in vitro mixture was then poured
through a filter that captured the large ribosome-mRNA-tRNA
complexes but permitted noncomplexed molecules
of mRNA or tRNA to pass through. The filter was subsequently
tested to determine if the three-nucleotide mRNA
sequence bound a transfer RNA with the radioactive
amino acid. Nirenberg and Leder tested all 64 combinations
of nucleotides with their tiny mRNA system and
were able to identify codon-amino acid correspondences
for the entire genetic code. In addition, they identified the


Table 9.8 (continued)

Synthetic mRNA   Polypeptides Synthesized                     Observation
Poly-U           Phe-Phe-Phe...                               Polypeptides have one
Poly-C           Pro-Pro-Pro...                               amino acid.
Poly-UC          Ser-Leu-Ser-Leu...                           Polypeptides have two
Poly-AG          Arg-Glu-Arg-Glu...                           alternating amino acids.
Poly-UUC         Phe-Phe... and Ser-Ser... and Leu-Leu...     Three polypeptides have
Poly-AAG         Lys-Lys... and Arg-Arg... and Glu-Glu...     one amino acid each.
Poly-UAUC        Tyr-Leu-Ser-Ile-Tyr-Leu-Ser-Ile...           Some polypeptides have
Poly-GUAA        None (UAA stop codon)                        four repeating amino acids.
                                                              Others identify stop codons.


nucleotide composition of the three stop codons, UAA,
UAG, and UGA. (Use Genetic Analysis 9.3 to test your skill at
interpreting the genetic code.)
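
The bookkeeping behind "all 64 combinations" is easy to check. The sketch below enumerates every triplet and separates sense codons from stop codons. It uses the standard genetic code as it is known today; the variable names are illustrative only.

```python
from collections import Counter
from itertools import product

# Standard genetic code, one-letter amino acids, '*' = stop.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

all_codons = list(CODON_TABLE)                                    # 4 x 4 x 4 triplets
stop_codons = [c for c, aa in CODON_TABLE.items() if aa == "*"]
sense_codons = [c for c in all_codons if c not in stop_codons]

# Redundancy: how many codons specify each amino acid.
codons_per_amino_acid = Counter(CODON_TABLE[c] for c in sense_codons)

print(len(all_codons), "codons in total")
print(len(sense_codons), "sense codons;", "stop codons:", stop_codons)
print(codons_per_amino_acid.most_common())
```

The counts show why a single experiment per amino acid was not enough: most amino acids are specified by several codons, so each of the 64 triplets had to be tested individually.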


The (Almost) Universal Genetic Code 


In astonishing testimony to a single origin of life on Earth 
and to the power of evolution to maintain virtually com- 
plete uniformity over hundreds of millions of years, every 
living organism uses the same genetic code to synthesize 
polypeptides. In all living things, from bacteria to hu- 
mans, the hereditary script carried by any given mRNA is 
translated by a similar mechanism and produces the same 
polypeptide. The universality of the genetic code makes 
it possible to use bacterial systems to express biologically 
important protein products found in plants or animals. 
The production of human insulin to treat diabetes and of 
factor VIII protein to treat hemophilia are two of numer- 
ous examples of recombinant human gene cloning that are 
possible in part because bacteria and humans use the same 
genetic code for translation. 

As with most general rules, however, there are a 
few exceptions to the universality of the genetic code; 
thus, biologists characterize the genetic code as almost 
universal. The exceptions are found principally in mito- 
chondria, which are specially adapted to life within plant 
and animal cells, but two exceptions occur in free-living 
organisms as well (Table 9.9). The near universality of 
the genetic code presents two important evolutionary 
questions. First, why has the genetic code remained es- 
sentially unchanged in living organisms; and second, 
why have changes evolved mostly in mitochondria? 
The answer to the first question is that natural selec- 
tion pressure against codon change is intense. A single 
codon change would dramatically alter the composi- 
tion of almost every polypeptide an organism produces. 
Figure 9.19 Deciphering the genetic code with synthetic
mini mRNAs. For the synthetic mini mRNA GUC, a ¹⁴C-labeled
serine tRNA does not hybridize within the ribosome to form a
complex, and radioactivity is located in the pass-through solution.
¹⁴C-labeled valine tRNA does hybridize to the GUC mini mRNA
within the ribosome. The mRNA-ribosome-tRNA complex is
caught by the filter membrane, where radioactivity is detected.

Panel steps: (1) Mix components: specific mini synthetic mRNAs,
ribosomes, 19 unlabeled amino acids attached to tRNAs, and one
¹⁴C-labeled amino acid attached to tRNA. (2) Pass the mixture
through a filter membrane and test the filter and the solution for
radioactivity. Specific mRNAs are bound by the ribosomes, which
in turn are trapped by the filter; nonspecific tRNAs, not bound by
ribosomes, pass through the filter. GUC mRNA binds the amino
acid valine, so radioactivity is in the filter. GUC mRNA does not
bind the amino acid serine, so radioactivity is in the solution.


Countless evolutionary examples tell us that nearly all of 
the changes that occur would be deleterious, and many 
would be lethal. Simply stated, a change in the genetic 
code would alter the rules of the game of life, and natural 
selection prevents such changes. 

The answer to the second question is that natural 
selection appears to be less intensive on the mitochon- 
drial genetic code than on the genetic code for nuclear 
genes. The genomes of mitochondria found in plant and 


Table 9.9 Genomes Using Modifications of the Universal Genetic Code

Codon       Universal Code   Unusual Code   Genome
AGA, AGG    Arg              Stop           Mitochondria in plants, animals, and yeast
AUA, AUU    Ile              Met            Mitochondria in plants, animals, and yeast
UGA         Stop             Trp            Mitochondria in plants, animals, and yeast,
                                            and in Mycoplasma species
CUN         Leu              Thr            Mitochondria in yeast
UAA, UAG    Stop             Gln            Green algae, protozoa
UGA         Stop             Cys            Protozoa

Note: N = any third-position nucleotide.


animal cells are small compared to nuclear genomes, and 
any disruption caused by a change in the mitochondrial 
genetic code is likely to be limited, since the number of 
genes affected is so small. In addition, there are many 
mitochondria per cell, providing “backup copies” of the 
mitochondrial genome. If a change in the genetic code se- 
verely disrupts the function of one mitochondrion, others 
are present in the cell to carry out normal activities. 
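
One way to picture a modified genetic code is as the standard table with a few entries overridden. The sketch below does exactly that. The override set is an illustrative subset of the mitochondrial rows of Table 9.9, not a complete code for any particular genome, and the test mRNA is hypothetical.

```python
from itertools import product

# Standard genetic code, one-letter amino acids, '*' = stop.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Assumption: a subset of the mitochondrial entries listed in Table 9.9.
MITOCHONDRIAL_OVERRIDES = {"AGA": "*", "AGG": "*", "AUA": "M", "UGA": "W"}


def variant_code(overrides):
    """Return a copy of the standard code with selected codons reassigned."""
    table = dict(STANDARD_CODE)
    table.update(overrides)
    return table


def translate(mrna, table):
    """Translate from the first nucleotide until a stop codon or the end."""
    peptide = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = table[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        peptide.append(amino_acid)
    return "".join(peptide)


mrna = "AUGAUAUGAGCUAGAUUU"  # hypothetical message, chosen to hit the overrides
print(translate(mrna, STANDARD_CODE))
print(translate(mrna, variant_code(MITOCHONDRIAL_OVERRIDES)))
```

The same message yields different polypeptides under the two tables, which illustrates why a code change acting on a large nuclear genome would alter nearly every protein at once.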


Transfer RNAs and Genetic Code Specificity 


In our discussion of the genetic code and polypeptide 
assembly at the ribosome, we describe the specific base-pair 
interaction between the anticodon sequence of charged 
tRNA and the codon sequence of mRNA as the key to in- 
corporating the correct amino acid into the polypeptide. 
But how did biologists determine that the specificity of the 
genetic code resides in the tRNA—mRNA interaction and 
not in the recognition of the amino acid carried by tRNA? 

The answer came from a simple and clever experiment
by François Chapeville and several colleagues
in 1962. The researchers began by preparing normal
cysteine-charged tRNAs. This complex is designated
Cys-tRNA^Cys. The researchers then treated Cys-tRNA^Cys
with Raney nickel, a compound that removes the SH
group from cysteine and converts it to alanine. This treatment
produces Ala-tRNA^Cys, in which alanine rather than
cysteine is attached to tRNA^Cys. When Chapeville and
colleagues used Ala-tRNA^Cys in an in vitro translation
reaction, the polypeptide contained alanine rather than
cysteine in amino acid positions that would normally
carry cysteine. In other words, Ala-tRNA^Cys efficiently
paired with mRNA codons specifying cysteine and deposited
alanine in the nascent polypeptide, even though the
mRNA sequence specified cysteine.


GENETIC ANALYSIS

The following segment of DNA encodes a polypeptide containing six amino acids. DNA triplets
encoding the start codon (AUG) and a stop codon are included in the sequence.

5'-...CCCAGCCTAGCCTTTGCAAGAGGCCATATCGAC...-3'
3'-...GGGTCGGATCGGAAACGTTCTCCGGTATAGCTG...-5'

a. Identify the sequence and polarity of the mRNA encoded by this gene.
   BREAK IT DOWN: The DNA coding strand differs from mRNA by the presence of
   T in DNA in place of the U in RNA (p. 270).

b. Determine the amino acid sequence of the polypeptide, and identify the N- and
   C-terminal ends of the polypeptide.
   BREAK IT DOWN: The genetic code (see inside the front cover or Figure 9.13) is
   used for translation (p. 321).

c. A base-substitution mutation changes the first transcribed G of the template strand to an A.
   How does this alter the polypeptide?
   BREAK IT DOWN: A base substitution on the template DNA strand also requires that
   the nucleotide on the coding strand be changed to the complementary nucleotide (p. 321).


Solution Strategies and Solution Steps

Evaluate

1. Strategy: Identify the topic this problem addresses and the nature of the requested answer.
   Step: This problem concerns the identification of DNA coding and template
   strands, the protein encoded by DNA, and an evaluation of a mutation
   of the DNA sequence. The answer requires identification of the DNA
   strands, identification of start and stop codons, and determination of the
   amino acid sequence of wild-type and mutant proteins.

2. Strategy: Identify the critical information given in the problem.
   Step: A DNA sequence that includes a start (AUG) codon and a stop codon is given.

Deduce

3. Strategy: Identify the start codon by inspecting both DNA strands for 3'-TAC-5'
   that potentially encodes a start (AUG) codon on the template strand.
   TIP: The AUG start codon is the most common codon for translation initiation and
   is encoded by the DNA triplet 3'-TAC-5'.
   Step: Scanning both DNA strands in their 3' to 5' direction identifies a single
   3'-TAC-5' sequence. The sequence is on the upper strand of the
   sequence beginning with the seventh nucleotide from the right.

4. Strategy: Survey the putative template strand identified in the previous step
   and determine if DNA triplets 3'-ATC-5', 3'-ACT-5', and 3'-ATT-5' encoding
   possible stop codons occur as the seventh codon of an mRNA sequence.
   TIP: The stop codons UAG, UGA, and UAA are encoded by DNA triplets 3'-ATC-5',
   3'-ACT-5', and 3'-ATT-5'.
   Step: Since just one DNA triplet encoding a start codon is present, a scan of the
   strand at the correct distance from the start codon does find a
   3'-ATC-5' triplet sequence encoding a UAG stop codon:

   5'-CCCAGC CTA GCCTTTGCAAGAGGC CAT ATCGAC-3'

Solve

5. Strategy: Identify the mRNA sequence encoding the six amino acids of the polypeptide.
   TIP: The mRNA sequence can be determined from either the coding strand or the
   template strand of DNA. Substituting U for T on the coding strand produces the
   mRNA sequence. Alternatively, arranging RNA nucleotides complementary to the
   template strand and assigning antiparallel polarity produces mRNA.
   Step (Answer a): The mRNA sequence is

   5'-AUG GCC UCU UGC AAA GGC UAG-3'

6. Strategy: List the amino acid sequence of the polypeptide.
   Step (Answer b): The polypeptide sequence is

   N-Met-Ala-Ser-Cys-Lys-Gly-C

7. Strategy: Identify the effect of the G → A base substitution on the polypeptide.
   Step (Answer c): Substituting the first transcribed G → A alters the second codon of
   mRNA by changing GCC → GUC and substitutes valine (Val) for alanine
   (Ala) in the second position of the polypeptide sequence.

For more practice, see Problems 1, 28, 30, and 31.
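
The solution steps above can be retraced programmatically. The following Python sketch is one possible way to do it, not the method the worked example prescribes. The helper names (`transcribe`, `open_reading_frame`, `peptide`) are invented here, amino acids are printed in one-letter code, and "first transcribed G" is taken to mean the first template G after the start-codon triplet, as in the worked answer.

```python
from itertools import product

# Standard genetic code, one-letter amino acids, '*' = stop.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_5to3):
    """mRNA (5'->3') is complementary and antiparallel to the template strand."""
    return "".join(DNA_TO_RNA[b] for b in reversed(template_5to3))


def open_reading_frame(mrna):
    """Codons from the first AUG through the first in-frame stop, or [] if none."""
    start = mrna.find("AUG")
    if start < 0:
        return []
    codons = []
    for i in range(start, len(mrna) - 2, 3):
        codons.append(mrna[i:i + 3])
        if TABLE[codons[-1]] == "*":
            return codons
    return []


def peptide(codons):
    return "".join(TABLE[c] for c in codons if TABLE[c] != "*")


upper = "CCCAGCCTAGCCTTTGCAAGAGGCCATATCGAC"            # written 5'->3'
lower = "GGGTCGGATCGGAAACGTTCTCCGGTATAGCTG"[::-1]      # printed 3'->5', reversed to 5'->3'

# Try each strand as the template; only one yields a start codon and an in-frame stop.
orfs = {name: open_reading_frame(transcribe(strand))
        for name, strand in (("upper", upper), ("lower", lower))}
wild_type = orfs["upper"]
print(orfs)
print("wild type:", " ".join(wild_type), "->", peptide(wild_type))

# A template G -> A change appears in the mRNA as C -> U at the paired position.
message = "".join(wild_type)
site = message.index("C", 3)                            # first C after the AUG
mutant_message = message[:site] + "U" + message[site + 1:]
mutant = [mutant_message[i:i + 3] for i in range(0, len(mutant_message), 3)]
print("mutant:   ", " ".join(mutant), "->", peptide(mutant))
```

Only the upper strand produces an AUG followed by an in-frame stop at the seventh codon, which matches step 3 of the solution.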
Two important conclusions come from this experiment.
First, the genetic code derives its specificity through
the complementary base-pair interaction of tRNA and 
mRNA. The amino acid carried by charged tRNA does 
not play a role in determining which amino acids are 
incorporated into polypeptides. Rather, tRNA alone— 
acting through the base-pairing interaction of its antico- 
don with the codon of mRNA—gives specificity to the 
genetic code. Second, these findings show the importance 
of the fidelity with which aminoacyl-tRNA synthetases 
correctly recognize their cognate tRNAs and charge them 
with the proper amino acid. 
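
The logic of the Chapeville result can be modeled as a toy simulation in which the ribosome matches codon to anticodon and never inspects the amino acid a tRNA carries. This is a simplified sketch: wobble pairing is ignored, each amino acid has a single tRNA, and the test mRNA and function names are hypothetical.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def anticodon_for(codon):
    """Anticodon (written 5'->3') that pairs with a codon by strict base pairing."""
    return "".join(COMPLEMENT[b] for b in reversed(codon))


def translate(mrna, charged_trnas):
    """Select each tRNA by anticodon only; add whatever amino acid it carries."""
    residues = []
    for i in range(0, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        residues.append(charged_trnas[anticodon_for(codon)])
    return "-".join(residues)


# Pool of charged tRNAs: anticodon -> amino acid actually attached.
normal_pool = {
    anticodon_for("AUG"): "Met",
    anticodon_for("UGU"): "Cys",
    anticodon_for("GCU"): "Ala",
}

# Raney nickel treatment modeled as converting the cysteine on tRNA^Cys to alanine.
treated_pool = dict(normal_pool)
treated_pool[anticodon_for("UGU")] = "Ala"

mrna = "AUGUGUGCUUGU"  # hypothetical message: Met, Cys, Ala, Cys codons
print(translate(mrna, normal_pool))
print(translate(mrna, treated_pool))
```

With the treated pool, alanine appears at every cysteine codon, the same outcome the experiment reported, because codon recognition depends on the anticodon and not on the attached amino acid.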


9.6 Translation Is Followed by 
Polypeptide Folding, Processing, and 
Protein Sorting 


Translation produces polypeptides, but the production of 
functional proteins is not complete until the polypeptides 
are folded into their functional tertiary or quaternary 
structures. Recall from Section 9.1 that these steps involve 
the formation of ionic or covalent bonds, and they may 
also involve specific chemical modifications of amino acids 
in polypeptides. In addition, two other categories of post- 
translational events provide further modifications and sort 
the proteins for transport to their destinations. 


Posttranslational Polypeptide Processing 


The removal of one or more amino acids from a polypep- 
tide is a common form of posttranslational polypeptide 
processing. Earlier in the chapter, we identified AUG as the 
usual start codon and noted that it encodes the modified 
amino acid N-formylmethionine (fMet) in bacterial cells and 
methionine in eukaryotes. Yet fMet is never found in functional
bacterial proteins, and amino acids other than methionine
are frequently the first amino acid of polypeptides
in eukaryotes. The absence of fMet from functional bacterial 
proteins is the result of posttranslational cleavage of fMet 
from each bacterial polypeptide (Figure 9.20a). Similarly, 
methionine is usually removed as part of posttranslational 
processing in eukaryotes, and the new N-terminal amino 
acid is acetylated as part of the process. 

In addition to N-terminal amino acids, other amino 
acid residues can be chemically modified as well. One of 
the most common modifications of individual amino ac- 
ids is performed by enzymes known as kinases that carry 
out phosphorylation of proteins by adding a phosphate 
group to individual amino acids (Figure 9.20b). This is an 
important regulatory process that can switch a protein 
from an inactive to an active form, or vice versa. Other 
enzymes may add methyl groups, hydroxyl groups, or 
acetyl groups to individual amino acids of polypeptides. 
The addition of carbohydrate side chains to polypeptides 
to form a glycoprotein is another important kind of post- 
translational modification. For example, in one kind of 


Figure 9.20 Examples of posttranslational processing.
(a) Cleavage of N-terminal amino acids. (b) Chemical modification
of internal amino acids by a kinase. (c) Polypeptide cleavage:
the pre-amino acids are cleaved from preproinsulin to give
proinsulin, disulfide bonds form between the A and B chains, and
cleavage of the pro-amino acids leaves the A chain and B chain
joined by disulfide bonds.


posttranslational modification, the H substance is altered
by the protein products of the I^A and I^B alleles of the ABO
blood group gene (see Section 4.1).

Posttranslational processing may also include the 
cleavage of a polypeptide into multiple segments that 
each form functional proteins or that aggregate after 
elimination of one or more segments to form a functional 
protein. Production of the hormone insulin, which fa- 
cilitates transport of glucose into cells, includes two post- 
translational modification steps that remove segments 
of the original polypeptide (Figure 9.20c). The polypep- 
tide product translated from the insulin gene is called 
preproinsulin. It is an inactive protein that contains a
leader segment, called the pre-amino acid segment, at
the N-terminal end and a connecting segment, called the
pro-amino acid segment, that separates the A-chain segment
and the B-chain segment, the two functional pieces
of the polypeptide. During posttranslational processing
of preproinsulin, the pre-amino acids of the signal sequence
are removed, after the polypeptide is transported
across the ER membrane, to form proinsulin. Three
disulfide bonds form within and between the A-chain
and B-chain segments, followed by polypeptide cleavage
that removes the pro-amino acid segment. What results
is a functional insulin molecule consisting of 21 amino
acids in the A-chain segment and 30 amino acids in the
B-chain segment.


The Signal Hypothesis 


Like the passengers in a busy airline terminal, the pro- 
teins produced in a cell have different destinations, to 
which they travel with the aid of a “ticket” that tells the 
cell where to transport them. The destination is often 
an organelle or the cell membrane; in certain cases, the 
polypeptide is destined for transport out of the cell. The 


Figure 9.21, panel labels (ER to Golgi apparatus to plasma membrane):
Proteins enter the rough ER as they are synthesized by the ribosome.
Proteins are packaged in vesicles that are then transported to the
Golgi apparatus. Proteins enter secretory vesicles targeted for the
cell membrane (secreted protein) or for an intracellular location.
Protein is secreted from the cell.


ticket that communicates the destination of a polypeptide 
is a signal sequence of 15 to 20 or so amino acids at the 
N-terminal end. 

First articulated in the early 1970s by Günter Blobel,
the signal hypothesis proposes that the first 15 to 20 
amino acids of many polypeptides contain an “address 
label” in the form of a signal sequence that designates 
the protein’s destination in the cell. Blobel’s hypothesis 
proposed that the signal sequence directs proteins to the 
endoplasmic reticulum (ER), where they are sorted for 
their cellular destinations. 

Blobel’s signal hypothesis is now a widely accepted 
model for the identification of the cellular destina- 
tions of proteins. In fact, follow-up research has identi- 
fied the mechanism by which proteins are processed 
and packaged for export from a cell. While proteins 
destined to remain in a cell are typically translated at 
“free” ribosomes (ribosomes that float freely in the cy- 
toplasm), large numbers of ribosomes are attached to 
the rough endoplasmic reticulum (rough ER) where pro- 
teins destined for intercellular transport are translated. 
Figure 9.21 illustrates the translation of polypeptides 


Figure 9.21 Proteins enter the endoplasmic reticulum (ER).
Translated proteins enter the cisternal space of the ER through
receptors that cleave the signal sequences to begin the
protein-sorting process. Panel labels: the signal sequence (shown
in purple) is synthesized by the ribosome; the signal sequence
binds to the ER receptor; the polypeptide enters the cisternal
space of the rough ER following signal sequence cleavage.
into the cisternal space of the rough ER where the polypeptides
are processed and packaged for transport to
the Golgi apparatus. In the Golgi apparatus additional 


CASE STUDY


Antibiotics and Translation Interference 


We have all taken antibiotics at various times during our 
lives to counteract a painful or persistent microbial infec- 
tion. As a result of the efficiency of these compounds, we 
have experienced rapid relief of symptoms and elimination 
of the infection. These beneficial effects are accomplished 
by selective cell death or through blocking cell proliferation.
Specifically, antibiotics either kill microorganisms without
harming our own cells in the process or act to prevent
further microbial cell growth. What is the biochemical basis
of antibiotic action? How do antibiotic compounds specifi- 
cally target microbial cells for destruction? 


PROTEIN SYNTHESIS INHIBITION BY ANTIBIOTIC COMPOUNDS
You will probably not be surprised to learn that
different antibiotics inhibit different aspects of microbial
biology. But you may be surprised to learn that
many different antibiotics target microbial translation 
as their mode of action (Table 9.10). Familiar antibiotics 
such as tetracycline, streptomycin, and chloramphenicol 
target different stages of microbial translation, as do less 


Table 9.10 Antibiotic Inhibitors of Protein Synthesis

Antibiotic         Inhibitory Action
Chloramphenicol    Blocks polypeptide formation by inhibiting peptidyl
                   transferase in the 70S ribosome (antibacterial action)
Erythromycin       Blocks translation by binding to 50S subunit and
                   inhibiting polypeptide release (antibacterial action)
Streptomycin       Inhibits translation initiation and causes misreading
                   of mRNA by binding to the 30S subunit (antibacterial action)
Tetracycline       Binds to the 30S subunit and inhibits binding of
                   charged tRNAs (antibacterial action)
Cycloheximide      Blocks polypeptide formation by inhibiting peptidyl
                   transferase activity in the 80S ribosome (antieukaryote action)
Puromycin          Causes premature termination of translation by acting
                   as an analog of charged tRNA (antibacterial and
                   antieukaryote action)


protein processing takes place and the proteins are 
packaged into vesicles for transport to the intercellular 
destinations. 


familiar antibiotics such as erythromycin, puromycin, and 
cycloheximide. Each antibiotic contains a different active 
compound that takes advantage of unique features of 
bacterial translation to disrupt the production of bacterial 
proteins while not interfering with the translation of pro- 
teins in our cells. 


TRANSLATION DISRUPTION BY AMINOGLYCOSIDES 
Streptomycin is one of several antibiotics in a class of bio- 
chemical compounds called aminoglycosides. Streptomycin 
inhibits bacterial translation by interfering with binding of 
N-formylmethionine tRNA to the ribosome, thus prevent- 
ing the initiation of translation. Streptomycin can also cause 
misreading of mRNA during translation by generating 
mispairing between codons and anticodons. For example, 
the codon UUU normally specifies phenylalanine, but streptomycin
induces pairing between a UUU codon and the tRNA
carrying isoleucine, whose codon is AUU. This error leads to
amino acid changes in proteins and potentially to defective
protein activity. Other aminoglycosides, such as neomycin,
kanamycin, and gentamicin, also cause mispairing between
codons and anticodons and can generate defective proteins. 
Erythromycin also impairs bacterial translation, but it does 
so in a very different way. It binds to the 50S (large) subunit 
in the tunnel from which the newly synthesized polypep- 
tide emerges. In this manner, erythromycin blocks the pas- 
sage of the polypeptide out of the ribosome. This causes the 
ribosome to stall on mRNA, bringing translation to a halt. 
Table 9.10 provides details about these and other actions of 
antibacterial agents. 


TRANSLATION BLOCKAGE BY ANTIFUNGAL COM- 
POUNDS Single-celled eukaryotic microorganisms, such as 
fungi, can also cause human infections. To fight these infec- 
tions, antibiotics such as puromycin and cycloheximide that 
target translational activities of eukaryotic cells are used. 
Puromycin has a three-dimensional structure similar to that 
of the 3’ end of a charged tRNA. It stops translation of bacte- 
rial and eukaryotic mRNAs by binding at the ribosomal A site 
and acting as an analog of charged tRNA. When puromycin 
is bound at the A site, its amino group forms a peptide bond 
with the carboxyl group of the P-site amino acid. However, 
puromycin does not contain a carboxyl group. This differ- 
ence prevents formation of any additional peptide bonds 
and puts an end to translation. Cycloheximide exclusively 
blocks eukaryotic translation by binding to the 60S subunit 
and inhibiting peptidyl transferase activity, much like chlor- 
amphenicol does to bacterial peptidyl transferase. 


SUMMARY 


9.1 Polypeptides Are Composed of Amino Acid 
Chains That Are Assembled at Ribosomes 


• Polypeptides contain 20 kinds of amino acids that carry side
  chains, giving them specific properties.

• Translation takes place at the ribosome, where mRNA
  codons are coupled to transfer RNA anticodons by
  complementary base pairing.

• Polypeptides have four structural levels: the amino acid
  order (primary), intrachain folding (secondary),
  three-dimensional functional folding (tertiary), and multimeric
  protein structure (quaternary).

• Polypeptides have an N-terminal (amino) end and a
  C-terminal (carboxyl) end.

• Ribosomes are composed of two subunits that each consist
  of ribosomal RNA and numerous proteins.

• Ribosomes have three functional sites of action: the P site,
  where the polypeptide is held; the A site, where tRNA
  molecules bind to add their amino acid to the end of the
  polypeptide; and the E site, which provides an exit point for
  uncharged tRNAs.


9.2 Translation Occurs in Three Phases 


• Bacterial translation is initiated with the binding of the
  Shine-Dalgarno sequence on the 5' mRNA end to a complementary
  sequence of nucleotides on the 3' end of the 16S
  rRNA in the small ribosomal subunit. The nearby start
  codon is the site where translation commences.

• In eukaryotic mRNA, the 5' cap is the binding site for
  eukaryotic initiation factors that cause the small ribosomal
  subunit to begin scanning in search of the start codon, which
  is part of the Kozak sequence.

• Archaea carry multiple translation-initiation factors that are
  homologous to eukaryotic initiation factors, but they also
  produce a high proportion of leaderless mRNAs that have an
  unknown translation-initiation mechanism.

• During polypeptide synthesis, charged tRNAs enter the A
  site, and peptidyl transferase catalyzes peptide bond formation,
  transferring the polypeptide from the P-site tRNA to the
  A-site tRNA. Elongation factor proteins translocate the ribosome,
  shifting the tRNA-polypeptide complex from the A site
  to the P site and opening the A site for the next charged tRNA.

• Translation terminates when a stop codon enters the A site.
  Release factor proteins, rather than tRNA, bind to stop codons.
  Release factors cause release of the polypeptide and
  lead to the dissociation of the ribosome from mRNA.


9.3 Translation Is Fast and Efficient 


• An mRNA undergoes simultaneous translation by several
  ribosomes that attach to it sequentially to form a
  polyribosome.

• Usually, a ribosome will dissociate from mRNA upon
  encountering a stop codon, but the small size of some
  intercistronic spacers in bacterial polycistronic mRNAs permits a
  ribosome to translate two or more polypeptides sequentially
  from the mRNA before dissociating.

• The evolutionary evidence derived from homologies among
  translationally active proteins of members of the three
  domains of life suggests that archaea are more closely related to
  eukaryotes than they are to bacteria.


9.4 The Genetic Code Translates Messenger 
RNA into Polypeptide 


• The genetic code is redundant, meaning that most amino
  acids are specified by more than one codon. Redundancy of
  the genetic code is made possible by third-base wobble that
  relaxes the strict complementary base-pairing requirements
  at the third base of the codon.

• Specialized enzymes called aminoacyl-tRNA synthetases
  catalyze the addition of a specific amino acid to
  each tRNA.


9.5 Experiments Deciphered the Genetic Code 


• In vitro experimental analysis demonstrates that the genetic
  code is triplet and does not contain gaps or overlaps.

• Each mRNA codon is composed of three consecutive nucleotides.
  Of the 64 codons contained in the genetic code, 61
  specify amino acids and 3 are stop codons.

• The genetic code was deciphered by analysis of in vitro
  translation of synthetic messenger RNA.

• The genetic code is essentially universal among living
  organisms. The few exceptions to the genetic code are found
  mainly in mitochondria.

• Properly charged tRNAs play the central role in converting
  mRNA sequence into polypeptide sequence.


9.6 Translation Is Followed by Polypeptide 
Folding, Processing, and Protein Sorting 


• Formation of functional proteins occurs after translation is
  completed and may be aided by ribosome-associated proteins
  or by separate protein complexes.

• Proteins in eukaryotic cells are sorted to their cellular
  destinations by signal sequences at their N-terminal ends. Signal
  sequences are removed from polypeptides in the ER, and
  polypeptides destined for different sites in the cell are
  differentially glycosylated before being packaged for transport to
  the Golgi apparatus.

• In the Golgi apparatus, polypeptides are packaged
  into transport vesicles for shipment to their cellular
  destinations.
PROBLEMS

Visit MasteringGenetics™ for instructor-assigned tutorials and problems.
For answers to selected even-numbered problems, see Appendix: Answers.

Chapter Concepts

1. Some proteins are composed of two or more polypeptides.
Suppose the DNA template strand sequence
3'-TACGTAGGCTAACGGAGTAAGCTAACT-5' produces a
polypeptide that joins in pairs to form a functional protein.
a. What is the amino acid sequence of the polypeptide
produced from this sequence?
b. What term is used to identify a functional protein like
this one formed when two identical polypeptides join
together?

2. In the experiments that deciphered the genetic code, many
different synthetic mRNA sequences were tested.
a. Describe how the codon for phenylalanine was identified.
b. What was the result of studies of synthetic mRNAs
composed exclusively of cytosine?
c. What result was obtained for synthetic mRNAs
containing AG repeats, that is, AGAGAGAG...?
d. Predict the results of experiments examining GCUA
repeats.

3. Several lines of experimental evidence pointed to a triplet
genetic code. Identify three pieces of information that
supported the triplet hypothesis of genetic code structure.

4. Outline the events that occur during initiation of
translation in E. coli.

5. A portion of a DNA template strand has the base sequence

5'-...ACGCGATGCGTGATGTATAGAGCT...-3'

a. Identify the sequence and polarity of the mRNA
transcribed from this fragmentary template strand
sequence.
b. Determine the amino acid sequence encoded by this
fragment. Identify the N- and C-terminal directions of
the polypeptide.
c. Which is the third amino acid added to the polypeptide
chain?

6. Describe three features of tRNA molecules that lead to
their correct charging by tRNA synthetase enzymes.

7. Identify the amino acid carried by tRNAs with the
following anticodon sequences.
a. 5'-UAG-3'
b. 5'-AAA-3'
c. 5'-CUC-3'
d. 5'-AUG-3'
e. 5'-GAU-3'

8. For each of the anticodon sequences given in the previous
problem, identify the other codon sequence to which it
could potentially pair using third-base wobble.

9. What is the role of codons UAA, UGA, and UAG in
translation? What events occur when one of these codons appears
at the A site of the ribosome?

10. Compare and contrast the composition and structure of
bacterial and eukaryotic ribosomes, identifying at least
three features that are the same and three features that are
unique to each type of ribosome.


11. Consider translation of the following mRNA sequence: 
5'-.. AUGCAGAUCCAUGCCUAUUGA...- 3’ 


a. Diagram translation at the moment the fourth amino 
acid is added to the polypeptide chain. Show the ribo- 
some; label its A, P, and E sites; show its direction of 
movement; and indicate the position and anticodon 
triplet sequence of tRNAs that are currently interacting 
with mRNA codons. 

b. What is the anticodon triplet sequence of the next 
tRNA to interact with mRNA? 

c. What events occur to permit the next tRNA to interact 
with mRNA? 


12. The diagram of a eukaryotic ribosome shown below con- 
tains several errors. 


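[Diagram not reproduced: a ribosome on an mRNA, with an arrow
labeled "Ribosome movement along mRNA" and the 3' end of the
mRNA marked.]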


a. Examine the diagram carefully, and identify each error. 
b. Redraw the diagram, and correct each error using the 
mRNA sequence shown. 


13. Third-base wobble allows some tRNAs to recognize 
more than one mRNA codon. Based on this chapter’s 
discussion of wobble, what is the minimal number of 
tRNA molecules necessary to recognize the following 
amino acids? 

a. leucine 

b. arginine 
c. isoleucine 
d. lysine 


14. The genetic code contains 61 codons to specify the 20 
common amino acids. Many organisms carry fewer than 
61 different tRNA genes in their genomes. These genomes 
take advantage of isoaccepting tRNAs and the rules gov- 
erning third-base wobble to encode fewer than 61 tRNA 
genes. Use these rules to calculate the minimal number 
of tRNA genes required to specify all 20 of the common 
amino acids. 


15. The three major forms of RNA (mRNA, tRNA, and rRNA) 

interact during translation. 

a. Describe the role each form of RNA performs during 
translation. 

b. Which of the three types of RNA might you expect to 
be the least stable? Why? 

c. Which form of RNA is least stable in eukaryotes? Why 
is this form least stable? 
d. Compared to the average stability of mRNA in E. coli,
is mRNA in a typical human cell more stable or less 
stable? Why? 


16. The figure below contains sufficient information to fill in 
every row. Use the information provided to complete the 
figure. 


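[Figure not reproduced: a partially completed table with rows for
the DNA coding strand (5' to 3'), the DNA template strand, the
mRNA codons (5' to 3'), the tRNA anticodons, and the amino acids
(3-letter and 1-letter abbreviations).]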


17. The line below represents a mature eukaryotic mRNA. 
The accompanying list contains many sequences or 
structures that are part of eukaryotic mRNA. A few of 
the items in the list, however, are not found in eukaryotic 
mRNA. As accurately as you can, show the location, on 
the line, of the sequences or structures that belong on 
eukaryotic mRNA; then, separately, list the items that are 
not part of eukaryotic mRNA. 


5' ________________________________________ 3'


stop codon
poly-A tail
intron
3' UTR
promoter
start codon
AAUAAA
5' UTR
5' cap
termination sequence


18. After completing Problem 17, carefully draw a line 
below the mRNA to represent its polypeptide prod- 
uct in accurate alignment with the mRNA. Label the 
N-terminal and C-terminal ends of the polypeptide. 
Carefully draw two lines above and parallel to the 
mRNA, and label them “coding strand” and “template 
strand.” Locate the DNA promoter sequence. Identify 
the locations of the +1 nucleotide and of a transcription 
termination sequence. 


19. Define and describe the differences in the primary, second- 
ary and tertiary structures of a protein. 


20. Describe the roles and relationships between 
a. tRNA synthetases and tRNA molecules. 
b. tRNA anticodon sequences and mRNA codon 
sequences. 
21. In an experiment to decipher the genetic code, a poly-AC
mRNA (ACACACAC...) is synthesized. What pattern of amino
acids would appear if this sequence were to be
translated by a mechanism that reads the genetic
code as
a. a doublet without overlaps?
b. a doublet with overlaps?
c. a triplet without overlaps?
d. a triplet with overlaps?
e. a quadruplet without overlaps?
f. a quadruplet with overlaps?


22. Identify and describe the steps that lead to the secretion of
proteins from eukaryotic cells.


23. The amino acid sequence of a portion of a polypeptide is

N...Cys-Pro-Ala-Met-Gly-His-Lys...C

a. What is the mRNA sequence encoding this polypeptide
fragment? Use N to represent any nucleotide, Pu to
represent a purine, and Py to represent a pyrimidine. Label
the 5' and 3' ends of the mRNA.
b. Give the DNA template and coding strand sequences
corresponding to the mRNA. Use the N, Pu, and Py
symbols as placeholders.


24. Har Gobind Khorana and his colleagues performed
numerous experiments translating synthetic mRNAs.
In one experiment, an mRNA molecule with a repeating
UG dinucleotide sequence was assembled and
translated.
a. Write the sequence of this mRNA and give its polarity.
b. What is the sequence of the resulting polypeptide?
c. How did the polypeptide composition help confirm the
triplet nature of the genetic code?
d. If the genetic code were a doublet code instead of a
triplet code, how would the result of this experiment be
different?
e. If the genetic code were overlapping rather than
nonoverlapping, how would the result of this experiment be
different?


25. An experiment by Khorana and his colleagues translated
a synthetic mRNA containing repeats of the trinucleotide
UUG.
a. How many reading frames are possible in this mRNA?
b. What is the result obtained from each reading frame?
c. How does the result of this experiment help confirm the
triplet nature of the genetic code?


26. The human β-globin polypeptide contains 146 amino
acids. How many mRNA nucleotides are required to encode
this polypeptide?


27. The mature mRNA transcribed from the human β-globin
gene is considerably longer than the sequence needed to
encode the 146-amino-acid polypeptide. Give the names of
three sequences located on the mature β-globin mRNA but
not translated.


28. Figure 9.7 contains several examples of the
Shine-Dalgarno sequence. Using the seven Shine-Dalgarno
sequences from E. coli, determine the consensus sequence
and identify its location relative to the start codon.


29. Figure 9.20 shows three posttranslational steps required
to produce the sugar-regulating hormone insulin from the
starting polypeptide product preproinsulin.
a. A research scientist is interested in producing human
insulin in the bacterial species E. coli. Will the genetic
code allow the production of human proteins from
bacterial cells? Explain why or why not.
b. Explain why it is not feasible to insert the entire human
insulin gene into E. coli and anticipate the production of
insulin.
c. Recombinant human insulin (made by inserting human
DNA encoding insulin into E. coli) is one of the most
widely used recombinant pharmaceutical products in
the world. What segments of the human insulin gene
are used to create recombinant bacteria that produce
human insulin?


30. A DNA sequence encoding a five-amino-acid polypeptide
is given below.

...ACGGCAAGATCCCACCCTAATCAGACCGTACCATTCACCTCCT...
...TGCCGTTCTAGGGTGGGATTAGTCTGGCATGGTAAGTGGAGGA...

a. Locate the sequence encoding the five amino acids of
the polypeptide, and identify the template and coding
strands of DNA.
b. Give the sequence and polarity of the mRNA encoding
the polypeptide.
c. Give the polypeptide sequence, and identify the
N-terminus and C-terminus.
d. Assuming the sequence above is a bacterial gene,
identify the region encoding the Shine-Dalgarno
sequence.
e. What is the function of the Shine-Dalgarno
sequence?


31. A portion of the coding strand of DNA for a gene has the
sequence

5'-...GGAGAGAATGAATCT...-3'

a. Write out the template DNA strand sequence and
polarity as well as the mRNA sequence and polarity for
this gene segment.
b. Assuming the mRNA is in the correct reading frame,
write the amino acid sequence of the polypeptide using
three-letter abbreviations and, separately, the amino
acid sequence using one-letter abbreviations.


32. A eukaryotic mRNA has the following sequence. The 5'
cap is indicated in italics (CAP), and the 3' poly(A) tail is
indicated by italicized adenines.

5'-CAPCCAAGCGUUACAUGUAUGGAGAGAAUGAAACUG
AGGCUUGCCACGUUUGUUAAGCACCUAUGCUACCGAAAAAAA
AAAAAAAAAAAAAAAAA-3'

a. Locate the start codon and stop codon in this sequence.
b. Determine the amino acid sequence of the polypeptide
produced from this mRNA. Write the sequence
using the three-letter and one-letter abbreviations for
amino acids.


33. Diagram a eukaryotic gene containing three exons and two
introns, the pre-mRNA and mature mRNA transcript of the
gene, and a partial polypeptide that contains the following
sequences and features. Carefully align the nucleic acids, and
locate each sequence or feature on the appropriate molecule.
a. the AG and GU dinucleotides corresponding to
intron-exon junctions
b. the +1 nucleotide
c. the 5' UTR and the 3' UTR
d. the start codon sequence
e. a stop codon sequence
f. a codon sequence for the amino acids Gly-His-Arg at
the end of exon 1 and a codon sequence for the amino
acids Leu-Trp-Ala at the beginning of exon 2


34. The following table contains DNA-sequence information
compiled by Marilyn Kozak (1987). The data consist of
the percentage of A, C, G, and T at each position among
the 12 nucleotides preceding the start codon in 699
genes from various vertebrate species, and as the first
nucleotide after the start codon. The start codon
occupies positions +1 to +3, and the +4 nucleotide occurs
immediately after the start codon. Use the data to
determine the consensus sequence for the 13 nucleotides
(-12 to -1 and +4) surrounding the start codon in
vertebrate genes.

Position   -12 -11 -10  -9  -8  -7  -6  -5  -4  -3  -2  -1  [start]  +4
Percent A   23  26  25  23  19  23  17  18  25  61  27  15   [AUG]   23
Percent C   35  35  35  26  39  37  19  39  53   2  49  55   [AUG]   16
Percent G   23  21  22  33  23  20  44  23  15  36  13  21   [AUG]   46
Percent T   19  18  18  18  19  20  20  20   7   1  11   9   [AUG]   15


35. The following table lists α-globin and β-globin gene
sequences for the 12 nucleotides preceding the start codon
and the first nucleotide following the start codon. The
data are for 16 vertebrate globin genes reported by Kozak
(1987). The sequences are written from -12 to +4 with the
start codon sequence in capital letters.

α-Globin Family

Gene                 Sequence (-12 ... start ... +4)
Human adult          agagaacccaccATGg
Human embryonic      caccctgccgccATGt
Baboon               ccagcgcgggcATGg
Mouse adult          caggaagaaaccATGg
Rabbit adult         gaaggaaccaccATGg
Goat embryonic       tccagctgccaccATGt
Duck adult           ggagctgcaaccATGg
Chicken embryonic    ctctcctgcacaATGg

β-Globin Family

Gene                 Sequence (-12 ... start ... +4)
Human fetal          agtccagacgccATGg
Human embryonic      aggcctggcatcATGg
Rabbit adult         aaacagacagaATGg
Rabbit embryonic     agaccagacatcATGg
Chicken adult        ccaaccgccgccATGg
Chicken embryonic    ccccgcctgccaccATGg
Xenopus adult        tccaactttggccATGg
Xenopus larval       tctacagcccaccATGg

Use the data in this table to
a. Determine the consensus sequence for the 16 selected
α-globin and β-globin genes.
b. Compare the consensus sequence for these globin genes
to the consensus sequence derived from the larger study
of 699 vertebrate genes in Problem 34.


36. The six nucleotides preceding the start codon and the
first nucleotide after the start codon in eukaryotes exhibit
strong sequence preference as determined by the
percentages of nucleotides in the -6 to -1 positions and the +4
position. Use the data given in the table for Problem 35
to determine the seven nucleotides that most commonly
surround the start in vertebrates.


37. In terms of the polycistronic composition of mRNAs and
the presence or absence of Shine-Dalgarno sequences,
compare and contrast bacterial, archaeal, and eukaryotic
mRNAs.


38. Organisms of all three domains of life usually use the
mRNA codon AUG as the start codon.
a. Do organisms of the three domains use the same amino
acid as the initial amino acid in translation? Identify
similarities and differences.
b. Despite AUG being the most common start codon
sequence, very few proteins have methionine as the first
amino acid. Why is this the case?


The Integration of Genetic 
Approaches: Understanding 
Sickle Cell Disease 


CHAPTER OUTLINE 


10.1 An Inherited Hemoglobin 
Variant Causes Sickle Cell 
Disease 

10.2 Genetic Variation Can Be 
Detected by Examining DNA, 
RNA, and Proteins 

10.3 Sickle Cell Disease Evolved by 
Natural Selection in Human 
Populations 


Normal red blood cells barely squeeze through narrow capillaries, but 
sickle-shaped red blood cells can block blood flow in capillaries. 


In previous chapters, we described gene transmission and
function, the structure and function of DNA, the processes
of gene expression, and the role of evolution in genetics.
Each of these aspects of modern genetics contributes to the
broad explanatory power of the science, a power achieved
specifically through the integration of these principles and
approaches. This chapter is designed to bring the integration
of these genetic analysis approaches into focus using
the human hereditary disorder sickle cell disease as an
example.
The chapter has a second purpose as well. In the course
of illustrating how analyses of hereditary transmission,
molecular genetic variation, and evolution contribute to a
comprehensive understanding of sickle cell disease,
it also describes gel electrophoresis and related
experimental methods that are commonly applied
to the analysis of DNA, RNA, and protein variation.
These methods are part of the basic “toolkit” of genetic
analysis and can be used to obtain substantial
information about nucleic acid and protein variation.


10.1 An Inherited Hemoglobin 
Variant Causes Sickle Cell Disease 


Sickle cell disease (SCD), also known as sickle cell anemia, 
has been intensively investigated for more than a century, 
and its study has generated a revolution in genetics. Not 
only was SCD among the first genetic disorders shown to 
be caused by an inherited defect in a protein molecule, but 
the discovery of its cause—several years before DNA was 
identified as the hereditary molecule—helped pave the way 
for the molecular era in genetics. In fact, sickle cell disease 
has the distinction of being the first hereditary disorder to 
be designated as a “molecular disease.” It demonstrates that 
inherited diseases have a molecular basis, and it played a 
key role in establishing the molecular nature of mutations. 
Investigation of SCD and the description of the molecular 
basis of the disease led ultimately to an explanation of the 
role natural selection plays in the evolution and mainte- 
nance of the disease-causing allele in populations. 

SCD is a potentially fatal autosomal recessive dis- 
order caused by an abnormality in the structure and 
function of hemoglobin (Hb), the main oxygen-carrying 
protein in red blood cells. The hemoglobin defect produc- 
ing SCD shortens the life span of red blood cells from an 
average of about 120 days for normal red blood cells to an 
average of 10 to 20 days for red blood cells in individuals 
with SCD. As a result of the greatly shortened life span 
of red blood cells, individuals with SCD have severe
anemia (an abnormally low number of red blood cells) 
that reduces the ability of blood to deliver oxygen to tis- 
sues. Oxygen deprivation causes tissue damage and tissue 
death throughout the body, accompanied by significant 
muscle pain and accumulated damage to organs. 

The hemoglobin variant causing SCD is one of 
hundreds of different variant hemoglobin alleles occur- 
ring in people around the world, and inherited vari- 
ations in hemoglobin are the most common type of 
hereditary abnormality found in humans. Hundreds of 
millions of people carry mutant alleles that alter the 
structure or function of hemoglobin molecules. Most of 
these alleles are rare. But a few, such as the mutant allele 
causing SCD, are common in certain populations. The 
SCD allele is common in multiple populations around the 


Mediterranean region, in the Middle East, and in Africa, 
and the mutant allele has formed and evolved indepen- 
dently in each of these regions. 


The First Patient with Sickle Cell Disease 


Several principles of molecular genetics have their origin 
in the study of hemoglobin and the genes that produce it, 
including the concept of a molecular disease, a designation
bestowed on SCD by Linus Pauling in 1949. A good
place to begin our discussion, however, is with an event 
that occurred more than a century ago—December 1904, 
to be precise—when Walter Noel, a 20-year-old man of 
African origin, was admitted to Presbyterian Hospital in
Chicago suffering from severe anemia and debilitating
muscle pain. Noel had arrived in New York City a
year or so earlier from the Caribbean island of Grenada, 
and he had just begun the first year of a dentistry training 
program when he was admitted to the hospital. 

The physician in charge of Noel’s case was an intern 
named Ernest Irons, who was supervised by a more ex- 
perienced physician named James Herrick. Irons drew 
blood from Noel, examined it under a microscope, and 
was shocked to see that many of Noel’s red blood cells 
had a peculiar elongated and sickled shape that contrasted 
starkly with the circular, biconcave shape of normal red 
blood cells (Figure 10.1). 

With intensive treatment of his symptoms, Noel re- 
covered from this initial bout with the illness. Over the 


Figure 10.1 Red blood cell shape. Normal red blood cells 
have a biconcave shape (top), whereas sickle-shaped red blood 
cells are elongated (left). Other partially deformed red blood 
cells are also seen in this image. 
next two and a half years, he was to be readmitted several
times and treated for the same symptoms. After complet- 
ing his dentistry training, he returned to Grenada, where 
he practiced dentistry until he died 9 years later at the age 
of 32. In 1910, Herrick published a paper describing Walter 
Noel’s case. The paper was the first clinical description of 
SCD, although the disorder had no name at the time 
Herrick described it. Its original name, “sickle cell anemia,” 
was created several years later by combining sickle, for the 
characteristic deformity of the red blood cells, and anemia, 
for the chronic shortage of red blood cells in most patients. 

During periodic events known as “sickle crises,” sickle 
cell disease patients experience severe muscle pain. The 
pain is due to oxygen deprivation in organs and tissues 
that is brought about by the presence of large numbers of 
sickle-shaped red blood cells in their circulation. As seen 
in Figure 10.1, sickle-shaped red blood cells are longer 
than the normal, biconcave red blood cells, and they are 
large enough to impede blood flow in small blood ves- 
sels and capillaries. These blood vessels and capillaries 
are barely wide enough for normal, biconcave red blood 
cells to move through in single file (see the chapter opener 
photo). The reduced blood flow deprives the surrounding 
tissues of oxygen, causing immediate pain as well as po- 
tential long-term damage to organs and tissues. 

Red blood cells are oxygen transportation and deliv- 
ery specialists. They are pumped from the heart to the 
lungs, where they pick up oxygen, and then through the 
circulatory system to carry oxygen and other molecules 
throughout the body. Red blood cells do not contain 
nuclei and cannot divide; thus they are essentially sacks 
of proteins that tumble through the circulatory system
to pick up and deliver their molecular cargo. They cir- 
culate until they are damaged and removed from cir- 
culation—about 100 to 120 days on average for normal 
red blood cells. Red blood cells that undergo sickling 
are damaged more quickly than normal and have a shorter life
span. Unfortunately, the body’s red blood cell production
capacity is limited. The accelerated rate of loss of red
blood cells in SCD results in chronic anemia as one of the 
symptoms of the disorder. 


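Figure 10.2 Globin proteins and their genes. The α-globin and
β-globin genes each contain three exons and two introns. The
amino acids encoded by each exon are indicated by the numbers
describing their places in the final polypeptide chain.


Hemoglobin Structure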


Hemoglobin molecules are tetramers, protein structures
consisting of four proteins joined together. They are an
example of a protein with a quaternary structure (see
Table 9.2, p. 308). The hemoglobin tetramer contains two
protein chains from each of two different globin genes
that are encoded on separate chromosomes in the human
genome. Each molecule of the most common form
of hemoglobin consists of two α-globin (pronounced
AL-fa GLOBE-in) proteins, produced by the α-globin
gene, and two β-globin (BAY-ta GLOBE-in) proteins,
produced by the β-globin gene. This particular composition,
denoted α2β2, is identified as hemoglobin A, or HbA,
where Hb is an abbreviation for hemoglobin and A
designates the most common form. Each of the four globin
proteins in hemoglobin has a specific tertiary structure,
and each carries one iron-containing molecule of heme
that undergoes reversible binding with a molecule of
oxygen. Thus, each globin tetramer can bind and transport
four oxygen molecules.

The α-globin and β-globin genes are members of a
family of closely related globin genes that evolved from
a common ancestral gene. Due to their common origin,
α-globin and β-globin genes have similar composition,
and their protein products have strong structural and
functional similarities. The organization of the two genes
is also very similar (Figure 10.2). Both genes contain three
exons and two introns. The α-globin gene encodes a
polypeptide containing 141 amino acids, and the polypeptide
encoded by the β-globin gene contains 146 amino acids.


Globin Gene Mutations 


The globin genes may be the most intensively studied
genes in the human genome, and the existence and
distribution of α-globin and β-globin gene variants are
well documented in most human populations. At present,
nearly 500 different allelic variants of the α-globin
and β-globin genes are known. Nearly all of these globin
gene variants are rare. Some are so rare that they exist
only in a single family. There are a few notable exceptions,
however, and these more common variants provide
well-researched examples of some hereditary and
evolutionary processes that you are likely to have studied
in previous biology courses. They also give us the chance
to explore how globin gene variants affect hemoglobin
structure and function.

[Figure 10.2 diagram. Each gene is drawn 5' to 3' as promoter,
exon 1, intron, exon 2, intron, exon 3, with the amino acids
encoded by each exon listed below it.
α-globin gene: exon 1, amino acids 1-31; exon 2, amino acids
32-99; exon 3, amino acids 100-141.
β-globin gene: exon 1, amino acids 1-30; exon 2, amino acids
31-104; exon 3, amino acids 105-146.]

In a century of research since Herrick’s description of
Walter Noel’s SCD, physicians and human biologists have
fully explored the heredity, molecular basis, and evolution
of the disorder. Today biologists know that SCD is a
common autosomal recessive hereditary anemia caused
by a single base-pair substitution in the β-globin gene
sequence (Figure 10.3). This type of mutation is known
as a point mutation. The mutant allele, designated β^S,
produces a β-globin protein that contains the amino acid
valine (Val) in the sixth position of the 146 amino acids
of the protein. In comparison, the wild-type β^A allele
encodes glutamic acid (Glu) at the sixth amino acid position.
Individuals with SCD carry two β^S alleles and do not have
the β^A allele; the hemoglobin they produce is identified as HbS.
Such individuals have the genotype β^Sβ^S and produce
only mutant β-globin chains.
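
The effect of the point mutation can be traced from DNA to protein with a short sketch. It takes the first eight codons of the coding strand as shown in Figure 10.3, with codon 6 assumed to be GAG in the wild-type allele and GTG in the mutant allele (consistent with the Glu-to-Val change described above). The variable and function names are illustrative only.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code in UCAG order; "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
# Three-letter names for the amino acids that occur in this short segment.
THREE_LETTER = {"V": "Val", "H": "His", "L": "Leu", "T": "Thr",
                "P": "Pro", "E": "Glu", "K": "Lys"}

# Coding-strand DNA (5'->3') for the first eight codons of each allele;
# codon 6 (GAG versus GTG) is the assumed site of the substitution.
BETA_A = "GTGCACCTGACTCCTGAGGAGAAG"
BETA_S = "GTGCACCTGACTCCTGTGGAGAAG"


def transcribe(coding_dna):
    """mRNA has the coding-strand sequence with U in place of T."""
    return coding_dna.replace("T", "U")


def translate(mrna):
    """Translate consecutive codons into three-letter amino acid names."""
    return [THREE_LETTER[CODON_TABLE[mrna[i:i + 3]]]
            for i in range(0, len(mrna), 3)]


def differences(a, b):
    """List (1-based position, item in a, item in b) wherever a and b differ."""
    return [(i + 1, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


print(differences(BETA_A, BETA_S))
# [(17, 'A', 'T')]
print(differences(translate(transcribe(BETA_A)), translate(transcribe(BETA_S))))
# [(6, 'Glu', 'Val')]
```

A single A-to-T change at the second position of the sixth triplet is the only difference between the two coding sequences, and it changes only the sixth amino acid.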


When two mutant β-globin proteins join two
normal α-globin proteins, the hemoglobin molecules

(a) β^A allele

DNA
Coding        5' //GTG CAC CTG ACT CCT GAG GAG AAG// 3'
Template      3' //CAC GTG GAC TGA GGA CTC CTC TTC// 5'
DNA triplet:       1   2   3   4   5   6   7   8

mRNA          5' //GUG CAC CUG ACU CCU GAG GAG AAG// 3'
Codon:             1   2   3   4   5   6   7   8

Protein            Val His Leu Thr Pro Glu Glu Lys
Amino acid:        1   2   3   4   5   6   7   8


(b) β^S allele

DNA
Coding        5' //GTG CAC CTG ACT CCT GTG GAG AAG// 3'
Template      3' //CAC GTG GAC TGA GGA CAC CTC TTC// 5'
DNA triplet:       1   2   3   4   5   6   7   8

mRNA          5' //GUG CAC CUG ACU CCU GUG GAG AAG// 3'
Codon:             1   2   3   4   5   6   7   8

Protein            Val His Leu Thr Pro Val Glu Lys
Amino acid:        1   2   3   4   5   6   7   8


Figure 10.3 SCD mutation in the DNA sequence of the
β-globin gene. DNA, mRNA, and amino acid sequences
spanning the first eight amino acids of (a) the wild-type β^A
allele and (b) the β^S allele are shown. A single nucleotide
polymorphism occurs in DNA triplet 6 (boxed), causing a
change in the sixth codon of mRNA and a change in the sixth
amino acid of the polypeptide from Glu to Val.


formed are structurally abnormal. Glutamic acid (Glu)
has an electrically charged side chain that allows it to
interact with other amino acids in ways that valine
(Val), which has a nonpolar side chain, cannot (see
Figure 9.1, p. 306). The presence of Val in β-globin
alters the secondary and tertiary structure of the β^S
protein so that it forms a hydrophobic cleft not seen in the
β^A protein (Figure 10.4). When tetrameric hemoglobin
protein forms, the hydrophobic clefts of β^S proteins
enable the attachment of hemoglobin molecules in long
chains. These chains are particularly likely to form
when oxygen concentration in red blood cells drops.
Chain formation distorts the shape of affected red blood
cells, producing their characteristic sickle shape first
seen by Ernest Irons. This deformation also damages
red blood cells and shortens their lifespan relative to
normal red blood cells.

Individuals who are heterozygous carriers of SCD
have the genotype β^Aβ^S. All their hemoglobin tetramers
contain two normal α-globin proteins, but some
contain two β^A proteins, some contain two β^S proteins,
and others contain one of each type of β-globin protein.
Consequently, a small percentage of the red blood cells
of heterozygous individuals can acquire a sickle-shaped
form when oxygen level is low, as it is when red blood
cells are returning to the heart. This condition shortens
the average life span of red blood cells in heterozygotes,
but not nearly as severely as in those with SCD.
Furthermore, since only a small percentage of red blood
cells are affected in heterozygotes, they do not develop
the anemia seen in those with SCD. Heterozygous carriers
are sometimes identified as having “sickle cell trait.”
While their symptoms are generally mild, severe
complications can occur under circumstances in which the
availability of oxygen is reduced or the need for oxygen
is high. Potential health consequences for athletes who
are heterozygous carriers of sickle cell trait are one area
of concern. For example, in 2010, following the deaths of
ten student athletes with sickle cell trait over the previous
decade, the National Collegiate Athletic Association
(NCAA) implemented a policy offering student athletes
the option of being tested for sickle cell trait.


10.2 Genetic Variation Can Be 
Detected by Examining DNA, RNA, 
and Proteins 


We now turn our attention to widely used molecular
genetics techniques that have been crucial for analyzing
the β^S and β^A alleles as well as the mRNA and proteins
that are produced by the alleles. We consider them here
along with techniques used to identify certain specific
types of DNA sequence variation. We do this in historical
context, describing techniques and research results
Figure 10.4 Hemoglobin structural change in sickle cell disease. (a) The substitution of valine
for glutamic acid in the polypeptide product of the β^S allele creates a hydrophobic pocket not present in the
polypeptide of the β^A allele. (b) Mutant hemoglobin tetramers aggregate by the hydrophobic regions
adhering to one another. Long strands of aggregated hemoglobin protein crystallize, leading to red
blood cell deformation (sickling).



in the order they occurred in the study of SCD and 
discussing how new information contributed to under- 
standing of the condition at each step. The molecular 
methods discussed in this section are useful in a wide 
range of genetic analyses, although some of the specific 
techniques have been replaced with more modern meth- 
ods. Understanding how the original techniques work 
and how their results are interpreted makes it much eas- 
ier to understand how the modern methods work, the 
data they produce, and how those data are interpreted. 


Gel Electrophoresis 


In 1949, James Neel used transmission genetic analysis to 
demonstrate that SCD is an autosomal recessive disorder. 

Neel examined red blood cells of 42 parents who had 
a child with SCD but who did not have SCD themselves. 
He found that a small proportion of the red blood cells of 
each of the parents tested were sickle shaped. The number 
of sickle-shaped red blood cells was consistent with each 
parent being a heterozygous carrier (β^Aβ^S) and demonstrated
that SCD is an autosomal recessive trait. That same
year, Linus Pauling and his colleagues published the first 
description of the molecular basis of SCD and coined the 
term molecular disease to describe it. They used the term 
to denote a disease caused by a variation in the molecular 
structure of a protein. 


Pauling isolated hemoglobin from people having each 
of the various genotypes (β^Aβ^A, β^Aβ^S, and β^Sβ^S) and used
the analytical technique of gel electrophoresis to separate 
the hemoglobin molecules of each type. Gel electropho- 
resis separates different protein or nucleic acid molecules 
from one another in an electrical field on the basis of their 
charge, size, and shape (Figure 10.5). A gel support matrix is 
created by molding a liquid inside a form, typically a plastic 
casting tray. A “comb” is placed in the liquid as it is poured 
into the form, to produce “wells,” or depressions, in the gel. 
In the form, the liquid solidifies into a flexible semisolid. 

The wells are small reservoirs into which biologi- 
cal samples, such as proteins or nucleic acid (DNA or 
RNA), are loaded. Usually, multiple wells are employed, 
each marking the origin of migration for one of the 
samples and thus serving as the starting point for one of 
the “lanes” of the gel. After biological samples are loaded 
into the wells, an electrical current is applied to the gel by 
connecting a positive electrode to one end and a negative 
electrode to the other. The samples migrate through the 
matrix of tiny pores and passageways created by the so- 
lidification of the gel. Molecules make their way from the 
origin of migration near the negatively charged end of the 
gel toward the positive charge at the opposite end. 

The materials most commonly used to form electrophoresis
gels are agarose, a polysaccharide extracted from seaweed, and
polyacrylamide, a synthetic material made by a polymerization

Figure 10.5 Apparatus and procedure for gel electrophoresis.


reaction between chemical compounds. Neither of these 
substances interacts with proteins or nucleic acids as they 
move through the gel, so the rates of migration of different 
protein or nucleic acid molecules are determined entirely 
by the characteristics of the molecules in each sample. 

In gel electrophoresis, biological molecules that have 
electric charge migrate toward the end having the oppo- 
site charge. Most biological molecules, including DNA, 
RNA, and, at pH 7.0, most proteins, have negative charge 
and migrate toward the positive end. Therefore, the ori- 
gin of migration is usually placed near the negative end. 
Proteins with positive charge migrate toward the negative 
end, so when they are being studied, the origin of migra- 
tion will be placed near the positive end. 

Molecular movement through the electrophoresis gel 
is driven by the flow of electricity. Molecules migrate con- 
tinuously and at a steady rate when electricity flows, and 
they stop moving when current flow is turned off. In elec- 
trophoretic gels, the migratory rate of molecules depends 
on three parameters of molecular structure. Each of these 
parameters individually is important in determining how 
a particular molecule migrates, but they can also interact 
with one another to produce a characteristic migration 
rate for each molecule. The parameters are as follows. 


1. Molecular weight: Smaller molecules (i.e., proteins
with fewer amino acids or nucleic acids with fewer
nucleotides) migrate more quickly than larger molecules.
This characteristic is an important determinant of the
electrophoretic migration of all biological molecules, and it is
the main parameter in DNA and RNA migration.

2. Molecular charge: Molecules with greater negative
charge migrate toward the positive pole more rapidly
than molecules with less negative charge. Variation
in the molecular charge of proteins is imparted by amino
acid composition and is an important characteristic
influencing protein migration. In contrast, nucleic
acids have negative charge that derives from the
sugar-phosphate backbone. This negative charge is
proportionate to mass and thus does not contribute to
differences in migration rate among nucleic acid
molecules of different lengths.

3. Molecular shape (molecular conformation):
Tightly condensed, globular molecules migrate more
quickly than linear molecules. Protein migration can
be strongly influenced by conformation; however,
when nucleic acids are being compared, the only
migration differences caused by molecular shape occur in
comparisons of linear and circular DNA.


Pauling’s electrophoretic analysis of hemoglobin proteins
purified from red blood cells showed that proteins
produced by individuals with different β-globin genotypes
have different electrophoretic mobility, a term that describes
either the rate of a molecule’s electrophoretic migration
or its final position in the gel. In Pauling’s analysis of
hemoglobin protein, each allele was seen to produce a different
protein with a characteristic electrophoretic mobility;
in other words, as each type of protein migrated through the
gel, it formed a separate band that could be visualized by
staining the gel with protein stain (Figure 10.6a).

Figure 10.6 Gel electrophoresis of hemoglobin proteins.

The protein band seen in the β^Sβ^S lane had lower
electrophoretic mobility (smaller distance migrated from
the origin) than the protein band detected in the β^Aβ^A
lane. Only a single band is detected in each of these lanes,
suggesting that all the protein in the lane is identical. In
contrast, when an electrophoresis lane contains protein
from a heterozygous (β^Aβ^S) individual, the protein in
that lane separates into two bands, each corresponding to
the electrophoretic mobility of the protein band in one of the
lanes containing protein from a homozygote. The lower
electrophoretic mobility of the β^S protein, versus the β^A protein,
is due to the replacement of glutamic acid (with a negatively
charged side chain) by valine (with a nonpolar side chain) in
the β^S protein.
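
The charge argument can be made concrete with a rough calculation. The Python sketch below counts charged side chains in the first eight residues of the mature β-globin chain. It rests on a simplifying assumption that is not part of the discussion above: near pH 7, each aspartic acid or glutamic acid contributes a charge of -1 and each lysine or arginine contributes +1, while histidine and the chain termini are ignored. The function name `approx_net_charge` is illustrative only.

```python
# Rough side-chain charge count near pH 7.
# Simplifying assumption: Asp/Glu = -1, Lys/Arg = +1; His and the
# chain termini are ignored.
SIDE_CHAIN_CHARGE = {"D": -1, "E": -1, "K": +1, "R": +1}


def approx_net_charge(peptide: str) -> int:
    """Sum the assumed side-chain charges of a one-letter peptide string."""
    return sum(SIDE_CHAIN_CHARGE.get(residue, 0) for residue in peptide)


# First eight residues of the mature beta-globin chain. Position 6 is
# glutamic acid (E) in the wild-type product and valine (V) in the
# sickle cell product.
beta_a = "VHLTPEEK"
beta_s = "VHLTPVEK"

charge_a = approx_net_charge(beta_a)  # E, E, K -> -1
charge_s = approx_net_charge(beta_s)  # E, K    ->  0
print(charge_a, charge_s)
```

Under these assumptions, each β^S chain carries one fewer negative charge than a β^A chain, which is consistent with the smaller distance it migrates toward the positive electrode.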

Pauling then used a technique called densitometry to 
show that a single kind of β-globin protein is present in
lanes containing protein from a homozygous individual, 
and that two kinds of protein are present in lanes con- 
taining protein taken from heterozygotes (Figure 10.6b). 
Densitometry quantifies the amount of protein present 
in a gel lane by measuring how much light is blocked 
from passing through the gel by the presence of a band of 
protein. The densitometry curve peaks when light passage 
is obscured by the presence of a band of material in the 
electrophoresis gel. 
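
The logic of reading a densitometry curve can be sketched in a few lines of Python. The example below is a minimal illustration, not a description of Pauling's instrument: the absorbance readings are invented, the threshold of 0.5 is arbitrary, and the function name `count_bands` is hypothetical. A band is treated as any run of consecutive readings above the threshold.

```python
def count_bands(absorbance, threshold=0.5):
    """Count peaks, defined as runs of consecutive readings above threshold."""
    bands = 0
    inside_band = False
    for reading in absorbance:
        if reading > threshold and not inside_band:
            bands += 1            # entering a new band
            inside_band = True
        elif reading <= threshold:
            inside_band = False   # back at background level
    return bands


# Invented scans along one lane each, starting at the origin of migration.
homozygote_lane = [0.0, 0.1, 0.9, 1.0, 0.8, 0.1, 0.0, 0.0, 0.1, 0.0]
heterozygote_lane = [0.0, 0.1, 0.6, 0.7, 0.1, 0.0, 0.6, 0.7, 0.2, 0.0]

print(count_bands(homozygote_lane))    # 1
print(count_bands(heterozygote_lane))  # 2
```

One peak corresponds to the single kind of protein in a homozygote lane, and two peaks correspond to the two kinds of protein in a heterozygote lane.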

The importance of Pauling’s work is twofold. First, it 
introduced laboratory methods for the detection of dis- 
tinct forms of globin protein; and second, it demonstrated 
that hemoglobin variation explains the inheritance of SCD 
as a molecular disease. Pauling’s study was the first to show 
that the inheritance patterns of disorders in pedigrees 
parallel those of the transmission of molecular variation. 
His work also illustrates that among heterozygous carriers, 
molecular evidence often supports the expression of both 


alleles, even if the abnormal morphology characteristic of 
a disorder is present only in individuals who are homozy- 
gous for a recessive allele. In short, Pauling was the first 
to draw attention to a fundamental principle of genetics: 
Hereditary morphologic variation has a molecular basis. 


Hemoglobin Peptide Fingerprint Analysis 


In 1957, Vernon Ingram published a description of the 
molecular basis of SCD based on analysis of the amino 
acid composition of the hemoglobin proteins produced 
by each allele. Ingram examined hemoglobin protein varia- 
tion with a two-step approach called peptide fingerprint 
analysis (Figure 10.7). To prepare for fingerprint analysis, 
the hemoglobin protein is first broken into many fragments 
by chemical treatment. The peptide fragments generated 
contain different segments of the protein, and some peptide 
fragments overlap others. The protein fragments are then 
subjected to electrophoresis to separate the fragments in 
one direction, or dimension, on a gel. Next the hemoglobin 
fragments are separated in a second dimension, perpendic- 
ular to the first, by chromatography, which uses a solvent 
to carry fragments with different amino acid composition 
to different final positions. At the end of these two separa- 
tions, the locations of numerous short peptide fragments 
on the chromatography paper form a pattern of “spots” that 
serve as a kind of “fingerprint” of the protein. Ingram
deduced the amino acid sequence of each spot and compared
the fingerprint pattern of β^A protein to that of β^S protein.

Ingram found that just a single amino acid in the
hemoglobin of people with SCD (genotype β^Sβ^S) differed
from that in the hemoglobin of people who were homozygous
for the wild-type allele (genotype β^Aβ^A). In those with
SCD, the amino acid valine (single-letter abbreviation V)
substitutes for glutamic acid (single-letter abbreviation E)
in amino acid position number 6 of the 146 amino acids in
the β-globin protein chain. As confirmation of his conclusion,
Ingram examined the hemoglobin peptide fingerprints
for heterozygous carriers of SCD (genotype β^Aβ^S).
He found that they had spots corresponding to both the
glutamic-acid-containing portion of wild-type hemoglobin

Figure 10.7 Hemoglobin protein peptide fragment
analysis. Comparison of hemoglobin protein peptide 
fragments identified the glutamic acid (E) to valine (V) amino 
acid change. Different positions of the two circled peptide 
fragments are due to the amino acid change. 


(the product of the β^A allele) and the valine-containing
portion of mutant hemoglobin (the product of the β^S allele).
Genetic Analysis 10.1 guides you through genotype
identification by protein gel electrophoresis. 


Identification of DNA Sequence Variation 


With the identification of hemoglobin protein structure 
and the amino acid sequences of the α-globin and β-globin
chains, scientists were ready to combine the analysis of 
hemoglobin variation with analysis of DNA and mRNA 
sequences to explain how nucleic acid variation produces 
SCD. Before we can examine this research, however, some 
additional description of nucleic acid and of protein elec- 
trophoretic analysis is required. This subsection and the 
next present some background information on the identi- 
fication of DNA sequence variability using DNA-digesting 
enzymes and gel electrophoresis. These techniques are 


tools with many applications in DNA analysis. After dis- 
cussing them, we return to the analysis of SCD. 

DNA sequences are linear strings of the nucleotides 
adenine (A), guanine (G), cytosine (C), and thymine (T). 
Scientists compare genome sequences from different 
organisms by aligning them side by side and noting 
the number, location, and type of nucleotide sequence 
differences. Genomic analysis has determined that the 
most common kind of DNA sequence difference between 
organisms of the same species is variation of single nu- 
cleotides, a type of difference called a single nucleotide 
polymorphism (SNP; pronounced snip). SNPs originate 
as point mutations, that is, base-pair substitution mutations
of the type that changed the β^A DNA sequence into the β^S
sequence. SNPs are prevalent in the genomes of all organisms.
The human genome, for example, contains millions
of SNPs scattered among the approximately 3 billion 
base pairs (bp) that constitute our genome. Because of their
prevalence, SNPs have become an important category
of genetic marker that can be used for gene mapping 
(see Section 5.5), and they can also be used to identify 
so-called DNA fingerprints that are used for crime scene 
DNA analysis and in paternity testing (see Chapter 22). 
SNPs usually occur in unexpressed regions of genomes 
and have no detectable effect on phenotype. Occasionally, 
however, SNPs occur in expressed regions of genes, where 
the variation can affect the phenotype, as occurs in SCD. 

Whether or not the sequence variation at a SNP locus 
affects a phenotypic character, the allelic sequence is trans- 
mitted from one generation to the next. Figure 10.8 shows 
two DNA sequences representing two SNP alleles that are 
identical except for the highlighted base pairs. An A-T base 
pair is found in allele S1, and a G-C pair specifies allele S2.
Individual organisms in a population can be homozygous
(S1S1 or S2S2) or heterozygous (S1S2) for these SNP alleles.
The pattern of hereditary transmission of SNP alleles fol- 
lows the same pattern as alleles of expressed genes, with 
each parent contributing one allele to offspring. 
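
The side-by-side comparison used to find SNPs can be sketched in Python. This is a minimal illustration that assumes the two sequences are already aligned and of equal length. The 12-base sequences are invented, and the names `find_snps`, `allele_s1`, and `allele_s2` are hypothetical. Only the top strand is shown, so an A at a position implies an A-T base pair and a G implies a G-C base pair.

```python
from itertools import combinations_with_replacement


def find_snps(seq_a: str, seq_b: str):
    """Return (position, base_a, base_b) for each mismatch between two
    aligned, equal-length sequences. Positions are 1-based."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (i + 1, a, b)
        for i, (a, b) in enumerate(zip(seq_a, seq_b))
        if a != b
    ]


# Invented top-strand sequences for two SNP alleles.
allele_s1 = "GGCTAATCCGTA"
allele_s2 = "GGCTAGTCCGTA"

print(find_snps(allele_s1, allele_s2))  # [(6, 'A', 'G')]

# Two alleles give three possible genotypes.
genotypes = list(combinations_with_replacement(["S1", "S2"], 2))
print(genotypes)  # [('S1', 'S1'), ('S1', 'S2'), ('S2', 'S2')]
```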

The complete sequencing and surveying of a genome 
in search of SNP variation is accomplished by genome 
sequencing techniques (see Section 18.2). For certain ge- 
netic analyses involving SNPs, however, it is not necessary 
to examine complete genome sequences. For these analy- 
ses, SNP variation can be detected using a special class of 
DNA-digesting enzymes that act only on specific DNA se- 
quences. Known as restriction endonucleases—or, more 
commonly, restriction enzymes—these enzymes act like 
precise molecular scissors. Restriction enzymes bind to an
exact DNA nucleotide sequence of a few base pairs, called
the restriction sequence of the enzyme. Following bind- 
ing, the restriction enzyme cuts each strand of DNA by 
cleaving a precise phosphodiester bond on each strand 
of the molecule. When long DNA molecules containing 
multiple restriction sequences are treated with a restric- 
tion enzyme, many fragments of DNA are produced. The 
number of restriction fragments produced by a given re- 
striction enzyme is characteristic for a given sequence of 

Figure 10.8 Single nucleotide polymorphism (SNP). (a) At
a SNP locus, two alleles differ by one base pair. Allele S1 contains
an A-T base pair (green), and allele S2 contains a G-C base pair
(purple). (b) Three genotypes result from these two alleles.


DNA. When SNPs are present, they can alter one or more 
restriction sequences. If this occurs, DNA samples from 
two individuals that are exposed to the same restriction
enzyme will produce restriction fragments that differ in
number or in length (in base pairs). These
inherited DNA sequence variations are called restriction 
fragment length polymorphisms (RFLPs), and they are 
a common consequence of the presence of SNPs. 

Hundreds of different restriction enzymes have 
been identified since they were first discovered in the 
1960s. They are naturally occurring molecules found in 
microorganisms, particularly bacteria. In these organ- 
isms, restriction enzymes act to protect the organism 
from foreign DNA that might invade the cell. Recall from 
Chapter 6 that conjugation, transduction, and trans- 
formation all introduce DNA from one bacterium (the 
donor) into another (the recipient) and that infection 
of bacteria by bacteriophage begins with the transfer 
of phage DNA into the host cell. Restriction enzymes 
are a part of the molecular mechanism that can destroy 
invading foreign DNA. Restriction enzymes share three 
general properties: 


1. Each enzyme exclusively recognizes its own 
restriction sequence, consisting of a precise 
5'-to-3’ nucleotide order on each DNA strand. For 
example, the restriction enzyme EcoRI exclusively 
recognizes the restriction sequence 5'-GAATTC-3’. 
Because the restriction sequence for each restric- 
tion endonuclease is precise, any variation blocks 
the ability of the restriction enzyme to recognize 
the sequence. 


2. Restriction sequences are usually palindromes, 
meaning that each strand of the double-stranded 
restriction sequence has the same nucleotide order 
(running from 5’ to 3’). The double-stranded EcoRI 
restriction sequence is 


5'-GAATTC-3’ 
3'-CTTAAG-5’ 


3. A restriction enzyme cuts each strand of its restric- 
tion sequence in the same way. For example, EcoRI 
cuts each strand of DNA between the G and the 
A of the restriction sequence (Figure 10.9). Some 
restriction enzymes, like EcoRI, cut the DNA strands 
in a staggered, or offset, manner and produce short 
single-stranded ends called sticky ends. Other 
restriction enzymes, such as SmaI and PvuII, do not
generate staggered cuts on the two DNA strands. 
Instead, they cut through both DNA strands at a 
single place, resulting in restriction fragments that 
have blunt ends. 
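
The three properties above can be illustrated with a short Python sketch of restriction digestion. It is a simplified model rather than a laboratory tool: it follows only the top strand of a linear DNA molecule, requires an exact match to the restriction sequence, and uses an invented test sequence. The function name `digest` and its parameters are hypothetical.

```python
def digest(dna: str, site: str = "GAATTC", cut_offset: int = 1):
    """Cut the top strand of a linear DNA at every exact copy of `site`.

    `cut_offset` is the number of bases of the site that stay on the
    upstream fragment (1 for EcoRI, which cuts between G and A).
    Returns the top-strand fragments in order.
    """
    fragments = []
    start = 0
    pos = dna.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(dna[start:cut])
        start = cut
        pos = dna.find(site, pos + 1)
    fragments.append(dna[start:])
    return fragments


# Invented sequence containing two EcoRI restriction sequences.
dna = "TTGAATTCAGGCCGAATTCTT"
print(digest(dna))
# ['TTG', 'AATTCAGGCCG', 'AATTCTT']

# A single base substitution in the first site (GAATTC -> GAGTTC)
# blocks recognition, so only the second site is cut.
mutant = "TTGAGTTCAGGCCGAATTCTT"
print(digest(mutant))
# ['TTGAGTTCAGGCCG', 'AATTCTT']
```

Each internal fragment begins with AATTC because EcoRI cuts after the G, leaving the four-base single-stranded AATT overhang that makes the ends sticky.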

Figure 10.9 Restriction digestion by EcoRI.

and produce a single protein band with an electrophoretic mobility
that is distinct from either of the other two protein bands. 




The first gel diagram to the right illustrates the electrophoretic mobility 
of hemoglobin protein from individuals with the 6467, 6°B°, and BEBE => =æ 
genotypes. The second gel diagram on the right illustrates bands for two = = 
produced for heterozygous genotypes if the corresponding homozygous
genotypes have proteins with distinct mobilities (p. 344).


Solution Strategies Solution Steps 


Evaluate 
1. Identify the topic this problem addresses 1. This problem concerns the interpretation of hemoglobin protein migration 
and the nature of the required answer. in gel electrophoresis. The problem requires identification of genotypes 
based on protein band migration. It also requires prediction of the band 
pattern for a certain genotype. 
2. Identify the critical information given in 2. The problem gives examples of hemoglobin protein migration for three 
the problem. genotypes that are the basis for determining the genotypes of unknown 
samples. 
Deduce 
3. Identify the possible genotypes involving 3. For a gene with three alleles, three of the possible genotypes are
alleles 8^, B°, and BS. homozygous (8464, 8°89, and FB9) and three are heterozygous 
(BBS, BABS and B5B9. 
4. Determine the hemoglobin protein band 4. Homozygous genotypes produce one protein band, and heterozygous 
pattern associated with each genotype. genotypes produce two protein bands on an electrophoretic gel: 
B’e" BB BBS BBS BB BBS 
©) 
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__| lanes with bands corresponding to Le — o A — 
lleles of identified genotypes. 
Solve alleles of identified genotypes. Answer a 
5. Identify the genotypes producing the 5. Unknown 1 has one protein band that matches the electrophoretic 
hemoglobin protein band patterns for mobility of 8^ and a second protein band that matches 6°. Unknown 
Unknown 1 and Unknown 2. 1 is B4BS Unknown 2 has protein bands that match 8^ and 6°. 
TIP: Use the bands of identified alleles to predict Unknown 2 is 84°. 
the band pattern for a new genotype. Answer b 
6. Draw the band pattern for an individual 6. The protein band pattern expected for 6°° will have two bands, one for B° 
with the 8°,‘ genotype. and the other for B° 


For more practice, see Problems 4, 5, 15, and 27.

Table 10.1 Examples of Restriction Enzymes

Restriction      Source                        Restriction
Endonuclease     Organism                      Sequence

Producers of sticky ends
EcoRI            Escherichia coli              5'-G▼AATTC-3'
                                               3'-CTTAA▲G-5'
BamHI            Bacillus amyloliquefaciens    5'-G▼GATCC-3'
                                               3'-CCTAG▲G-5'
HindIII          Haemophilus influenzae        5'-A▼AGCTT-3'
                                               3'-TTCGA▲A-5'
DdeI             Desulfovibrio desulfuricans   5'-C▼TNAG-3'
                                               3'-GANT▲C-5'

Producers of blunt ends
PvuII            Proteus vulgaris              5'-CAG▼CTG-3'
                                               3'-GTC▲GAC-5'
SmaI             Serratia marcescens           5'-CCC▼GGG-3'
                                               3'-GGG▲CCC-5'

Note: N = any nucleotide (A, T, C, or G); ▲ and ▼ indicate cleavage locations.


Examples of restriction enzymes and their restriction sequences
are listed in Table 10.1, which groups the enzymes according to
whether they produce sticky ends or blunt ends. Restriction
enzymes have a wide variety of uses in laboratory experimentation.
Among these is
their use in the creation of recombinant DNA molecules 
(see Chapters 16 and 17). 
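
Two of the properties summarized in Table 10.1 can be checked with a short Python sketch: whether a restriction sequence is a palindrome, and whether its cut produces sticky or blunt ends. The function names are hypothetical. The cut offsets (the number of bases to the left of the cut on each strand, read 5' to 3') are the standard cleavage positions for these enzymes and are supplied here as assumptions of the sketch. DdeI is omitted because its restriction sequence contains the ambiguous position N.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the complementary strand, written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def is_palindromic(site: str) -> bool:
    """True if both strands read the same in the 5'-to-3' direction."""
    return site == reverse_complement(site)


def end_type(site: str, cut_offset: int) -> str:
    """For a palindromic site cut `cut_offset` bases from the 5' end of
    each strand, the two cuts coincide only at the exact center."""
    return "blunt" if 2 * cut_offset == len(site) else "sticky"


# (restriction sequence, assumed cut offset) for five enzymes.
ENZYMES = {
    "EcoRI": ("GAATTC", 1),
    "BamHI": ("GGATCC", 1),
    "HindIII": ("AAGCTT", 1),
    "PvuII": ("CAGCTG", 3),
    "SmaI": ("CCCGGG", 3),
}

for name, (site, offset) in ENZYMES.items():
    print(name, is_palindromic(site), end_type(site, offset))
```

All five sequences are palindromes. The first three enzymes give sticky ends, and PvuII and SmaI give blunt ends, matching the grouping in the table.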

SNP variation is one kind of DNA-sequence change 
that can destroy or create a restriction sequence by 
substituting one DNA base pair for another. Research 
Technique 10.1 illustrates one mechanism for the generation
of an RFLP. There, allele R^1 represents a portion
of a chromosome containing three EcoRI restriction 
sequences, labeled as restriction sequences 1, 2, and 3, 
from left to right. Two DNA restriction fragments are 
generated by EcoRI digestion of this DNA. The size of 
the DNA restriction fragments is measured in kilobases 
(kb), with 1 kb equal to 1000 base pairs of DNA. A frag- 
ment of 5.1 kb (5100 base pairs) is produced by cutting 
DNA at restriction sequences 1 and 2, and a fragment of 
4.2 kb (4200 base pairs) is produced by cutting DNA at 
restriction sequences 2 and 3. Allele R^2 represents the
same region of DNA as shown for R^1 but with a single
base-pair substitution in restriction sequence 2 that is
highlighted in red. Notice that restriction sequences
1 and 3 are the same in both alleles and that the only
difference between them is the base-pair substitution
in restriction sequence 2. The mutation of restriction
sequence 2 makes it unrecognizable by EcoRI because the
sequence no longer matches the 5'-GAATTC-3' sequence
recognized by EcoRI. Restriction sequence 2 is destroyed by
base-pair substitution and no longer exists on chromosomes
carrying allele R^2. Treating DNA containing R^2 with
EcoRI will result in digestion at restriction sites 1 and 3, 


but since restriction sequence 2 has been destroyed by 
mutation, DNA is not cut in this region. The result will 
be a single DNA restriction fragment of 9.3 kb, the sum 
of the lengths of the two restriction fragments produced 
from allele R^1. Research Technique 10.1 also shows the
variation in the number and length of DNA restriction 
fragments generated for the three genotypes at this 
RFLP. In this case, a molecular probe (see Research 
Technique 10.2 for details) identifies DNA on both sides 
of the location of restriction site 2. 

The hereditary transmission of these alleles follows 
an autosomal codominant pattern. In the pedigree shown, 
the parents are each heterozygous and their offspring 
could have any of the three potential genotypes. 
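
The fragment sizes and the pedigree outcome described above can be worked through in a brief Python sketch. The site positions are taken from the fragment lengths given in the text (sites 1, 2, and 3 at 0, 5.1, and 9.3 kb), with site 1 arbitrarily placed at position 0. The names `SITES_KB`, `fragments`, and `bands` are hypothetical, and the sketch assumes, as in Research Technique 10.1, that the probe detects DNA on both sides of restriction site 2.

```python
from collections import Counter
from itertools import product

# EcoRI restriction site positions (kb) on each allele.
SITES_KB = {
    "R1": [0.0, 5.1, 9.3],  # sites 1, 2, and 3 all present
    "R2": [0.0, 9.3],       # site 2 destroyed by base substitution
}


def fragments(allele: str):
    """Lengths (kb) of the fragments between adjacent restriction sites."""
    sites = SITES_KB[allele]
    return [round(right - left, 1) for left, right in zip(sites, sites[1:])]


def bands(genotype):
    """Distinct band sizes expected on the gel, largest first."""
    sizes = {size for allele in genotype for size in fragments(allele)}
    return sorted(sizes, reverse=True)


print(bands(("R1", "R1")))  # [5.1, 4.2]
print(bands(("R1", "R2")))  # [9.3, 5.1, 4.2]
print(bands(("R2", "R2")))  # [9.3]

# Cross of two heterozygous parents: each contributes one allele.
offspring = Counter(
    "".join(sorted(pair)) for pair in product(["R1", "R2"], repeat=2)
)
print(offspring)  # Counter({'R1R2': 2, 'R1R1': 1, 'R2R2': 1})
```

Because the heterozygote shows the bands of both alleles, each of the three genotypes has its own band pattern, which is what makes the inheritance codominant at the molecular level.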

Analogous results producing RFLP variation would be 
obtained in cases where base substitution mutation creates 
a new restriction sequence where one did not exist previ- 
ously. This circumstance is illustrated for a globin gene 
mutation in this chapter’s Case Study. RFLP variation can 
also be generated by mutations that insert or delete DNA 
between two existing restriction sequences. In such cases,
neither of the restriction sequences flanking the insertion
or deletion is mutated; only the number of base pairs
between the restriction sequences is changed. We see
an example of this kind of mutation in Section 13.7, where 
we discuss DNA transposition. Interestingly, the mutation 
discussed there affects one of the genes Gregor Mendel 
studied in his analysis of heredity in pea plants. 


Molecular Probes 


The use of electrophoretic analysis for detecting DNA 
RFLPs, variation in mRNA transcripts from expressed 
genes, or variation in the polypeptide products of genes can 
be straightforward if a small number of different molecules 
are present in the electrophoretic sample. Alternatively, 
analysis can be complicated by the sheer number of restric- 
tion fragments, mRNA molecules, or protein molecules in 
a sample under analysis. Treating human genomic DNA 
with a restriction enzyme like EcoRI, whose restriction se- 
quence is common in the genome, can produce hundreds 
of thousands of restriction fragments. Similarly, isolat- 
ing mRNA molecules or protein molecules from cells 
yields a large number of different products. Without meth- 
ods for identifying specific substances—whether specific 
DNA sequences, mRNA transcripts, or protein products— 
electrophoretic analysis would be hopelessly complex. 

When a small number of different molecules are
present in an electrophoretic sample of DNA or mRNA,
a compound called ethidium bromide (EtBr) can be
used to chemically tag all the DNA fragments or RNA
molecules in electrophoresis gels. EtBr attaches to all
DNA or RNA in a gel by inserting itself between the stacked
bases of the nucleic acid (intercalation). EtBr is not specific to
any nucleotide sequence and will attach to any DNA or RNA fragment
regardless of the length or sequence of the fragment. EtBr will be
concentrated where nucleic acid bands are located, and
more molecules of EtBr will attach to larger nucleic acid
fragments than to smaller nucleic acid fragments. The 
exposure of gels containing EtBr-stained nucleic acids to 
ultraviolet light excites the EtBr and causes it to emit fluo- 
rescent light, so that bands in EtBr-stained DNA or RNA 
gels can be visualized and photographed (Figure 10.10a). 
Molecular weight size markers, DNA fragments of known 
length, that serve as control samples for this gel are in 
lanes 1 and 8 of Figure 10.10a. Experimental samples are 
in lanes 2 through 7. For protein electrophoresis gels, 
general protein stains—stains that bind to any protein— 
can be used to discover the location of each protein run 
through the gel (Figure 10.10b). 

Protein standards, proteins with known electropho- 
retic mobilities that serve as controls for the protein elec- 
trophoresis gel, are in lane 1. Experimental samples are in 
lanes 2 through 5. EtBr staining of nucleic acid gels and 
general protein staining of protein electrophoresis gels 
have many uses, but neither of these methods detects a 
specific nucleic acid sequence or a specific protein. 

Two innovations in gel electrophoresis methods have 
made the identification of specific proteins and the detec- 
tion of specific sequences in mRNAs and DNA fragments 
possible. The first is the development of methods for 
“blotting,” a general name for the transfer of nucleic acids 
or proteins from an electrophoresis gel to a membrane 
that can withstand rigorous treatment and analysis. The 
membrane is most often a durable synthetic material that 
can serve as a permanent record of gel results. Southern 
blotting (named after its inventor, Edwin Southern) is the 
term applied to DNA transfer; northern blotting (named 
by tongue-in-cheek analogy with Southern blotting) iden- 
tifies the transfer of mRNA from a gel to a membrane; 
and western blotting is the term identifying the gel-to- 
membrane transfer of proteins. 

The second innovation is the development of 
molecular probes. These are antibodies, if the target is 
the identification of a specific protein, or single-stranded 
nucleic acids, for the identification of a specific DNA 
or RNA sequence. Molecular probes are essential for 


Figure 10.10 Visualization of 
nucleic acids and proteins in gels. 
(a) Nucleic acid molecules (DNA 
and RNA) are visualized by binding 
ethidium bromide (EtBr) to them. 
EtBr fluoresces when excited by 
ultraviolet light, revealing bands of 
DNA in the gel. Molecular weight 
size markers are in lanes 1 and 8, and 
experimental samples are in lanes 2 
through 7. (b) General protein stains 
(such as coomassie blue, shown 
here) bind to proteins in electro- 
phoretic gels to reveal the locations 
of protein bands. Protein standards 
are in lane 1, and experimental 
samples are in lanes 2 through 4. 


identifying a particular nucleic acid sequence or a spe- 
cific protein from a heterogeneous pool of molecules in 
an electrophoresis gel. In a way, the process of searching 
for a DNA or RNA fragment containing a specific string 
of nucleotides or of searching through a large number 
of proteins for a specific protein is analogous to trying 
to find a specific word or phrase in a text document. 
Scanning each block of letters for the correct string is 
almost impossible without a tool for targeting the desired 
sequence. Just as word processing programs locate a de- 
sired word or phrase by searching for a specific string of 
letters using a “find” command, biologists use molecular 
probes to identify target nucleic acid sequences or target 
proteins following electrophoresis. 

In the search for a target DNA molecule in a Southern 
blot, the molecular probe is a short, single-stranded DNA 
fragment, and the target molecule is a region of DNA 
that contains a sequence complementary to the probe 
sequence. Similarly, single-stranded molecular probes 
detect target mRNAs in northern blots by the comple- 
mentary base pairing of probe and a segment of the tar- 
get nucleotide sequence. The pairing of complementary 
nucleic acid strands of the probe and the target sequence 
is called hybridization. In contrast to the nucleic acid 
probes used to detect DNA or RNA target sequences, 
molecular probes used to detect target proteins in west- 
ern blots are, as mentioned earlier, antibodies—immune 
system proteins that bind only to specific target proteins. 
Descriptions of Southern, northern, and western blot- 
ting, and the use of different kinds of molecular probes to 
identify specific nucleic acids or proteins on the blots, are 
provided in Research Technique 10.2. 
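
The “find” analogy can be taken literally in a short Python sketch of probe hybridization. This is a deliberately simplified model: it treats hybridization as an exact, full-length match between the probe and a complementary stretch of the target strand, whereas real hybridization depends on experimental conditions. The fragment names and sequences are invented, and the function names are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the complementary strand, written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def hybridizes(probe: str, target_strand: str) -> bool:
    """True if the single-stranded probe can base-pair along its whole
    length with some stretch of the target strand."""
    return reverse_complement(probe) in target_strand


# Invented single strands standing in for fragments on a blot.
blot = {
    "fragment_1": "TTGACCTGAGGAGAAGTCT",
    "fragment_2": "AATTCCGGTTAACCGGAAT",
}

# The probe is complementary to the target stretch CTGAGGAG.
probe = "CTCCTCAG"

detected = [name for name, strand in blot.items() if hybridizes(probe, strand)]
print(detected)  # ['fragment_1']
```

Only the fragment that contains the complementary sequence is detected, so a labeled probe marks a single band among the many fragments on the membrane.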


Electrophoretic Analysis of Sickle 
Cell Disease 


Like the hundreds of other mutations of the α-globin
and β-globin genes that affect humans, the mutation
producing SCD is a DNA sequence change that leads 
to an mRNA transcript differing from the wild type 

Research Technique 10.1


The Production and Detection of DNA Restriction Fragment Length Polymorphisms 


PURPOSE Restriction digestion followed by DNA gel elec- 
trophoresis is one method for detecting variation of DNA se- 
quence that alters the number or the relative positions of RFLP 
sequences. Variation in the number or length of restriction 
fragments can result from DNA sequence changes that alter a 
restriction sequence, making it unrecognizable, or that create 
a new restriction sequence. RFLP changes can also result from 
the insertion or deletion of DNA between restriction sequences 
that increase or decrease the length of restriction fragments. 


MATERIALS AND PROCEDURES DNA is isolated from cells 
and treated with one or more restriction enzymes to produce 
DNA restriction fragments. The restriction fragments are then 
separated by DNA gel electrophoresis, causing the fragments 
to be visualized as “bands” on the gel. Laboratory methods 
described in Research Technique 10.2 can also aid in the identi- 
fication of specific restriction fragments. 


DESCRIPTION DNA sequence variation altering the number 
or length of restriction fragments (RFLPs) produces distinctive 
restriction fragments for each allele. Organisms that are ho- 
mozygous for DNA sequence at a restriction site shown in the 
diagram produce the same restriction fragments from homolo- 
gous chromosomes. Heterozygous organisms have different 
DNA sequences on the two homologous chromosomes and, 
as the diagram shows, they produce a total of three different
restriction fragments from the chromosome regions shown. 
Detection of any or all of these fragments on a DNA gel is dic- 
tated by which molecular probe is used. 

Transmission of the RFLP alleles follows an autosomal codominant
pattern in which DNA bands from both alleles are observed in
heterozygous (R¹R²) individuals.


CONCLUSION DNA base substitution changes that alter a 
restriction sequence and the insertion or deletion of DNA be- 
tween two restriction sequences are the principal ways DNA 
sequence alterations can produce RFLPs. RFLP alleles form gen- 
otypes whose DNA restriction fragments produce distinctive 
patterns in gel electrophoresis. Each genotype has a distinctive 
combination of band number and band size on the gel. 
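
The codominant transmission described in this technique can be sketched in a few lines of Python. This is only an illustration: it assumes the fragment sizes given for the two alleles in the accompanying diagram (5.1 kb and 4.2 kb for R¹, 9.3 kb for R²), and the function names are hypothetical.

```python
from collections import Counter
from itertools import product

# Fragment sizes (kb) assumed from the diagram for each RFLP allele.
FRAGMENTS_KB = {"R1": (5.1, 4.2), "R2": (9.3,)}

def bands(genotype):
    """Distinct band sizes (kb, largest first) produced by a pair of alleles."""
    sizes = {size for allele in genotype for size in FRAGMENTS_KB[allele]}
    return sorted(sizes, reverse=True)

def offspring(parent1, parent2):
    """Count the equally likely offspring genotypes (one allele per parent)."""
    return Counter(tuple(sorted(pair)) for pair in product(parent1, parent2))

# Cross of two heterozygous parents: each genotype has its own band pattern.
cross = offspring(("R1", "R2"), ("R1", "R2"))
for genotype, count in sorted(cross.items()):
    print(genotype, f"{count}/4", bands(genotype))
```

Because both alleles contribute bands, the heterozygote is distinguishable from either homozygote, which is what makes the marker codominant.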
RFLP Variation. Two homologous regions of DNA are identical except
for a SNP that produces a base-pair substitution (highlighted) in
restriction site 2 of one chromosome. EcoRI cuts allele R¹ at three
restriction sites (1, 2, and 3) and forms two small DNA restriction
fragments of 5.1 and 4.2 kb. The base substitution in the DNA
sequence of the R² allele eliminates restriction site 2, and the DNA
is cut only at sites 1 and 3, resulting in a single DNA restriction
fragment of 9.3 kb.


and, ultimately, to the production of a mutant form of
β-globin protein. Specifically, through genetic studies
spanning a period of 50 years, scientists discovered that
a change in a single DNA base leads to a single-base
difference in mRNA transcripts and to β-globin proteins
that differ at just one of the 146 amino acids that
compose them.

The key portion of the DNA, mRNA, and amino acid
sequences of the wild-type (βᴬ) and mutant (βˢ) alleles is
shown in Figure 10.4 (see p. 342). The single-nucleotide
difference between the alleles is the result of a SNP of the
type we described above. In comparison to the wild-type
allele, the mutant βˢ allele contains a single DNA
base-pair substitution in the sixth DNA triplet of the coding


sequence. This substitution leads to a single-nucleotide 
change in codon 6 of mRNA and to a protein with valine 
(Val) rather than glutamic acid (Glu) as the sixth amino 
acid in the β-globin polypeptide chain.
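
As a rough check on this reasoning, the short Python sketch below lists which single-nucleotide changes convert a glutamic acid codon into a valine codon. It assumes the wild-type codon 6 is GAG and uses only the Glu and Val entries of the standard genetic code; the function names are illustrative.

```python
# Subset of the standard genetic code needed for this example.
CODON_TABLE = {
    "GAA": "Glu", "GAG": "Glu",
    "GUA": "Val", "GUC": "Val", "GUG": "Val", "GUU": "Val",
}

def single_base_changes(codon):
    """Yield every codon that differs from `codon` at exactly one position."""
    for i, base in enumerate(codon):
        for new in "ACGU":
            if new != base:
                yield codon[:i] + new + codon[i + 1:]

def changes_to(codon, amino_acid):
    """Codons for `amino_acid` reachable from `codon` by one substitution."""
    return sorted(c for c in single_base_changes(codon)
                  if CODON_TABLE.get(c) == amino_acid)

# Assumed wild-type codon 6 (GAG, glutamic acid).
print(changes_to("GAG", "Val"))
```

Only a change at the middle position of the codon turns Glu into Val, which is consistent with a single base-pair substitution producing the mutant protein.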


Southern Blot Analysis of β-Globin Gene
Variation The βˢ SNP is unusual in that it occurs in
the coding sequence of the gene, whereas most SNPs
occur in noncoding segments of the genome. We can
detect the SNP in the βˢ allele because it destroys a
restriction sequence, leading to an RFLP that is revealed
by Southern blot analysis.

Either two or three restriction sequences for the
restriction endonuclease DdeI can occur near the
Analysis and inheritance of RFLPs. (a) Two alleles, R¹ and R², are
characterized by different numbers of restriction sequences. DNA
restriction fragments of 5.1 kb and 4.2 kb are produced for allele R¹,
and a 9.3-kb restriction fragment is produced for allele R². (b) Each
of the possible genotypes (two homozygous and one heterozygous)
produces different numbers and sizes of restriction fragments.
(c) Electrophoresis of restriction-digested DNA identifies a unique
pattern of bands for each genotype: a 5.1-kb and a 4.2-kb fragment for
genotype R¹R¹, a single 9.3-kb fragment for R²R², and all three DNA
bands for heterozygotes.

Inheritance of RFLPs. This RFLP is inherited as an autosomal
codominant. Heterozygous parents (R¹R²) display three DNA bands on a
gel. Their offspring can have any one of the three possible genotypes.
Each genotype displays a characteristic number and size of DNA bands.


Autosomal codominant inheritance of RFLP alleles. RFLP
allele R¹ produces DNA fragments of 4.2 kb and 5.1 kb, and allele
R² produces one fragment of 9.3 kb. The child with the R²R²
genotype has one DNA fragment band of 9.3 kb, the child with
the R¹R¹ genotype has two bands of 4.2 kb and 5.1 kb, and the
heterozygous (R¹R²) parents and children each have three DNA
bands.


β-globin gene, depending on the allele. DdeI recognizes
the double-stranded restriction sequence 5'-CTNAG-3',
where N indicates that any of the four nucleotides (A, T,
C, or G) can occur in the middle of the 5-bp sequence as
long as the variable nucleotide is flanked by CT and AG
dinucleotide combinations.
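
The CTNAG recognition rule maps naturally onto a pattern match. The sketch below is a minimal illustration in Python; the two 12-base sequences are assumed coding-strand stand-ins for triplets 4 through 7 of the wild-type and mutant alleles, and the function name is hypothetical.

```python
import re

# N in CTNAG may be any base. A lookahead is used so that overlapping
# matches would also be reported.
DDEI_SITE = re.compile(r"(?=CT[ACGT]AG)")

def ddei_sites(seq):
    """Return 0-based start positions of DdeI recognition sequences."""
    return [m.start() for m in DDEI_SITE.finditer(seq.upper())]

# Assumed illustrative sequences: a single A-to-T change in the sixth triplet.
wild_type = "ACTCCTGAGGAG"
mutant = "ACTCCTGTGGAG"

print(ddei_sites(wild_type))  # one site, spanning triplets 5 and 6
print(ddei_sites(mutant))     # the substitution removes the site
```

Because the reverse complement of CTNAG is again of the form CTNAG, scanning one strand is enough to locate every site in double-stranded DNA.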

Figure 10.11 shows three DdeI restriction sites, labeled
1, 2, and 3, in the βᴬ allele. All three DdeI restriction
sequences are cleaved, producing two DNA fragments of
1150 bp and 200 bp for the DNA region shown. Southern
blotting of DNA from the βᴬ allele produces two DNA
bands corresponding to fragment lengths of 1150 bp and
200 bp. The target sequence for the molecular probe is
split between two restriction fragments by DNA cleavage


at DdeI site 2, and the probe hybridizes to both the 1150-bp
and the 200-bp restriction fragments from βᴬ alleles.
In contrast, in Figure 10.12, the βˢ allele is shown
to contain two DdeI restriction sequences, labeled sites
1 and 3. The middle restriction sequence, labeled 2 in
Figure 10.11, is missing from the βˢ allele as a result
of the base-pair substitution that produces the SNP.
Only DdeI restriction sites 1 and 3 are cleaved in DNA
carrying the βˢ allele; site 2 is not recognized by DdeI
because of the SNP variation. This cleavage produces a
single restriction fragment of 1350 bp in DNA carrying
the βˢ sequence. The length of this fragment is the sum
of the lengths of the two restriction fragments detected
from the βᴬ allele (i.e., 1150 bp + 200 bp). Southern blot
Figure 10.11 DdeI restriction digestion and Southern
blotting of wild-type β-globin gene. Restriction digestion and
Southern blot analysis of βᴬ-allele DNA sequence identifies two
DNA fragments that are hybridized by the molecular probe.
A restriction fragment of 1150 bp (1.15 kb) is produced by
cleavage at sites 1 and 2, and the 200-bp fragment is produced
by cleavage at sites 2 and 3.


analysis of βˢ-allele DNA produces a single DNA restriction
fragment, measuring 1350 bp (1.35 kb) in length.
Because DdeI site 2 is altered by SNP variation, the entire
molecular probe target sequence is contained on a single
1350-bp (1.35-kb) restriction fragment (Figure 10.13).
People who are βᴬβᴬ have bands of 1150 bp (1.15 kb)
and 200 bp (0.20 kb) detected by the probe. Those who
are βˢβˢ have a single band of 1350 bp (1.35 kb), and
those who are βᴬβˢ produce all three bands because
they carry both alleles.
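
The band patterns just described follow from one rule: a fragment appears on the Southern blot only if it overlaps the probe's target sequence. The Python sketch below illustrates that rule. The site coordinates are chosen to reproduce the 1150-bp and 200-bp fragments, and the probe coordinates are an assumption made so that the target spans DdeI site 2; none of these coordinates come from the text.

```python
def probed_bands(cut_sites, probe):
    """Lengths of restriction fragments that overlap the probe target."""
    ordered = sorted(cut_sites)
    p_start, p_end = probe
    return [b - a for a, b in zip(ordered, ordered[1:])
            if a < p_end and p_start < b]

# Assumed coordinates (bp): sites 1, 2, 3 at 0, 1150, 1350.
SITES = {
    "A": [0, 1150, 1350],  # wild-type allele: all three DdeI sites cut
    "S": [0, 1350],        # mutant allele: site 2 is not recognized
}
PROBE = (1100, 1200)       # assumed target straddling site 2

def southern(genotype):
    """Bands (bp, largest first) detected for a two-allele genotype."""
    found = set()
    for allele in genotype:
        found.update(probed_bands(SITES[allele], PROBE))
    return sorted(found, reverse=True)

for genotype in ("AA", "AS", "SS"):
    print(genotype, southern(genotype))
```

A probe that did not straddle site 2 would detect only one of the two wild-type fragments, which is why the probe location matters when predicting a blot.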

The mutation that creates the βˢ allele by base-pair
substitution of the βᴬ allele is the kind of mutation
described in Research Technique 10.1 as creating an
RFLP. Genetic Analysis 10.2 guides your interpretation of
Southern blot analysis.


Northern and Western Blot Analysis of the β-Globin
Gene Transcript and Protein The DNA sequences of
the βᴬ and βˢ alleles are identical except for the SNP that
distinguishes the sequence of one from the other. Upon
Figure 10.12 The single-nucleotide polymorphism in
the βˢ allele. Base-pair substitution inactivates DdeI site
2, and only sites 1 and 3 are cleaved. The molecular probe
detects a single 1350-bp (1.35-kb) fragment in Southern
blot analysis.
Figure 10.13 RFLP results for β-globin genotypes.
transcription, each allele produces an mRNA molecule
containing 664 nucleotides. The single-nucleotide
substitution that differentiates the two alleles does not
alter the length of the mRNA transcript. Because total
length is the molecular attribute that produces
electrophoretic mobility differences among mRNAs, and the
mRNAs of these two alleles are identical in length, their
electrophoretic mobilities do not differ. A northern blot
analysis performed on mRNA from individuals with the
three β-globin genotypes detects the same single mRNA
band for each genotype (Figure 10.14). Consequently,
northern analysis is not useful in detecting variation in
this case.

Although the sequence difference between these two
mRNAs is not detectable by northern blot analysis, a
difference in the electrophoretic mobility of the
polypeptides for which they code is detectable using western
blot analysis, because the resulting proteins differ in
amino acid content. Recall from Figure 10.3 that the
polypeptides produced by the βᴬ and βˢ alleles differ at the
sixth amino acid position of their respective 146-member
amino acid strings. The amino acid change results in a
small charge difference that produces distinctive
electrophoretic mobilities for the proteins. Western blots
reveal hemoglobin protein bands for the three genotypes
in patterns that are essentially identical to the band
patterns Pauling first detected (Figure 10.15). Individuals
with homozygous genotypes βᴬβᴬ and βˢβˢ each
produce a single protein band with different electrophoretic
Figure 10.14 Northern blot analysis of human β-globin
mRNA. Transcription produces an mRNA that is 664
nucleotides in length for both alleles. The results of northern blot
analysis are therefore identical for the three genotypes.
Figure 10.15 Western blot analysis of human β-globin
protein. Single protein bands are seen in western blot analysis
of βᴬβᴬ and βˢβˢ homozygotes; two protein bands are detected
for heterozygotes.


mobility, and heterozygous individuals (βᴬβˢ) have two
protein bands, each corresponding to the polypeptide 
product of a different allele. 
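
The charge difference behind the western blot result can be estimated with a very simple model. The Python sketch below assigns approximate side-chain charges near neutral pH and computes the shift caused by one substitution. The charge values are a textbook-level simplification (histidine and the chain termini are ignored), and the function name is illustrative.

```python
# Approximate side-chain charges near neutral pH. Residues that are not
# listed are treated as uncharged, which is a simplification.
SIDE_CHAIN_CHARGE = {"Asp": -1, "Glu": -1, "Lys": +1, "Arg": +1}

def charge_shift(wild_type_residue, mutant_residue):
    """Change in net charge caused by a single amino acid substitution."""
    return (SIDE_CHAIN_CHARGE.get(mutant_residue, 0)
            - SIDE_CHAIN_CHARGE.get(wild_type_residue, 0))

# Glutamic acid replaced by valine at position 6 of the polypeptide.
print(charge_shift("Glu", "Val"))
```

Under these assumptions the mutant polypeptide loses one negative charge relative to the wild type, while its length is unchanged, which is why the proteins separate even though the mRNAs do not.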


10.3 Sickle Cell Disease Evolved 
by Natural Selection in Human 
Populations 


Dozens of variant alleles of hemoglobin genes produce 
one form or another of hereditary anemia. According to 
the World Health Organization, hereditary anemias are 
the most common of all human genetic diseases; they 
occur in an estimated 250 to 300 million people around 
the world. Most of the globin-gene mutations causing 
hereditary anemia are rare, but a few are found in high 
frequency in certain populations. The βˢ allele occurs
in frequencies as high as 15% in several indigenous
populations of Africa, the Middle East, and the Indian
subcontinent. Population and evolutionary genetic
analysis verifies that the allele arose independently in
each region and has risen to high frequency by the same
evolutionary process in each locality. Examples of other
β-globin alleles found in high frequency are βᶜ, primarily
in populations from West Africa, and βᴱ, in populations
from Southeast Asia and the Pacific Islands.

The high frequencies of βˢ, βᶜ, and βᴱ are consistent
with the conclusion that natural selection is working to
increase the occurrence of these alleles. Population studies
over the last 50 years have firmly established malaria as
the agent of natural selection leading to a high frequency of
these β-globin gene alleles in certain populations. An
environment where malaria is endemic favors the survival and
reproduction of individuals who are heterozygous for βᴬ
and one of the mutant alleles over the other genotypes. In
other words, individuals who are βᴬβˢ, βᴬβᶜ, or βᴬβᴱ have
a survival and reproductive advantage over individuals who
are homozygous βᴬβᴬ (and therefore succumb more easily
to malaria) and over those who are homozygous for the
mutant alleles (and therefore suffer from hereditary anemia).
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Research Technique 10.2 


Blotting and Probing Nucleic Acid and Protein Molecules 


PURPOSE After gel electrophoresis, the separated nucleic 
acids or proteins are blotted—that is, transferred—onto a 
membrane that can withstand the vigorous manipulation that 
accompanies analysis. Molecular probes are applied to blots to 
detect sequences carried in DNA or RNA, and to detect specific 
proteins. 


MATERIALS AND PROCEDURES Restriction-digested 
DNA, isolated mRNA, or isolated proteins are first subjected to 
gel electrophoresis. Known standards and molecular weight 
size markers are run alongside experimental samples as con- 
trols to identify the length of nucleic acids or to identify the 
electrophoretic mobility of proteins. If the gel contains DNA, 
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the DNA must be denatured after electrophoresis is completed 
to allow molecular probes to locate their target sequence in a 
later step. Denaturation of DNA is accomplished by bathing 
the gel in a sodium hydroxide (NaOH) solution that breaks the 
hydrogen bonds between the strands. The gel is then blot- 
ted with a nucleic-acid-binding or protein-binding membrane 
that will absorb sample molecules from the gel. Next, single- 
stranded nucleic acid molecular probes tagged with either 
radioactivity or fluorescent or chemiluminescent labels are 
applied to the prepared membrane in a solution. The probes 
have sequences complementary to a specific target sequence. 
Probes that hybridize to their targets label the location of the 
band containing the target sequence via their chemical tags. 
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Molecular probe molecules that are not bound to a target 
molecule on the blot are washed away. Subsequently, for 
radioactively labeled molecular probes, autoradiography us- 
ing X-ray film captures the location of any bound molecular 
probe by detecting the radiation. Different detection methods 
are used if molecular probes are tagged with fluorescent or 
chemiluminescent labels. Similar methods are used to prepare 
Southern blots of restriction-digested DNA, northern blots of 
mRNA, and western blots of protein, except that neither RNA 
nor protein is denatured before blotting. 


DESCRIPTION Southern blotting is named after its developer, 
Edwin Southern, and uses single-stranded molecular probes 


to detect denatured DNA on the blot by complementary base 
pairing. Northern blotting detects membrane-bound mRNAs us- 
ing single-stranded molecular probes in a manner similar to that 
of Southern blotting. Western blotting detects proteins with the 
use of antibodies that specifically bind to target proteins. 


CONCLUSION Southern, northern, and western blots are 
produced by similar methods and use molecular probes to 
detect sample molecules or sequences of interest. Labeled 
molecular probes bind to specific target sequences or mol- 
ecules and are detected in autoradiographs or other analyses 
of blots that serve as a permanent record of the results of gel 
electrophoresis. 
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GENETIC ANALYSIS 


PROBLEM The 6-kb segment of DNA shown contains the
Bca gene. The hybridization location for a molecular probe
complementary to a portion of the gene is indicated. The
locations of five EcoRI restriction sequences are also indicated,
and the distances (in kilobases) between restriction sites are given.

BREAK IT DOWN: A nucleic acid molecular probe will hybridize to
any-sized fragment containing complementary base sequence (p. 354).

a. If this 6-kb region is digested with EcoRI, how many DNA fragments
are generated? How many nucleotide base pairs are expected in each of
the resulting restriction fragments?

BREAK IT DOWN: Each restriction fragment has an EcoRI restriction
site at each end (p. 352).

b. Which restriction fragment(s) will contain all or part of the Bca gene?

c. DNA from the 6-kb segment is digested with EcoRI, and the
resulting fragments are separated by DNA gel electrophoresis.
Which of the restriction fragments will be bound by the
molecular probe and seen as bands in the Southern blot?
Which fragments will not be detected by Southern blotting?
Explain your answer.

[Diagram: map of the 6-kb segment showing five EcoRI restriction
sites (E), the Bca gene, and the probe location. The distances
between adjacent sites are 0.8, 1.0, 3.0, and 1.2 kb.]


Solution Strategies and Solution Steps


Evaluate

1. Strategy: Identify the topic this problem addresses and the nature
   of the required answer.
   Step: This problem concerns restriction digestion of a fragment of
   a gene and detection of restriction fragments with a molecular
   probe for a portion of the gene of interest. The answer requires
   identification of the length of restriction fragments that will and
   will not be detected by the probe.

2. Strategy: Identify the critical information given in the problem.
   Step: The locations of five EcoRI restriction sites and the
   distances between the sites are given. The segment of the gene
   bound by the molecular probe is identified.

Deduce

3. Strategy: Examine the diagram to assess the relationship of the
   molecular probe to the gene, and assess the kilobase scale in
   relation to the EcoRI restriction sites and restriction fragments.
   Step: The molecular probe binds to the longer of the two
   restriction fragments that contain part of the Bca gene. The sum of
   kilobase pairs in all the EcoRI restriction fragments equals 6.0 kb.

Solve

Answer a

4. Strategy: Determine the number and length (in base pairs) of
   restriction fragments.
   Step: Digestion with EcoRI produces four restriction fragments with
   lengths that are 0.8 kb (800 bp), 1.0 kb (1000 bp), 3.0 kb
   (3000 bp), and 1.2 kb (1200 bp).

Answer b

5. Strategy: Identify the DNA fragments that contain portions of the
   Bca gene.
   Step: The 1.0- and the 3.0-kb restriction fragments contain
   segments of the Bca gene.

Answer c

6. Strategy: Identify the DNA fragment that will hybridize with the
   molecular probe.
   Step: Only the 3.0-kb restriction fragment contains the sequence
   hybridized by the molecular probe. This fragment will be seen on
   the Southern blot.
   TIP: Molecular probes hybridize to target regions that contain
   complementary base sequences.

7. Strategy: Explain why one fragment is hybridized by the probe and
   why other fragments are not.
   Step: The DNA sequence complementary to the molecular probe
   sequence is completely contained on the 3.0-kb restriction
   fragment, so this fragment binds the probe. None of the other three
   restriction fragments contains a sequence complementary to the
   molecular probe, so although they are separated from one another by
   DNA gel electrophoresis, they are not hybridized by the probe and
   are not seen on the Southern blot.
   PITFALL: Avoid confusion by remembering that DNA fragments that do
   not contain sequences complementary to a molecular probe cannot
   hybridize with the probe.


For more practice, see Problems 15, 23, and 25. 
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Malaria Infection 


Malaria is a potentially fatal infectious disease caused 
by protozoans. One of the most common and most 
serious forms of malaria is caused by Plasmodium falciparum.
This protozoan is carried by the mosquito vector
Anopheles gambiae, which transfers the protozoan to
animals, including humans, when it bites them. The
symptoms of malaria include high fever and other problems
that can cause death if not effectively treated. Once in- 
fected with P. falciparum, a person can suffer recurrences 
of malaria throughout life. As a consequence, victims of 
the disease are less healthy than their uninfected counter- 
parts and are susceptible to other diseases as well. Overall, 
malaria victims experience higher morbidity (illness) and 
mortality (death) and produce fewer children than do the 
rest of the population. 

Plasmodium falciparum and the mosquito that car- 
ries it flourish in tropical environments, and therefore 
malaria is endemic to the tropics. P. falciparum embryos 
live in their mosquito hosts, but they do not begin larval 
development until they are transferred to a mammalian 
host. Once inside a mammalian host, the plasmodium 
begins to mature, first in the liver of the host animal and 
later in the red blood cells. 


Heterozygous Advantage 


One of the best-documented examples of natural selection
in the evolution of human populations has been the
relationship observed between malaria and the βˢ allele
(Figure 10.16). Numerous anthropological and epidemiological
studies have recorded the effects of malaria on the
evolution of βˢ and SCD in African populations, and in
other populations in Southern Europe and Asia. The central
finding of these studies is that heterozygotes with the
genotype βᴬβˢ survive and reproduce more dependably
than other genotypes in environments where malaria is
common.

The improved survival and reproduction of heterozygotes
can be explained at a cellular level by the
selective advantage that heterozygotes derive from the
shortened average life span of their red blood cells.
The average red cell life span in these individuals is
shortened due to the presence of a certain amount of
mutant β-globin protein and the consequent formation
of a small number of sickle-shaped red blood cells. The
shorter red cell life spans interrupt the developmental
cycle of Plasmodium larvae by preventing many of the
immature parasites from reaching maturity. As a result,
heterozygotes suffer fewer cases of malaria than are
experienced by βᴬβᴬ homozygotes, and when they do get
malaria, their disease is less severe.

On a population level, individuals with SCD (homozygous
for the mutant allele) survive and reproduce
very poorly due to their hemoglobin disorder. Those who
are βᴬβᴬ also have lower reproductive fitness than do

[Figure 10.16: three map panels. Legend for (a): areas with endemic
falciparum malaria. Legend for (b): percent of population that has
the sickle cell allele, in intervals of 0-2, 2-4, 4-6, 6-8, 8-10,
10-12, 12-14, and 14+.]


Figure 10.16 The distribution of malaria and sickle cell
disease. (a) Colored areas indicate the regions of the world
where malaria is an endemic disease. Epidemic disease is
periodic or seasonal. Endemicity ranges from hypoendemic,
where disease is always present but at low frequency, to
holoendemic, where disease is always present at extremely high
frequency. (b) Frequency distribution of the βˢ allele in some
of the human populations occupying the malarial belt. (c) The
distribution of the βᴱ allele in Southeast Asia.


heterozygous carriers, because of the ravages of malaria. 
The result is that natural selection favors heterozygous 
carriers and causes populations to evolve a gene pool that 
includes large proportions of both alleles. This heterozygous
advantage seen for βᴬβˢ individuals is balanced by
the disadvantage to those with SCD. This mechanism of
natural selection is called balancing selection. The action
of the conflicting forces in balancing selection that favor
the βˢ allele in heterozygous genotypes and act against it
in the homozygous genotype produces an overall increase
in the βˢ allele until it reaches a stable equilibrium
frequency, where the gain and loss of βˢ alleles are equal. The
term balanced polymorphism is used to describe the end
result of balancing selection, a result in which the loss of an
allele because of selection against one of its phenotypes is
balanced by natural selection in favor of the allele for
another phenotype.
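
The approach to a stable equilibrium can be illustrated with the standard one-locus selection model. The Python sketch below is a hedged illustration only: the relative fitness values (0.85 for βᴬβᴬ, 1.00 for βᴬβˢ, 0.20 for βˢβˢ) are hypothetical numbers chosen for the example, not measurements from the text, and the function names are illustrative.

```python
def next_frequency(q, w_aa, w_as, w_ss):
    """Frequency of the S allele after one generation of selection."""
    p = 1.0 - q
    mean_fitness = p * p * w_aa + 2 * p * q * w_as + q * q * w_ss
    return (p * q * w_as + q * q * w_ss) / mean_fitness

def predicted_equilibrium(w_aa, w_as, w_ss):
    """Equilibrium S-allele frequency under heterozygote advantage."""
    s = w_as - w_aa  # fitness cost of being homozygous wild type
    t = w_as - w_ss  # fitness cost of being homozygous mutant
    return s / (s + t)

def simulate(q0, generations, fitnesses):
    """Iterate selection for a number of generations from frequency q0."""
    q = q0
    for _ in range(generations):
        q = next_frequency(q, *fitnesses)
    return q

# Hypothetical relative fitnesses (AA, AS, SS) in a malarial environment.
FITNESSES = (0.85, 1.00, 0.20)
print(round(simulate(0.001, 500, FITNESSES), 4))
print(round(predicted_equilibrium(*FITNESSES), 4))
```

With these assumed values a rare allele rises in frequency and then levels off at the predicted equilibrium of roughly 0.16, the point at which gains through heterozygotes offset losses through homozygotes.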

In research concerning heterozygous advantage in the
evolution of βˢ, three findings are particularly important:


1. The frequency of SCD carriers rises with increasing
age in the population. Studies of genotype frequencies
in malaria-afflicted populations find the frequency
of βᴬβˢ heterozygotes to be lower in children
than in adolescents, and to be lower in adolescents
than in adults. In other words, individuals with βˢβˢ
and βᴬβᴬ genotypes are being lost from these
populations at younger ages than are heterozygotes.


2. Heterozygous women produce a greater average
number of children than do women who are βᴬβᴬ.
This is an indication that the overall health of
heterozygotes is better, leading them to reproduce more
efficiently.


3. Across the “malaria belt,” the portion of the tropics
where malaria is common, the βˢ allele has arisen
and evolved at least three times independently
in different populations. Some human biologists
believe the genetic evidence supports four separate
mutation and evolution events. These independent
evolutionary events account for the presence of
βˢ in high frequency in populations in the Middle
East, the region surrounding the Mediterranean
Sea, parts of the Indian subcontinent, and parts
of Africa.


Evolution of βᶜ and βᴱ


Additional support for the role of balancing selection in
the evolution of globin genes comes from the study of two
other β-globin gene alleles that are present at high
frequencies in other populations in the malaria belt. Mutant
β-globin alleles βᶜ and βᴱ have evolved due to the natural
selection pressure of malaria in much the same way βˢ has
evolved.

The mutation known as βᶜ likely occurred thousands
of years ago on the west coast of Africa. This mutation is a


[Figure 10.17 diagram: DNA template strand, mRNA, and protein
sequences. (a) Triplets 4 through 7 of the βᴬ allele (mRNA
5'-ACU CCU GAG GAG-3'; Thr-Pro-Glu-Glu) compared with a mutant
allele altered at triplet 6. (b) Triplets 24 through 27 of the βᴬ
allele compared with the βᴱ allele, altered at triplet 26.]


Figure 10.17 Sequence comparisons of βᴬ (the wild type)
and mutant β-globin alleles βᶜ and βᴱ. (a) βˢ and βᶜ are
base-substitution mutants that alter amino acid position 6, changing
glutamic acid (Glu) to lysine (Lys) in βᶜ. (b) The base substitution
mutant βᴱ changes amino acid 26 from glutamic acid (Glu) to
lysine (Lys).


base substitution of a nucleotide immediately adjacent to
the site of the βˢ mutation in the sixth DNA triplet of the
β-globin gene (Figure 10.17a). The effect of the mutation
is to change the sixth amino acid of the β-globin protein
from glutamic acid to lysine.

Although the mutation affects the same amino acid
position altered in βˢ, the complications of the mutation
are not as severe as those seen in SCD. Homozygosity for
the βᶜ mutation does not produce severe anemia and is
rarely fatal. Like SCD carriers, however, heterozygotes
with the genotype βᴬβᶜ are more resistant to malaria
than are βᴬβᴬ homozygotes. This situation leads to the
spread of the βᶜ mutation by a process of natural
selection parallel to that seen for βˢ.

On the other side of the malaria belt, in Southeast
Asia and the adjacent Pacific Islands, another β-globin
gene mutation, βᴱ, is prevalent. βᴱ is a base substitution
mutation that alters amino acid 26 of the β-globin
protein, changing it from glutamic acid to lysine
(Figure 10.17b). The anemia seen in βᴱβᴱ homozygotes
is severe, but the selection it exerts against βᴱ is balanced
by the greater resistance of heterozygous carriers of the
allele to malaria.

Like the βˢ variant that has been our focus throughout
this chapter, the βᶜ and βᴱ variants are distributed
across the malarial belt that spans much of the tropical
regions surrounding the equator. Clinical and epidemiological
studies confirm that all three β-globin gene variants
are advantageous in the heterozygous state because they 
reduce the incidence and intensity of malarial disease in 
carriers. The incidence of hereditary disease produced by 
homozygosity is balanced by the improved odds of survival 
and reproduction for carriers of these globin gene variants. 
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Transmission and Molecular Genetic Analysis of Thalassemia 


Autosomal recessive forms of a hereditary anemia called thal- 
assemia result from mutations of globin genes that create an 
imbalance in the ratio of the α-globin to β-globin polypeptides.
The imbalance reduces the amount of hemoglobin that
can form and generates anemia. Owing to differences be- 
tween mutant alleles, thalassemias exhibit varying levels of 
severity, from mild to fatal. One particular form of thalassemia 
is common on the Mediterranean island of Sardinia. 

The Sardinian thalassemia mutation (OMIM 141900) is
a DNA nucleotide base substitution (GC → AT) in the 39th
codon (corresponding to the 39th amino acid) of the β-globin
gene (Figure 10.18a). The mutation changes the 39th
codon of the transcript from 5'-CAG-3', coding for the amino
acid glutamine (Gln), to the sequence 5'-UAG-3', which is
a stop codon. This change results in the premature
termination of translation of β-globin protein after the first 38 amino
acids. The truncated protein is not functional. Consequently,
individuals who are homozygous for the mutant allele have no
β-globin protein. Their ability to form hemoglobin is greatly
diminished, causing severe anemia. Heterozygotes also have
diminished capacity to produce hemoglobin, and they suffer
from chronic anemia. Since heterozygotes have one wild-type


β-globin allele, however, their anemia is less severe than in
homozygotes. 

Wild-type β-globin alleles have two recognition sites for
restriction endonuclease MaeI (restriction sequence 5'-CTAG-3')
in the vicinity of the gene (Figure 10.18b). The base-substitution
mutation in DNA triplet 39 creates a new MaeI restriction site
that is not found in the wild-type sequence. As we identified
in Research Technique 10.1, the creation of a new restriction
sequence by a base-pair substitution mutation is a second
mutational mechanism for the creation of RFLPs. In this case, whereas
the wild-type allele contains two MaeI restriction sites separated
by approximately 1500 base pairs (1.5 kb), the mutant allele
sequence contains a third MaeI restriction site that divides the
1.5-kb region into two DNA fragments of 0.5 kb and 1.0 kb.
Southern blot analysis of MaeI-digested β-globin DNA utilizes a
molecular probe that binds near one end of the β-globin gene.
The probe binds a 1.5-kb DNA fragment produced by MaeI
treatment of the wild-type allele and a 1.0-kb DNA fragment
produced by MaeI treatment of the mutant allele. The 0.5-kb
DNA fragment is also produced by MaeI digestion of the mutant
allele, but that fragment is not detected in Southern blot analysis
because it is not bound by the molecular probe.
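
The two consequences of the triplet-39 substitution, a premature stop codon and a new MaeI site, can both be checked on a short stretch of sequence. The Python sketch below assumes coding-strand sequences for triplets 36 through 41 as read from Figure 10.18a (wild-type triplet 39 CAG, mutant TAG); the function names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # coding-strand (DNA) spelling

def first_stop(coding_seq, first_codon_number):
    """Number of the first in-frame stop codon, or None if there is none."""
    for offset in range(0, len(coding_seq) - 2, 3):
        if coding_seq[offset:offset + 3] in STOP_CODONS:
            return first_codon_number + offset // 3
    return None

def maei_sites(seq):
    """0-based start positions of the MaeI recognition sequence CTAG."""
    return [i for i in range(len(seq) - 3) if seq[i:i + 4] == "CTAG"]

# Assumed coding-strand sequences, triplets 36 through 41.
wild_type = "CCTTGGACCCAGAGGTTC"
mutant = "CCTTGGACCTAGAGGTTC"  # C-to-T change in triplet 39

print(first_stop(wild_type, 36), maei_sites(wild_type))
print(first_stop(mutant, 36), maei_sites(mutant))
```

In the mutant sequence the new CTAG spans the last base of triplet 38 and all of triplet 39, so the same base change that creates the stop codon also creates the restriction site used to detect it.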


[Figure 10.18 diagram. (a) DNA sequence variation of β-globin
alleles, DNA triplets 36 through 41. Wild type (M): coding strand
5'-CCT TGG ACC CAG AGG TTC-3'; template strand
3'-GGA ACC TGG GTC TCC AAG-5'. Mutant (m): coding strand
5'-CCT TGG ACC TAG AGG TTC-3'; template strand
3'-GGA ACC TGG ATC TCC AAG-5'; the substitution in triplet 39
creates the new MaeI restriction site. (b) Restriction digestion of
β-globin alleles. Wild type (M): MaeI sites 1 and 2 are 1.5 kb apart.
Mutant (m): the new MaeI site lies between sites 1 and 2, producing
fragments of 0.5 kb and 1.0 kb; the probe binds the 1.0-kb fragment.
(c) Southern blot analysis of β-globin allele variation, with bands
at 1.5 kb and 1.0 kb.]


Figure 10.18 Molecular genetic analysis of variation at DNA triplet 39 of the β-globin gene in
Sardinian β-thalassemia. (a) DNA sequences of the wild-type (M) and mutant (m) β-globin alleles
from triplet 36 through 41. (b) Restriction maps of wild-type and triplet-39 mutant alleles show a new
MaeI restriction site in the mutant allele. The location of molecular probe binding identifies a 1.5-kb
DNA fragment for the wild-type allele (M) and a 1.0-kb fragment for the mutant allele (m). (c) Southern
blot analysis of a family showing segregation of wild-type and triplet-39 mutant alleles. Heterozygous
(Mm) parents produce children with all three genotypes. The genotype of II-4 is discussed in the text.
In terms of the presence or absence of the Sardinian
mutant allele, three genotypes are possible at this locus, each
having a unique restriction fragment banding pattern detectable
by Southern blotting. Homozygotes for the wild-type
allele (MM) produce a single Southern blot band of 1.5 kb.
Homozygotes with severe anemia have the mm genotype
and produce a single DNA band of 1.0 kb. Heterozygotes (Mm)
produce both bands, since they carry both alleles.
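The genotype-to-band relationship can be written as a small lookup. The sketch below is illustrative only; the function names are hypothetical, and the band sizes are the 1.5-kb and 1.0-kb values given in the text for the M and m alleles.

```python
# Probe-detected fragment size (kb) for each allele, as described in the text.
BAND_KB = {"M": 1.5, "m": 1.0}


def southern_bands(genotype):
    """Return the distinct band sizes (kb, largest first) expected for a genotype."""
    return sorted({BAND_KB[allele] for allele in genotype}, reverse=True)


def genotype_from_bands(bands_kb):
    """Infer the genotype from the set of bands seen in one Southern blot lane."""
    observed = set(bands_kb)
    for genotype in ("MM", "Mm", "mm"):
        if set(southern_bands(genotype)) == observed:
            return genotype
    raise ValueError(f"band pattern {sorted(observed)} matches no genotype")


for genotype in ("MM", "Mm", "mm"):
    print(genotype, southern_bands(genotype))
```

The same lookup run in reverse (`genotype_from_bands`) is what a prenatal test does in effect: it reads the band pattern in a lane and assigns the one genotype that produces it.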

Figure 10.18c shows a nuclear family pedigree that
is consistent with an autosomal pattern of inheritance of
thalassemia. The pedigree symbols for each family member
are located directly above the Southern blot lane containing
that person's DNA. The Southern blot detects a different
pattern of DNA bands for each genotype. The figure also
illustrates Southern blot results for DNA obtained from a
fetus (the diamond-shaped symbol identified as II-4) being
carried by I-2. This analysis is a prenatal molecular
diagnostic test for Sardinian thalassemia that is based on the
Southern blot band differences among the three possible
genotypes.


SUMMARY


10.1 An Inherited Hemoglobin Variant Causes
Sickle Cell Disease


Hemoglobin is an abundant protein in red blood cells,
transporting oxygen throughout the body. Its structure is
tetrameric, composed of two polypeptides encoded by the α-globin
gene and two polypeptides encoded by the β-globin gene.
Mutation of genes frequently leads to abnormal structure
and function of proteins. Mutations of α-globin or β-globin
genes often produce hereditary anemia, the most common
category of hereditary disease known in humans.

Sickle cell disease (SCD) is a common hereditary anemia
in humans caused by homozygosity for the β^S allele of the
β-globin gene. Individuals with SCD have the genotype β^Sβ^S.
The globin protein produced by β^S differs from the normal
β-globin gene product (β^A) by a single amino acid substitution.
Hemoglobin in people with SCD is unstable and linearizes at
low oxygen concentration, distorting the red blood cell into
a sickle shape. The distorted cells can block narrow capillaries,
producing oxygen starvation in tissues that leads to tissue
damage and other complications. Sickle cell disease leads
to premature death of red blood cells.

Heterozygous carriers of the β^S mutation (β^Aβ^S) have a small
percentage of sickle-shaped red blood cells but do not suffer
symptoms or complications of the disease.


10.2 Genetic Variation Can Be Detected by
Examining DNA, RNA, and Proteins


Gel electrophoresis demonstrates the molecular basis of
SCD by revealing that the protein products of the β^A and β^S
alleles have different electrophoretic mobilities. Distinctive
electrophoretic band patterns are detected for many genotypes
of the β-globin locus.

The single amino acid substitution caused by the β^S allele is
a valine in place of a glutamic acid in the β-globin protein.

DNA analysis identifies the β^S mutation as a base-pair
substitution in the β-globin gene that produces a single
nucleotide polymorphism (SNP) and eliminates a DdeI
restriction site.

An RFLP distinguishes the β^A and β^S alleles.

The pattern of inheritance of RFLPs parallels that of alleles
at the β-globin gene.

DNA restriction fragments are detected by transferring
denatured DNA fragments from electrophoresis gels to a
permanent membrane in the Southern blotting process.

In Southern blots, single-stranded nucleic acid probes labeled
with radioactive or chemical markers hybridize with
complementary target sequences in DNA fragments.

The presence and size of one or more DNA fragments
hybridized by a molecular probe are revealed by the
appearance of bands in Southern blot analysis.

Northern blotting is similar to Southern blotting but
examines mRNA for differences in length.

Western blotting uses antibodies with radioactive or chemical
labels to detect protein electrophoretic mobility variation.


10.3 Sickle Cell Disease Evolved by Natural
Selection in Human Populations


The β^S allele has evolved to high frequency in many populations
in the malaria belt as a consequence of natural selection,
which favors β^Aβ^S heterozygotes as the most fit in the
malarial environment.

Heterozygous advantage in the case of β^A and β^S alleles
stems from disruption of the malarial parasite life cycle, a
result of the somewhat shorter average life span of red blood
cells in heterozygotes.

Mutations of the β-globin gene, including β^C and β^E,
appear to have evolved in distinct populations by processes
similar to those that established β^S in human populations.
restriction sequence (p. 345)

sickle cell disease (SCD) (p. 339)

single nucleotide polymorphism (SNP) (p. 345)

Southern blotting (p. 349)

sticky end (p. 346)

western blotting (p. 349)


PROBLEMS 


Chapter Concepts 


1. Define the following terms as described in this chapter: 


balanced polymorphism 
heterozygous advantage 
balancing selection 
intron 
hemoglobin tetramer 
hereditary anemia 
exon 
heterozygous 
recessive 
molecular disease 
restriction endonuclease 
homozygous 

gel electrophoresis
restriction fragment length polymorphism (RFLP)
SNP
electrophoretic mobility
Southern blot
molecular probe
northern blot
antibody probe
western blot


2. Using sickle cell disease as an example, describe the
similarities and differences between the terms genetic disease
and molecular disease. How are molecular or genetic
diseases different from diseases that are caused by an
infectious organism such as a bacterium?

3. Compare and contrast the contributions of Neel, Pauling,
and Ingram to our understanding of the genetic and
molecular bases of sickle cell disease.

4. Why do differences in protein electrophoretic mobility
often result from changes to protein amino acid sequences?
How can electrophoretic mobility differences arise
between the protein products of different alleles?

5. Electrophoretic analysis of hemoglobin from a person with
normal HbA and a person with hereditary anemia reveals no
difference in the electrophoretic mobility. How can this occur?


For answers to selected even-numbered problems, see Appendix: Answers.

6. Many types of hereditary anemia result from single amino
acid substitutions affecting one of the hemoglobin protein
chains. For example, the wild-type β-globin allele
has the template DNA sequence CTC at triplet 6, which
encodes the amino acid glutamic acid (Glu) at position 6
of the β-globin protein (see Figure 10.3). The mutant allele
producing β^S contains the DNA sequence CAC and encodes
valine (Val) at β-globin position 6, and the β^C mutation
contains TTC in DNA and encodes lysine (Lys) at position 6.
The table below shows several other β-globin gene
mutants that are the result of single amino acid substitutions.
Use the information provided and Table A inside
the front cover to determine the wild-type template DNA
sequence and the template sequence for each mutant.


β-Globin Form       Position   Amino Acid
β^A (wild type)     7          Glu
Siriraj             7          Lys
San Jose            7          Gly
β^A (wild type)     58         Pro
Ziguinchor          58         Arg
β^A (wild type)     145        Tyr
Bethesda            145        His
Fort Gordon         145        Asp

7. A single base substitution creates the α-globin gene
mutant Hb Constant Spring (Hb^CS), whose product contains
172 amino acids. Wild-type α-globin protein contains
141 amino acids. The wild-type mRNA carries the codon
CGU to encode arginine (Arg) as the final amino acid of the
chain, followed by the stop codon UAA. The Hb^CS mutant
produces mRNA that has the sequence CGUCAA in this
region. Explain how the single DNA base substitution in
Hb^CS can lead to production of a protein that contains 31
more amino acids than the wild type has.

8. Wild-type β-globin protein is composed of 146 amino
acids. A β-globin gene mutant known as Hb Cranston contains
157 amino acids. Partial mRNA sequences of β^A and
Hb Cranston are shown. The numbers indicate amino
acid positions. Identify the mutation that causes Hb Cranston, and
describe how the mutation leads to a longer than normal
β-globin protein chain.

β^A:
144 145 146 Stop
AAG UAU CAC UAA GCU CGC UUU CUU GCU GUC CAA UUU CUA UUA A

Hb Cranston:
144 145 146 147         150                 155 156 157 Stop
AAG AGU AUC ACU AAG CUC GCU UUC UUG CUG UCC AAU UUC UAU UAA
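Both of the preceding problems turn on where the ribosome meets a stop codon in the reading frame. The sketch below is a hypothetical helper that reads an mRNA in successive triplets from its first base until a stop codon appears; the codon table is the standard genetic code, one-letter amino acid codes are used, and the demonstration sequence is the wild-type β^A segment shown above.

```python
from itertools import product

# Standard genetic code, codons ordered with bases U, C, A, G at each position.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate_to_stop(mrna):
    """Translate from the first base, triplet by triplet, stopping at a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":      # UAA, UAG, or UGA ends translation
            break
        protein.append(amino_acid)
    return "".join(protein)


# Wild-type segment from the problem: codons 144-146, the stop codon, and
# part of the untranslated sequence that follows it.
wild_type_segment = "AAGUAUCACUAAGCUCGC"
print(translate_to_stop(wild_type_segment))  # KYH  (Lys-Tyr-His, then stop)
```

Any change that alters or shifts the stop codon lets the loop continue into sequence that is normally untranslated, which is the situation both problems ask about.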
9. Describe why sickle cell disease is considered to be a
recessive genetic disorder.

10. What molecular parameter causes DNA fragments to have
different electrophoretic mobility? What parameter causes
different mobilities in mRNA? What parameters cause
different mobilities in proteins?

11. How is an autoradiograph produced from a Southern blot?

12. Both Southern blotting and northern blotting can reveal
information about the DNA fragments or RNA molecules
being examined, but the positions of nucleic acid bands in
one kind of blot cannot be directly compared with those in
the other. Why?


13. The target sequence on a fragment of DNA is
3'-ATATCGCACGGACT-5'. What is the sequence and
polarity of an equivalent-length molecular probe used
to detect this target sequence? Explain why the molecular
probe you have proposed will detect the targeted
sequence.

14. The β^S allele occurs in a central West African population
at a frequency of 15%. The same allele occurs
in a population from the southern tip of Africa at a
frequency of less than 1%. Speculate about the reason for
the different frequencies of the allele in these two African
populations.


Application and Integration


15. The family represented in the pedigree and Southern blot
below has been evaluated for the presence and distribution
of the β^S allele. Use the information in the Southern
blot and the explanation provided in the chapter to
identify the phenotype and determine the genotype of each
person tested.

[Pedigree and Southern blot not reproduced; the blot shows bands at
1.35, 1.15, and 0.20 kb.]

16. Suppose the mating couple (I-1 and I-2) shown in Problem
15 are expecting a fifth child.

a. Is it possible that their fetus could have sickle cell
disease? If so, what is the probability? If not, explain why
not.

b. Fetal DNA is collected and analyzed by Southern blotting.
The fetus has a single DNA band that is 1.35 kb
in length. What is your interpretation of this result?
Explain your answer.

17. What are restriction endonucleases, and why are they
useful in identifying DNA sequence variation?

18. Following restriction digestion, DNA fragments produced
by digestion with certain enzymes have “sticky ends,”
while fragments produced by digestion using other
enzymes have “blunt ends.” Distinguish the meaning of these
two terms.

19. The double-stranded DNA sequence below is part
of a restriction fragment you wish to detect by
autoradiography.

5'-...ATTCATGACGGACTATTCGAGAGCTGATGCAT...-3'
3'-...TAAGTACTGCCTGATAAGATCTCGACTACGTA...-5'

Identify which of the following molecular probes is the best
choice for achieving the desired hybridization reaction.
Indicate where on the upper or lower strand the probe will
hybridize.

a. 3'-TGATATCGTACCGAA-5'

b. 5'-TGCCTGATAAGATCT-3'

c. 3'-ACAGCCTAGTAAGAT-5'

d. 3'-ACTGCCTGATAAGCT-5'

20. Restriction enzymes recognize specific double-stranded
DNA sequences that have the same sequence on both
strands. For example, the restriction sequence for BamHI
is 5'-GGATCC-3' on each DNA strand and for SmaI is
5'-CCCGGG-3' on each strand. A single phosphodiester
bond on each strand is cut at the same place in the
sequence on each strand. Explain how restriction enzymes
are able to recognize the same sequence and cut the
sequence in the same place on each DNA strand.
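The statement that a restriction sequence reads the same on both strands can be checked directly: such a sequence equals its own reverse complement. The sketch below uses a hypothetical helper name; BamHI and SmaI are the examples given in the problem, and the third sequence is an invented non-palindromic control.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq_5_to_3):
    """Sequence of the complementary strand, read 5'->3'."""
    return seq_5_to_3.translate(COMPLEMENT)[::-1]


def reads_same_on_both_strands(site_5_to_3):
    """True if the complementary strand, read 5'->3', has the identical sequence."""
    return site_5_to_3 == reverse_complement(site_5_to_3)


examples = [
    ("BamHI", "GGATCC"),            # from the problem
    ("SmaI", "CCCGGG"),             # from the problem
    ("invented control", "GGATCA"), # not a real restriction sequence
]
for name, site in examples:
    print(name, site, reads_same_on_both_strands(site))
```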


21. Four alleles of a variable DNA marker gene produce
different-sized DNA fragments as follows: R^1 = 4 kb,
R^2 = 13 kb, R^3 = 10 kb, R^4 = 7 kb.

a. Identify the genotypes of individuals depicted in lanes
1, 2, and 3 in the gel shown.

[Gel not reproduced: lanes 1 through 6, with size markers at 13, 10, 7,
and 4 kb.]

b. In lanes 4, 5, and 6, draw the band patterns expected
for individuals with, respectively, genotypes R/R?, R?RÍ,
and R/R!.

22. Consider this DNA sequence:

5'-TTCGAATTCGACTCAGGATCCTACAAGTTTCAT-3'
3'-AAGCTTAAGCTGAGTCCTAGGATGTTCAAACTA-5'

Which of the following restriction sites are present in this
sequence? Draw a box around each restriction sequence.

a. EcoRI (5'-GAATTC-3')

b. BamHI (5'-GGATCC-3')

c. HindIII (5'-AAGCTT-3')


23. Two probes designated probe A and probe B hybridize very
near one another in a region of DNA that contains DNA
fragment length variation when digested with the restriction
enzyme HindIII. Four maps show the location and
intervening distances in kilobases of HindIII restriction sites
and the binding locations of probes A and B. The maps
correspond to alleles H^1 to H^4.

a. For the genotype H/H°, what DNA bands are expected
in an autoradiogram using probe A? Using probe B?
Using both probes together?

b. For the genotype H7H*, what band pattern is expected
using probe A? Probe B? Using both probes together?

c. Suppose a woman with the genotype H/H? has a child
with a man whose genotype is H?H*. What are the four
possible genotypes for a child of this couple?

d. What are the sizes of bands produced by each possible
child of this couple using probe A? Using probe B?


24. Plants of a particular species can either have the dominant
wild-type phenotype, tall (T), or the recessive phenotype
called dwarf (D). Genetic analysis has identified the stature
gene, and DNA analysis of tall plants yields a DNA fragment
of 7.5 kb corresponding to a portion of the gene. The
wild-type gene map is illustrated, along with an autoradiograph
showing DNA restriction fragments from a normal
parental plant (T; lane 1), a tall plant that carries a copy of
the mutant allele (T; lane 2), the two DNA fragment patterns
observed in tall progeny plants (T; lanes 3 and 4), and
the DNA fragment pattern seen in dwarf plants (D; lane 5).

a. Propose two mutational events that could cause the
small DNA fragment that represents the mutant allele.

b. In northern blot analysis of mRNA, what mRNA
differences would you anticipate for your two proposed
mutational mechanisms?


25. A second strain of dwarf plants has a different mutation
of the same gene identified in Problem 24. In the second
strain, plants carrying a copy of the mutant allele produce a
DNA restriction fragment of 10.5 kb, rather than the 7.5-kb
fragment. DNA fragments produced by digestion of DNA
from tall carrier plants are shown below in lanes 1 and 2,
fragments from tall progeny of carriers are shown in lanes 3
and 4, and DNA from dwarf plants is shown in lane 5.

[Autoradiograph not reproduced: lanes 1 and 2, tall parents (T);
lanes 3 and 4, tall progeny (T); lane 5, dwarf progeny (D).]

a. What mutational mechanism is most likely responsible for
the production of abnormal DNA fragment length corresponding
to this mutant allele? Explain your reasoning.

b. In comparison to the length of mRNA from the normal
allele, will the mRNA from this mutant allele most likely
be longer, shorter, or about the same length? Explain
your answer.


26. During gel electrophoresis of linear DNA molecules, why
do longer molecules move more slowly than shorter
molecules? What determines the difference in electrophoretic
mobility of mRNA molecules?

27. What three features of proteins are most important in
determining their electrophoretic mobility? Based on your
answer, describe how single amino acid substitutions can
change the electrophoretic mobility of a protein.

28. In molecular biology, restriction endonucleases isolated
from bacteria are used to cleave DNA into fragments.
What functional role do restriction endonucleases serve in
the bacteria from which they are derived?

29. A complete plant gene containing four introns and five
exons is carried on a 6.0-kb DNA fragment. DNA sequencing
analysis finds that this fragment contains 1000 base pairs
that flank the transcribed region of the gene and 5000 base
pairs that are transcribed. Four introns contain 3500 base
pairs, and five exons contain 1500 base pairs. Northern blot
analysis is performed on mRNA of this gene using a probe
that binds to a portion of one of the exons. mRNA isolated
from the cytoplasm of cells is compared to mRNA isolated
from cell nuclei on the northern blot. Do you expect that
all the mRNAs will be a uniform length, or will mRNA
molecules of multiple lengths be detected on the northern blot?


30. Two male hounds, identified in the figure as ♂1 and ♂2,
got loose one night at Wet Noses Puppy Farm. A female
(♀A in the figure) got pregnant and had a litter of three
puppies (P1, P2, and P3). The owner of Wet Noses is
desperate to know which male is the father and has
electrophoretic analysis of a variable DNA genetic marker (shown
in the figure) to guide the identification. The owner
thinks ♂1 is the father of the three puppies. Is the owner
correct? Explain your answer.

[Gel not reproduced; lanes: ♀A, ♂1, ♂2, P1, P2, P3.]
31. The map below illustrates three alleles in a genome
segment. The alleles differ in the number and location of
restriction sequences. Restriction digestion and Southern
blotting with the molecular probe whose hybridization
location is indicated results in detection of a single DNA
band for each allele.

[Restriction map not reproduced; the surviving labels are restriction
sites (R) and intervals of 1.0, 2.0, and 1.5 kb.]

a. List the size (in kilobases) of DNA bands detected by
Southern blotting of restriction-digested DNA from
organisms with the genotypes D?D?, D?D?, and D?D?.

b. Restriction-digested DNA from two organisms is analyzed
by Southern blotting. Restriction fragments of
2.0 and 3.5 kb are observed on the Southern blot of one
organism, and bands of 2.0 and 3.0 kb are observed for
the other. What are the genotypes of these organisms?

c. Organisms with the genotype D^1D^1 are identified by
the detection of a 2.0-kb DNA band on a Southern blot.
Why does this genotype produce a single detectable
band, and why are the 1.0- and 1.5-kb restriction fragments
not detected in Southern blotting?


32. A dominant wild-type allele D produces full enzyme function,
but a recessive allele d₁ produces no functional enzymatic
action, and a recessive allele d₂ produces reduced
enzyme function. Western blot analysis of the proteins
produced by organisms with different genotypes for this
gene gives the results shown.

[Western blot not reproduced; lanes by genotype: DD, Dd₁, Dd₂,
d₁d₁, d₁d₂, d₂d₂.]

a. What kind of a protein change might result in two western
blot bands for organisms with the Dd₂ genotype?

b. What might explain the absence of detectable protein
for organisms with the d₁d₁ genotype?

c. Why might there be just one protein band for organisms
with the d₁d₂ and d₂d₂ genotypes?

d. Based on your assessment of the western blot analysis,
speculate about the nature of the mutations producing
d₁ and d₂. In other words, what has happened at the
DNA level to produce these mutations?


33. Northern blot analysis is performed on mRNA produced
by transcription of a gene in organisms with different
genotypes. Three alleles occur at the gene: N is a dominant
wild-type allele, and alleles n₁ and n₂ are each recessive
alleles. Results of northern blot analysis of organisms with
six different genotypes are shown.

[Northern blot not reproduced; lanes by genotype: NN, Nn₁, Nn₂,
n₁n₁, n₁n₂, n₂n₂.]

a. Organisms with the genotypes NN, Nn₂, and n₂n₂ each
have single bands with the same electrophoretic mobility.
Thinking about the composition of mRNA, explain
this observation.

b. Organisms that are n₁n₁ have a single mRNA band with
higher electrophoretic mobility. Thinking about the
composition of mRNA, explain this observation.

c. Two mRNA bands are detected for organisms with the
Nn₁ and the n₁n₂ genotypes. Explain this observation.


Chromosome Structure 


Interphase chromosome territories in a chicken cell nucleus. Different
fluorescent in situ hybridization probes label each chromosome, which lies in
its own well-defined territory.


The genome of a species is the total amount of hereditary
information in an entire set of its chromosomes.
Chromosomes consist largely of DNA, and we describe 
the molecular structure of DNA and the importance of 
its nucleotide sequence in Chapter 7. But the structure 
and sequence of DNA are only a partial description of the 
genome. Arguably even more important to the genome 
story is the way the DNA is organized in chromosomes. 

Every chromosome carries a single, long DNA molecule. 
The chromosome may be singular, as in bacterial and archaeal 
species; it may be a member of a homologous pair
of chromosomes, as in diploid eukaryotic species; or 
it may be one of multiple chromosomes in a poly- 
ploid set, as in certain plant species. Whatever their 
number and regardless of whether they belong to 
archaea, bacteria, or eukarya, all chromosomes are 
composed of DNA that is organized by proteins of 
different types and in different amounts. 

The combination of protein and DNA in chro- 
mosomes is critical in accomplishing four essential 
functions. First, protein helps compact the DNA so 
that chromosomes will fit efficiently into the bacte- 
rial or archaeal cell or into the eukaryotic nucleus. 
Second, protein helps stabilize DNA and protects 
it from damage. Third, protein promotes chromo- 
some condensation required for cell division. Finally, 
the packaging of chromosomes with proteins helps 
regulate DNA replication and gene transcription, par- 
ticularly in eukaryotic genomes. 

This chapter describes chromosome structure 
and the composition of the genetic material in 
viruses, bacteria, eukaryotes, and archaea. The asso- 
ciation of proteins of various types with DNA and the 
ways in which this association aids in accomplishing 
the four essential functions identified above are cen- 
tral to the discussion. We begin with a discussion of 
virus structure, viral genomes, and the variability of 
the genetic material carried by viruses. 


11.1 Viruses Are Infectious Particles 
Containing Nucleic Acid Genomes 


A virus is a noncellular infectious particle containing 
nucleic acid in a small genome that encodes a limited 
number of genes. The nucleic acid can be either single- 
stranded or double-stranded DNA or RNA. Viral genomes 
do not contain all of the genetic information required for 
the virus to replicate and express its genetic material. As a 
consequence, viruses are obligate parasites, meaning that 
they must infect a host cell—which, depending on the vi- 
rus, may be a bacterial, archaeal, plant, or animal cell—in 
order to express the genetic information contained in the 
genome and produce the proteins required to generate 
new viral progeny. Each type of virus has a limited “host 
range,” meaning that a particular type of virus can infect 
only cells of a certain host or group of hosts. 


Viruses are not cellular; they lack most of the features 
belonging to a cell. Instead, they are particles consisting of 
a protein structure with genetic material contained inside. 
Other proteins encasing the viral particle recognize bind- 
ing sites on the surface of potential host cells. Once 
bound to the outside of a host cell, the virus may enter 
the cell or inject its genetic material into the host cell to 
begin the infection cycle. Viral infections of host cells 
proceed by one of two mechanisms. Some viruses spread 
their progeny by budding new progeny viral particles 
from an infected host cell. Many chronic viral infections 
in eukaryotes are sustained by budding. Infection by 
the human immunodeficiency virus (HIV) is maintained 
in this manner. Alternatively, an infected host cell may
undergo lysis (rupture) that releases a large number of 
progeny viral particles. Section 6.5 describes details of 
lysis following viral (bacteriophage) infection of bacterial 
cells. Whether released by budding or by lysis, progeny 
viral particles seek out new host cells to infect. Certain 
viruses have a third option as well: entry into the lyso- 
genic life cycle. Viruses capable of lysogeny integrate into 
a host chromosome, replicating along with the host DNA, 
until conditions are right for the virus to excise itself and 
undertake host cell lysis. 


Viral Genomes 


The content of viral genomes, the structural configura- 
tion of the nucleic acid, and the genome size all vary from 
one kind of virus to another (Table 11.1). Regardless of 
whether DNA or RNA is the genetic material of a viral 
particle, and irrespective of whether the nucleic acid is 
double-stranded or single-stranded, the nucleic acid is as- 
sociated with no additional proteins. 

Viral genomes range in size from a few thousand 
bases of single-stranded DNA or RNA (or base pairs, in the 
case of double-stranded RNA) to more than 200,000 base 
pairs of double-stranded DNA, and they range in content 
from 5 genes to nearly 300 genes. Viruses with a small 
number of genes typically express all their genes shortly 
after infection. Viruses with larger genomes and more 
genes, such as bacteriophage λ (lambda), cytomegalovirus,
and herpes simplex virus, express their genes in a regulated 
manner at different times following infection. 

Despite their diverse genome structures, viruses follow 
the central dogma of molecular biology (DNA → RNA →
protein) outlined in Figure 1.8, meaning that regardless of 
the type of nucleic acid comprising the genome, mRNA 
is generated by transcription of viral genes for translation. 
These processes, along with viral genome replication, utilize 
host cell proteins and host cell structures such as ribosomes. 


Viral Protein Packaging 


The viral genetic material is enclosed in a protein coat
known as a capsid. Some viral genomes are packaged
in a capsid that is a protein shell. These viruses, called
non-enveloped viruses, are sometimes identified as “naked
viruses,” since they consist of nothing but a protein shell
enclosing viral genetic material. In other viruses, called
enveloped viruses, the capsid is surrounded by an envelope
of host cell cytoplasmic membrane that is acquired as
the viral progeny escape the host cell (Figure 11.1).

Non-enveloped viruses undergo capsid self-assembly.
In this assembly process the capsid incorporates a copy
of the viral genome, and the fully assembled particle is ready
for release from the host cell. Figure 11.2a shows the
self-assembly of the tobacco mosaic virus capsid and packaging
of the single-stranded linear RNA genome of the
virus. Figure 11.2b is an electron micrograph of tobacco
mosaic virus.

More complex viral particles, such as bacteriophage
T4 and bacteriophage λ, are assembled by a process known
as directed assembly. In this process, non-capsid proteins catalyze
the assembly of capsid components but dissociate as the
process nears its end, leaving the finished viral particle
composed of its specific components. Directed assembly
also includes the incorporation of the viral genome into
the capsid.


Table 11.1 Composition and Organization of Selected Viral Genomes

Virus                                Nucleic Acid (a)   Genome Size     Number of Genes (b)   Chromosome Form   Host
Parvovirus                           ssDNA              5176 bases      5                     Linear            Animals
φX174                                ssDNA              5386 bases      11                    Circular          Bacteria
fd                                   ssDNA              6400 bases      10                    Linear            Bacteria
Simian virus 40                      dsDNA              5243 bp         5                     Circular          Animals
Cauliflower mosaic virus             dsDNA              8025 bp         7                     Circular          Plants
Bacteriophage lambda                 dsDNA              48,514 bp       71                    Linear            Bacteria
Bacteriophage T4                     dsDNA              168,903 bp      288                   Linear            Bacteria
Herpes simplex virus                 dsDNA              158,400 bp      Ui                    Linear            Animals
Human cytomegalovirus                dsDNA              229,351 bp      162                   Linear            Animals
Poliovirus                           ssRNA              7433 bases      13                    Linear            Animals
Tobacco mosaic virus                 ssRNA              6400 bases      6                     Linear            Plants
Human immunodeficiency virus (HIV)   ssRNA              9700 bases      9                     Linear            Animals
Influenza virus                      ssRNA              13,500 bases    11                    Linear            Animals
Reovirus                             dsRNA              23,549 bp       10                    Linear            Animals

(a) ss = single-stranded; ds = double-stranded
(b) If linear


Figure 11.1 Enveloped and non-enveloped viruses. A protein
capsid encloses the viral chromosome. An enveloped virus
acquires its covering of host cell cytoplasmic membrane as it is
released from the cell.


Figure 11.2 Viral structure and assembly. (a) Assembly
of the tobacco mosaic virus and packaging of its genetic
material. (b) Electron micrograph showing the rod-shaped
tobacco mosaic virus.


11.2 Bacterial Chromosomes Are 
Organized by Proteins 


Bacterial genomes are haploid and generally contain a 
single chromosome composed of double-stranded DNA. 
Depending on the species and the growth conditions, 
certain bacteria will sometimes carry two or more copies 
of the bacterial chromosome. The genetic information car- 
ried on each chromosome copy is identical, so each gene is 
represented by a single DNA sequence. In this section, we 
describe properties of bacterial genomes and the structure 
of bacterial chromosomes. 


Bacterial Genome Content 


Most bacterial species, including widely studied bacterial 
species such as Escherichia coli and Bacillus subtilis, have 
circular chromosomes. There are, however, numerous 
examples of bacterial species that contain a linear chro- 
mosome. Table 11.2 illustrates some of the chromosome 
diversity found among bacterial species. 

Most bacterial genomes encode several thousand
genes that are densely packed throughout the chromosome.
These so-called structural genes contain the DNA
sequences that encode bacterial proteins. These are
considered to be genes that are essential for normal bacterial
functions and metabolism, and they populate the majority
of the chromosome. These regions include the regulatory
sequences that promote and terminate transcription,
as we discuss in Chapter 8. Interspersed between
genes are short intergenic regions. These regions are
not transcribed and serve to separate one gene from the
next gene on the chromosome. Bacterial chromosomes
contain small amounts of repetitive DNA sequences that
are found in multiple copies in the chromosome and are
located in intergenic regions. These repetitive sequences
are rarely transcribed, but they may play important roles
in DNA replication, in recombination between chromosomes,
or in regulating gene transcription.


Table 11.2 Chromosome Diversity among Bacteria

Species                      Genome Size (in Mb)   Number of Chromosomes   Chromosome Form(s)
Mycoplasma genitalium        0.58                  1                       Circular
Borrelia burgdorferi         1.4                   2                       One circular, one linear
Haemophilus influenzae       1.83                  1                       Circular
Vibrio cholerae              4.0                   2                       Both circular
Escherichia coli             4.6                   1                       Circular
Agrobacterium tumefaciens    5.7                   4                       Three circular, one linear
Sinorhizobium meliloti       6.7                   3                       All circular


Bacterial Chromosome Compaction


The chromosomes of bacteria are densely compacted
into a series of tight loops, which makes the nucleoid,
the region in which they are contained, remarkably small
(Figure 11.3). If the 4.6 Mb of the E. coli chromosome were
to be unpacked from the nucleoid and laid out along a
ruler, it would measure about 1200 µm, nearly 1000 times
longer than the E. coli cell itself. To get a sense of this size
difference, imagine trying to stuff a 62-foot-long thread
into the kind of gelatin-based capsule you might take for
allergies or a headache!


Figure 11.3 The nucleoid of E. coli. Supercoiling condenses
the E. coli chromosome, and proteins help organize it in the
nucleoid region.


How does E. coli package a chromosome 1000 times
longer than itself and leave room for molecular activities
such as replication, transcription, and translation? The
answer is twofold. First, proteins help organize the
chromosome into the loops that efficiently pack the nucleoid,
and second, the circular DNA of the chromosome
undergoes additional, superhelical twisting known as DNA
supercoiling.

Bacterial DNA is associated with two major groups of 
proteins: small nucleoid-associated proteins and struc- 
tural maintenance of chromosomes (SMC) proteins. 
Several different proteins belong to the small nucleoid- 
associated group of proteins, and all appear to participate 
in DNA bending that contributes to folding and conden- 
sation of the chromosome. The small nucleoid-associated 
proteins whose functions are best characterized are H-NS 
protein and HU protein. Figure 11.4 illustrates a possible 
general arrangement for H-NS and HU in securing loops 
of chromosomal DNA within the nucleoid. It also shows 
that the role of the SMC proteins is to hold the DNA in 
coils, or perhaps in V-shaped configurations. In addition 
to HU, H-NS, and SMC proteins, other proteins interact 
in the nucleoid to compact DNA. The precise identities 
and individual roles of these proteins are still subjects of 
active investigation. 

The second mechanism facilitating chromosome com- 
paction in the nucleoid is DNA supercoiling, which twists 
the duplex around on itself much like the twisting of a 
rubber band. Covalently closed circular chromosomes like 
those of bacteria exist in various coiled forms. The least 
twisted form of these is the relaxed-circle form that can 
be visualized as an undistorted rubber band lying flat on a 
plane in an open O shape. When the DNA duplex is in its 
standard coiled form with approximately 10 base pairs per 
helical turn (see Figure 7.7), it is in a relaxed circle form. 
In contrast, DNA molecules can be compacted by super- 
coiling as a response to over- or under-rotation of helical 
twisting. A portion of a DNA molecule that has its helix 
over-rotated has approximately 8.3 bp per helical twist, 
and will exhibit positive supercoiling. In contrast, a helix 
that is under-rotated has approximately 12.5 bp per helical 
twist and will exhibit negative supercoiling. Over-rotated 
and under-rotated DNA structures are unstable and are 
stabilized by supercoiling. 


Figure 11.4 Bacterial chromosome condensation by proteins. 
Loops are secured at the base by HU and H-NS; an average 
loop contains ~40 kb of DNA; smaller loops consist of duplex 
DNA condensed by SMC proteins. 

Visualized by electron microscopy, supercoiled DNA 
looks something like a rubber band that as a result of ex- 
tensive twisting has become convoluted, overlaps itself, 
and will not lie flat on a plane. Multiple intermediate 
supercoiled forms occur in large circular chromosomes. 

DNA supercoiling and the relaxation of supercoil- 
ing are enzymatically controlled processes. DNA gyrase, 
also known as topoisomerase II, is responsible for in- 
troducing negative supercoiling. DNA gyrase contains 
four protein subunits that form two protein “jaws” that 
grasp the DNA duplex in different locations and twist the 
helix around itself to form negative supercoils. A second 
enzyme, topoisomerase I, is responsible for unwinding 
negative supercoils in four steps. Topoisomerase I first 
binds negatively supercoiled DNA and then catalyzes the 
breakage of one strand of the duplex. Remaining bound 
to the DNA, the enzyme then allows the broken strand to 
rotate around the intact strand to relieve tension. Lastly, it 
religates the broken strand. These same enzymes also oper- 
ate on linear bacterial DNA, which can also be supercoiled. 
Homologous enzymes are found in eukaryotic cells, where 
they perform similar tasks. 

Figure 11.5 shows supercoiling and the effect of 
topoisomerase I on highly supercoiled DNA. The 
electron micrographs in Figure 11.5a show two circu- 
lar chromosomes from the same bacterial species, one 
highly supercoiled and the other in a relaxed-circle struc- 
ture. Figure 11.5b shows gel electrophoresis results for 
highly supercoiled DNA after 5 minutes of exposure to 
topoisomerase I (lane 2) and 30 minutes of exposure 
to topoisomerase I (lane 3), using relaxed-circle DNA 
and highly supercoiled DNA (both without having been 
exposed to the enzyme) in lane 1 as controls. Notice in 
comparing lanes that with exposure to topoisomerase I 
and with the passage of time there is more relaxed-circle 
DNA (darker electrophoretic bands indicating more DNA) 
and less highly supercoiled DNA (lighter electrophoretic 
bands indicating less DNA). 

Negative supercoiling has a critical role in bacterial 
cells beyond its role in chromosome compaction. Negative 
supercoiling promotes DNA strand separation associated 
with DNA replication and transcription. As a consequence, 
the role of DNA gyrase in controlling negative supercoiling 
is of considerable interest in medical research as a potential 
target for drugs with antibacterial activity. Two categories of 
drugs—coumarins and quinolones—have broad inhibitory 
effects on bacterial topoisomerases, including DNA gyrase. 
These compounds do not affect eukaryotic topoisomerases, 
which are different enough from bacterial topoisomerases to 
avoid the inhibitory effects. The antibiotic compound cipro- 
floxacin (more commonly known as Cipro) is one example 
of a broad-spectrum antibiotic that inhibits bacterial DNA 
gyrase activity and thus inhibits the growth and reproduction 
of bacteria. 

Figure 11.5 Circular DNA of bacteria in multiple forms. (a) Electron micrographs show 
supercoiled and relaxed-circle chromosomes. (b) The coiling of circular bacterial DNA determines its 
electrophoretic mobility. In lane 1, highly supercoiled DNA has a much higher electrophoretic mobility 
than the same DNA in a relaxed-circle state. In lane 2, 5 minutes of treatment with topoisomerase I to 
relax supercoiling produces many different coiled forms of the chromosome. In lane 3, 30 minutes of 
topoisomerase I treatment converts much of the DNA to relaxed circle. 


11.3 Eukaryotic Chromosomes Are 
Organized into Chromatin 


With regard to their number, structure, and organiza- 
tion, eukaryotic chromosomes differ from bacterial and 
archaeal chromosomes in numerous ways. For example, 
eukaryotes possess multiple chromosomes, which in dip- 
loids occur in homologous pairs. Also, the chromosomes 
are permanently localized to the nucleus, where replica- 
tion, transcription, and mRNA processing take place. In 
addition, eukaryotic chromosomes undergo cyclic con- 
densation for cell division. The total amount of DNA in 
eukaryotic genomes is tens to thousands of times greater 
than in bacterial or archaeal genomes. 

To manage the massive amount of DNA and the 
multiple chromosomes and need for periodic chromo- 
some condensation, eukaryotic chromosomes are orga- 
nized by a nucleoprotein complex known as chromatin 
that is a mixture of the DNA that makes up the chro- 
mosomes along with an array of proteins that organize 
and compact the DNA. In this section, we describe the 
organizational role of chromatin by identifying the essen- 
tial proteins that participate in this compaction process 
and looking at the mechanisms that promote it. There 
is, in addition, a second critical function for chromatin 
in eukaryotes that we take up in the following section: 
the generation of different chromatin states that vary in 
their degree of chromosome compaction and participate 
in the regulation of gene expression by controlling access 
of transcription-initiating proteins to regulatory DNA se- 
quences. A more detailed discussion of chromatin func- 
tion in regulating eukaryotic gene expression is then 
presented in Chapter 15. 


Chromatin Compaction 


Why is chromosome compaction by chromatin impor- 
tant? Simply stated, eukaryotic chromosomes would not 
fit into the nucleus without compaction, and chromosome 
segregation during cell division would be impossible. Each 
one of your chromosomes contains one long DNA double 
helix that is incorporated with large amounts of protein 
into the complex known as chromatin. Each of your so- 
matic cell nuclei contains more than 6 billion base pairs of 
DNA divided among 46 chromosomes, and all that DNA 
fits in the nucleus and still allows space for DNA replica- 
tion, transcription, and mRNA processing, thanks to a re- 
markable feat of biomolecular engineering brought about 
by chromatin. If all the chromosomes were taken from 
one of your somatic cell nuclei and the 46 chromosomes 
were stripped of their proteins and unwound to a relaxed 
state, the DNA molecules laid end to end would span 1.8 
meters—nearly 6 feet. This is more than 260,000 times 
the diameter of the nucleus! The DNA from your shortest 
chromosome alone would be almost 15,000 times longer 
than the nuclear diameter. Returning to the analogy of 
the medicinal capsule mentioned in the previous section 
in connection with the E. coli chromosome, a capsule 
representing a human nucleus would contain 46 pieces of 
thread, representing the 46 human chromosomes, with a 
combined length of 625 feet. 


Histone Proteins and Nucleosomes 


By weight, each eukaryotic chromosome is approximately 
half DNA and half proteins, and about one-half of the 
protein content of chromatin is histone protein. The 
histones are five small, basic proteins that are positively 
charged and bind tightly to negatively charged DNA. 
Equally abundant, but more diverse, is an array of hun- 
dreds of types of other DNA-binding proteins named, by 
default, nonhistone proteins. This large array of proteins 
performs a variety of tasks in the nucleus, not all of which 
are defined. 

The five types of histone proteins in chromatin are 
designated H1, H2A, H2B, H3, and H4 (Table 11.3). H1 
is the largest and most variable histone protein, contain- 
ing 215 to 244 amino acids, depending on the species. The 
other four histones are considerably smaller and more uniform 
in size, containing between 102 and 135 amino acids. 

Among eukaryotes, there is very strong evolution- 
ary conservation of the amino acid sequences of histone 
proteins. This consistency among eukaryotes suggests 
that there is significant evolutionary pressure to retain the 
structure and function of each histone protein. A com- 
parison of the amino acid sequences of H4 in cows and 
pea plants, for example, demonstrates this high degree of 
evolutionarily retained identity. Cows and pea plants last 
shared a common ancestor more than 500 million years 
ago, when the animal and land plant lineages diverged. 
Over those hundreds of millions of years of evolutionary 
change, there are just two amino acid differences among 
the 102 amino acids in the protein. The comparison tells 
us that since the time when plants and animals last shared 
a common ancestor, extraordinarily strong evolutionary 
pressure has maintained H4 DNA and its amino acid se- 
quence identity in organisms. This example of evolution- 
ary conservation speaks to the importance of histones in 
eukaryotic chromosome organization. 
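
The degree of conservation in the H4 comparison can be put as a percent identity. The sketch below is only an illustration: the function names are hypothetical, the short strings in the usage line are toy sequences rather than real histone data, and the only values taken from the text are the 102 amino acids and the two differences.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent of positions that match in two pre-aligned, equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


def identity_from_counts(length: int, differences: int) -> float:
    """Percent identity when only the length and the number of differences are known."""
    return 100.0 * (length - differences) / length


# Values from the text: 102 amino acids, 2 differences between cow and pea H4.
h4_identity = identity_from_counts(length=102, differences=2)
print(f"H4 identity: {h4_identity:.1f}%")

# Toy sequences (not real histone data) to show the position-by-position form.
toy_identity = percent_identity("ACDE", "ACDF")
```

Two differences in 102 positions corresponds to roughly 98% identity after hundreds of millions of years of divergence.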


Table 11.3 Histone Protein Characteristics 

Histone*   Basic/Acidic Ratio of Amino Acids   Molecular Weight (D)   Number of Amino Acids   Location 
H1         5.4                                 23,000                 224                     Linker DNA 
H2A        1.4                                 13,960                 129                     Nucleosome 
H2B        1.7                                 13,774                 125                     Nucleosome 
H3         1.8                                 15,273                 135                     Nucleosome 
H4         2.5                                 11,236                 102                     Nucleosome 

* Histone proteins from calf thymus gland. 


Histones are the principal agents in chromatin 
packaging, and the fundamental unit of histone pro- 
tein organization is the nucleosome core particle. The 
nucleosome core particle is a heterooctameric protein com- 
plex that contains two molecules each of four histones— 
H2A, H2B, H3, and H4 (Foundation Figure 11.6). These 
proteins are continuously transcribed and translated in 
eukaryotic cells, and histone genes are one family of genes 
that are present in multiple copies in eukaryotic genomes. 

Nucleosome core particles self-assemble. The his- 
tone proteins first self-assemble into dimers containing 
two different histones each: H2A-H2B dimers contain 
one molecule each of histone 2A and histone 2B, and 
H3-H4 dimers contain one molecule each of histone 3 
and histone 4. Current evidence indicates that nucleo- 
some core particles are formed in steps that begin with 
two H3-H4 dimers assembling to form a histone tet- 
ramer. The tetramer is then joined by two H2A-H2B 
dimers to form the octameric nucleosome core particle. 

Nucleosome core particles are flat-ended structures 
approximately 11 nm in diameter by 5.7 nm thick (see 
Figure 11.6a). Each nucleosome core particle is wrapped 
by approximately 146 base pairs of DNA that twist one 
and two-thirds turns around the core particle. This wrap- 
ping is the first level of DNA condensation, and it con- 
denses the DNA approximately sevenfold. 

The 146 bp of DNA wrapped around a nucleosome 
core particle is called core DNA, and the combination of a 
nucleosome core particle wrapped with core DNA is iden- 
tified as a nucleosome. Electron micrographs of chromatin 
fibers in a highly decondensed state show a regular series of 
circular structures strung together by connecting filaments 
(see Figure 11.6b). This form of chromatin is identified 
as the “beads on a string” morphology of chromatin. The 
“beads” are nucleosomes that are a little more than 11 nm 
in diameter, and the “string” is called linker DNA. Linker 
DNA is the DNA between regions of core DNA. 

The length of linker DNA segments varies among or- 
ganisms, although in each species it is a consistent length, 
and, thus, nucleosomes occur at regular intervals. In the 
yeast Saccharomyces cerevisiae, linker DNA is 13 to 18 bp 
in length. Linker DNA is about 35 bp long in the fruit fly 
Drosophila. In humans and other mammals, linker DNA 
spans about 40 to 50 bp; in sea urchins, linker DNA is very 
long, approximately 110 bp. If the 146-bp length of core 
DNA is added to the length of linker DNA, the nucleosome 
repeat distance of the beads-on-a-string structure is 
approximately 160 to 260 bp. This beads-on-a-string form of 
chromatin is identified as the 10-nm fiber, since the diam- 
eter of nucleosomes is approximately 10 nm. 
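
The repeat distances quoted above follow from adding the core DNA length to each linker length. A minimal sketch of that arithmetic is shown here; the dictionary and function names are hypothetical, and single linker values (fruit fly, sea urchin) are stored as ranges with equal endpoints purely for convenience.

```python
CORE_DNA_BP = 146  # core DNA wrapped around one nucleosome core particle

# Linker DNA lengths (bp) as (low, high) ranges, taken from the paragraph above.
LINKER_BP = {
    "Saccharomyces cerevisiae": (13, 18),
    "Drosophila": (35, 35),
    "humans and other mammals": (40, 50),
    "sea urchins": (110, 110),
}


def repeat_length(linker_range, core_bp=CORE_DNA_BP):
    """Return the (low, high) nucleosome repeat distance in bp."""
    low, high = linker_range
    return core_bp + low, core_bp + high


repeats = {organism: repeat_length(r) for organism, r in LINKER_BP.items()}
for organism, (low, high) in repeats.items():
    span = f"{low}" if low == high else f"{low} to {high}"
    print(f"{organism}: {span} bp per nucleosome repeat")
```

The results run from about 159 bp in yeast to 256 bp in sea urchins, which matches the approximate 160 to 260 bp range given in the text.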

This nucleosome-based model of chromatin was pro- 
posed by Roger Kornberg in 1974. Kornberg based his 
model on biochemical observations that chromatin con- 
tains a ratio of one molecule of each of the four core histone 
proteins (H2A, H2B, H3, and H4) to each 100 base pairs and 
one molecule of the histone H1 to each 200 base pairs. 


Foundation Figure 11.6 The hierarchy of chromatin organization and chromosome condensation. 

Structural protein imaging, described momentarily, 
supported Kornberg’s model, but the molecular proof 
of the model’s validity came from research by Markus 
Noll, who treated eukaryotic chromatin with different 
concentrations of the enzyme DNase I to cut DNA 
where it is not protected by bound proteins. Recall from 
Research Technique 8.1 (pp. 279-280) and the discussion 
in Section 8.3 in connection with DNA footprint-protection 
analysis that DNase I cuts DNA that is not 
protein-protected but is unable to cut DNA in regions 
bound by protein. Noll’s most important result was 
obtained by mixing mammalian chromatin with a high 
concentration of DNase I and using gel electrophoresis 
to determine that the DNA fragments produced by 
DNase I digestion measured approximately 200 bp in 
length. This is precisely the length Kornberg 
predicted, as it is the sum of the approximately 145 bp 
of DNA wrapping a nucleosome core particle and the 
55 bp of linker DNA between nucleosomes. 
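
To get a feel for what a 200-bp repeat implies, the sketch below estimates how many nucleosomes would be needed to package a human somatic nucleus. It is a rough calculation under stated assumptions: the function name is hypothetical, the 6 billion bp figure is the approximate value quoted earlier in this section, and nucleosomes are assumed to be spaced uniformly along all of the DNA.

```python
def nucleosome_estimate(genome_bp: int, core_bp: int = 145, linker_bp: int = 55):
    """Estimate nucleosome and histone H4 counts for a given amount of DNA.

    Assumes one nucleosome per (core + linker) bp and two H4 molecules
    per nucleosome core particle.
    """
    repeat_bp = core_bp + linker_bp      # 145 bp + 55 bp = 200 bp
    nucleosomes = genome_bp // repeat_bp
    h4_molecules = 2 * nucleosomes
    return repeat_bp, nucleosomes, h4_molecules


# Approximate DNA content of a human somatic nucleus (from this section).
repeat_bp, nucleosomes, h4_molecules = nucleosome_estimate(6_000_000_000)
print(f"repeat length: {repeat_bp} bp")
print(f"nucleosomes per nucleus: {nucleosomes:,}")
print(f"H4 molecules per nucleus: {h4_molecules:,}")
```

Under these assumptions, a nucleus would hold on the order of 30 million nucleosomes and about 60 million molecules of each core histone.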

Kornberg’s model was supported by structural pro- 
tein studies, X-ray diffraction imaging, and cryogenic 
electron microscopy (cryo-EM). The latter has produced 
detailed images of nucleosome structure and revealed 
the likely points of interaction between the octameric 
nucleosome core particle and core DNA. Timothy 
Richmond and his colleagues have described the crystal 
structure of the nucleosome using X-ray diffraction at 2.8-Å 
resolution (Figure 11.7). Richmond’s analysis indicates 
that there are 1.65 turns of core DNA around each nu- 
cleosome core particle. The analysis identifies additional 
molecular interactions between the N-terminal (amino 
terminal) tails of histone proteins and core and linker 
DNA. These interactions are critically important to the 
type of chromatin structure present in regions of eu- 
karyotic chromosomes. Different chromatin states play 
major roles in the regulation of eukaryotic gene expres- 
sion, as we discuss in Chapter 15. 


Figure 11.7 views: three-quarter view (histone H2B labeled) and side view. 


The 10-nm fiber is an unnatural state for chromatin. To 
achieve it, chromatin must be chemically treated and held in 
conditions that are not found in cells. Under normal cellular 
conditions, chromatin forms the 30-nm fiber, which 
is six times more condensed than the 10-nm fiber (see 
Figure 11.6c). Electron micrographs and molecular model- 
ing help us visualize how the 30-nm fiber is assembled. If we 
consider the 10-nm fiber to be a kind of primary structure 
for chromatin, then the 30-nm fiber is a secondary struc- 
ture. It is produced by coalescence of the 10-nm fiber into 
a cylindrical filament of coiled nucleosomes that is hollow 
in the middle. Due to its coiled structure and open middle, 
the 30-nm fiber is often also called the solenoid structure 
(like the coil of wire in the starter of a car). Each turn of the 
solenoid structure contains six to eight nucleosomes. The 
diameter of the solenoid is approximately 34 nm. 
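
The first two levels of compaction can be combined into a cumulative figure. The sketch below uses the sevenfold condensation of the 10-nm fiber and the further sixfold condensation of the 30-nm fiber stated in the text. The 0.34 nm per base pair used to convert base pairs to length is an assumption (the standard rise of B-form DNA), not a value given in this section, and the function name is hypothetical.

```python
NM_PER_BP = 0.34            # assumption: rise per base pair of B-form DNA
FOLD_10NM = 7               # DNA -> 10-nm fiber (from the text)
FOLD_30NM_OVER_10NM = 6     # 10-nm fiber -> 30-nm fiber (from the text)


def fiber_lengths_nm(n_bp: int) -> dict:
    """Approximate length (nm) of n_bp of DNA at each level of compaction."""
    naked = n_bp * NM_PER_BP
    fiber_10 = naked / FOLD_10NM
    fiber_30 = fiber_10 / FOLD_30NM_OVER_10NM
    return {"naked DNA": naked, "10-nm fiber": fiber_10, "30-nm fiber": fiber_30}


lengths = fiber_lengths_nm(1_000_000)  # 1 Mb of DNA as an example
cumulative_fold = lengths["naked DNA"] / lengths["30-nm fiber"]
for level, nm in lengths.items():
    print(f"{level}: {nm / 1000:.1f} µm")
print(f"cumulative compaction in the 30-nm fiber: about {cumulative_fold:.0f}-fold")
```

Multiplying the two factors gives a cumulative compaction of roughly 42-fold for the 30-nm fiber relative to extended DNA.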

The histone protein H1 plays a key role in stabilizing 
the solenoid structure. The long N-terminal and C-terminal 
ends of the H1 protein attach to adjacent nucleosome 
core particles. H1 protein pulls the nucleosomes into an 
orderly solenoid array and lines the inside of the structure. 
Experimental analysis shows that chromatin from which H1 
has been removed can form 10-nm fibers but not 30-nm fi- 
bers. Chromatin exists in a 30-nm-fiber state or a more con- 
densed state during interphase. Genetic Analysis 11.1 guides 
you through an interpretation of chromatin organization. 


Figure 11.7 Nucleosome structure. A computer-generated 
rendering of the X-ray crystal structure of the nucleosome at 
2.8-Å resolution shows the eight histone protein molecules in 
the color-coded nucleosome core particle. DNA wraps one and 
two-thirds turns around the core particle, a span of 
approximately 146 bp. 


Higher Order Chromatin Organization 
and Chromosome Structure 


Beyond the 30-nm stage, chromatin compaction and 
the presence of nonhistone proteins are integral to the 
structure of chromosomes and the process of chromosome 
condensation that initiates with the onset of 
prophase in the M phase of the cell cycle. Nonhistone 
proteins perform multiple roles in influencing chromosome 
structure and in facilitating M phase chromosome 
condensation. Interphase chromosome structure results 
from the formation of looped domains of chromatin 
similar to supercoiled bacterial DNA (see Figure 11.6d). 
The loops are variable in size, containing from tens to 
hundreds of kilobase pairs and consisting of 30-nm-fiber 
DNA looped on a category of nonhistone proteins that 
are the foundation of chromosome shape. The diameter 
of looped chromatin is approximately 300 nm, so looped 
chromatin is called the 300-nm fiber. With contin- 
ued condensation, the chromatin loops form the sister 
chromatids. In metaphase, chromosome condensation 
reaches its zenith, resulting in chromosomes that are 
easily visualized by microscopy (see Figure 11.6e). 

The chromosome scaffold is a filamentous nonhis- 
tone protein framework that gives chromosomes their 
shape. This scaffold is in some ways like the steel super- 
structure that provides the shape, strength, and support 
for a building. Figure 11.8a shows a fully condensed 
chromosome at metaphase, and Figure 11.8b shows the 
protein scaffold of a metaphase chromosome after being 
stripped of DNA. The shape of the chromosome scaf- 
fold is clearly reminiscent of the metaphase chromosome 
structure, consisting of sister chromatids joined at the 
centromere, which is visible as a constriction near the 
midpoint of the scaffold. The stringy material surround- 
ing the scaffold is DNA. 

Chromatin loops containing 20,000 to 100,000 bp are 
anchored to the chromosome scaffold by other nonhis- 
tone proteins at sites called matrix attachment regions 
(MARs) (Figure 11.9). The radial loop-scaffold model 
predicts that the chromatin loops gather into rosette- 
like structures and are further compressed by nonhistone 
proteins. The total compaction of chromatin achieved by 
metaphase is approximately a 250-fold compaction of the 
already condensed 300-nm fiber. 

Figure 11.8 The chromosome scaffold of a metaphase 
chromosome. (a) A metaphase chromosome. (b) Stripped 
of chromatin, the chromosome scaffold is composed of 
nonhistone proteins that form a superstructure to anchor DNA 
loops and give the chromosome its shape. 


Higher order chromosome condensation plays a 
critical role in two distinctive features of eukaryotic 
genetics. First, the general process of chromosome 
condensation compacts chromosomes to a degree that allows 
them to be efficiently separated at anaphase. Second, 
the chromatin loops formed during condensation play 
a role in regulating gene expression. Recent analysis 
of DNA binding to the chromosome scaffold indicates 
that certain repetitive DNA sequences are common at 
MARs. These sequences, called ATC sequences, are rich in 
A-T base pairs and have a high concentration of C in one 
strand. ATC sequences are found throughout the genome. 
Consequently, they can attach to the MARs in different 
patterns in different tissues. Experimental evidence in- 
dicates that active transcription takes place in chromatin 
loops, particularly in segments of loops that are distant 
from MARs. Thus, larger loops tend to have more active 
transcription than small loops. 

The positioning of ATC sequences throughout the 
genome appears to play a role in cell-type-dependent 
patterns of chromatin looping in given chromosomes 
that can lead to expression of certain genes in one type 
of cell but not in another. For example, if gene A is des- 
ignated for expression in a certain type of cell but gene 
B is not, gene A will be found far away from an MAR, 
whereas gene B will be close to an MAR. The molecular 
details of this model are clearer for single-celled eukary- 
otes than for mammals, but it appears that the position 
of a gene within the nucleus is a factor in its transcrip- 
tion. We discuss this observation in more detail in 
Chapter 15. 


Nucleosome Distribution and Synthesis 
during Replication 


Our discussion of DNA replication in Chapter 7 de- 
scribed the enzymatic processes necessary for the syn- 
thesis of new daughter DNA strands. This process dou- 
bles the total amount of DNA in a nucleus and results 
in each chromosome containing two identical sister 
chromatids. All of this newly synthesized DNA must 
be organized by nucleosome core particles. Having de- 
scribed the structure and function of nucleosomes in 
chromatin, we now take a moment to describe the 
process of managing existing nucleosome core particles 
during replication and the process of adding these and 
new nucleosome core particles to DNA after the replication 
fork passes. 


Figure 11.9 The radial loop-scaffold model of chromatin 
condensation. (1) Chromatin is anchored at matrix attachment 
regions (MARs). (2) Nonhistone proteins organize chromatin 
loops into rosettes. (3) Rosettes are compressed in metaphase 
chromosomes. 

The ubiquitous presence of nucleosomes raises several 
questions about their management and synthesis in con- 
nection to DNA replication. Are old nucleosomes recycled 
during replication? Are new nucleosome proteins synthe- 
sized during replication? Do old nucleosome core particles 
remain intact, so that nucleosomes are composed of either 
old histone proteins or newly synthesized histone proteins, 
or are old and newly synthesized histone proteins mixed? And 
how are nucleosome core particles, whatever their composi- 
tion, distributed to the sister chromatids during replication? 

Experimental research has answered these questions. 
Evidence collected by numerous investigators finds that 
the assembly of nucleosome core particles in connection 
with replication is driven by the partial denaturing of old 
core particles into either dimers or tetramers. These old 
core particle components are randomly joined with other 
dimers and tetramers after replication to form com- 
plete nucleosome core particles. There is a great deal of 
new histone protein synthesis during DNA replication, 
and the newly synthesized proteins form dimers and 
tetramers. This mixture of old and newly synthesized 
core particle components is the pool from which post- 
replication nucleosome core particles are assembled. The 
experimental evidence indicates that most nucleosome 
core particles present after replication are a mixture of 
some old nucleosome core particle dimers or tetramers 
and some newly synthesized core particle dimers or 
tetramers. In addition, a few histone core particles are 
composed of entirely newly synthesized histone proteins, 
and some are composed of entirely old core particle 
components. 

The current model proposes that as the replica- 
tion fork passes, nucleosomes break down into protein 
subassemblies, specifically H3-H4 tetramers and 
H2A-H2B dimers. The H3-H4 tetramers immediately 
reaffiliate, more or less at random, with one of the sis- 
ter chromatid products of replication. In contrast, many 
H2A-H2B dimers apparently become disassembled into 
individual histone proteins and then quickly reform into 
dimers with either old or newly synthesized protein 
partners. 
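
The random reaffiliation of old H3-H4 tetramers described above can be illustrated with a small simulation. This is a sketch under simplifying assumptions, not a model of the actual mechanism: each nucleosome position on the parental DNA is treated independently, the old tetramer at that position goes to either sister chromatid with equal probability, and the matching position on the other sister receives a newly synthesized tetramer. The function name and the seed are hypothetical choices.

```python
import random


def replicate_nucleosomes(n_positions: int, seed: int = 0):
    """Assign old H3-H4 tetramers at random to one of two sister chromatids.

    Returns two lists of "old"/"new" labels, one per sister chromatid.
    The position left empty on the other sister gets a new tetramer.
    """
    rng = random.Random(seed)        # seeded so the run is reproducible
    sisters = ([], [])
    for _ in range(n_positions):
        keeper = rng.randrange(2)    # sister that retains the old tetramer
        sisters[keeper].append("old")
        sisters[1 - keeper].append("new")
    return sisters


sister_a, sister_b = replicate_nucleosomes(1000)
old_total = sister_a.count("old") + sister_b.count("old")
all_tetramers = len(sister_a) + len(sister_b)
print(f"old tetramers on sister A: {sister_a.count('old')}")
print(f"old tetramers on sister B: {sister_b.count('old')}")
print(f"fraction of all tetramers that are old: {old_total / all_tetramers:.2f}")
```

Because the number of nucleosomes doubles while the number of old tetramers stays the same, half of all tetramers after replication are old, and each sister chromatid carries roughly half of them.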

Enough new synthesis of all four proteins takes place 
to double the number of nucleosomes. In this process, 
new H2A-H2B dimers and H3-H4 tetramers assemble. 
Some new H2A-H2B dimers join old H3-H4 tetramers 
already on DNA, while other new H2A-H2B dimers join 
new H3-H4 tetramers to form nucleosomes. Thus, about 
half of the nucleosomes assembled during replication 
are composed of old H3-H4 tetramers that are randomly 
distributed to the sister chromatids and combined with 
either new or old H2A-H2B dimers. The remaining 
nucleosomes contain new H3-H4 and either new or old 
H2A-H2B components (Figure 11.10). 


Figure 11.10 Nucleosome inheritance after DNA 
replication. Following the passage of the replication fork, 
“old” H3-H4 tetramers are randomly assigned to daughter 
strands, and newly synthesized H3-H4 tetramers inhabit 
strands not bound by old tetramers. Old and new 
H2A-H2B dimers join the tetramers to form complete 
nucleosomes. 


GENETIC ANALYSIS 11.1 


PROBLEM The plant species Arabidopsis thaliana has a genome containing approximately 100 million 
bp of DNA. For this problem, assume Arabidopsis has a core-DNA length of 145 bp and a linker-DNA 
length of 55 bp. 

a. Determine the approximate number of nucleosomes in each nucleus. 
b. Determine approximately how many molecules of histone protein H4 are found in each nucleus. 

BREAK IT DOWN: The nucleosome is wrapped by core DNA, and the span between nucleosomes 
consists of linker DNA (p. 371). 

BREAK IT DOWN: Histone core particles are heterooctamers containing two molecules each of four 
histone proteins (p. 371). 


Solution Strategies and Solution Steps 


Evaluate 

1. Strategy: Identify the topic this problem addresses and the nature of the required answer. 
   Step: This problem asks about the number of nucleosomes per nucleus and about the histone 
   composition of nucleosomes. The answer requires approximate numbers of nucleosomes and of 
   histone H4 molecules per nucleus. 

2. Strategy: Identify the critical information given in the problem. 
   Step: The approximate genome size of A. thaliana is given in base pairs, as are the lengths 
   of its core and linker DNA. 

Deduce 

3. Strategy: Describe the number of DNA base pairs that wrap around each nucleosome, and state 
   the approximate number in the span between nucleosomes. 
   Step: The core DNA wrapping a nucleosome is 145 bp in length. Linker DNA between nucleosomes 
   is approximately 55 bp in length. In total, there is one nucleosome for about every 200 bp 
   of DNA. 
   TIP: The combined length of core plus linker DNA affiliated with each nucleosome is 
   145 bp + 55 bp = 200 bp. 

4. Strategy: Describe nucleosome composition. 
   Step: Nucleosomes are octamers of histone protein consisting of two molecules each of H2A, 
   H2B, H3, and H4. 

Solve 

5. Strategy (Answer a): Calculate the number of nucleosomes in each A. thaliana nucleus. 
   Step: If we estimate that a new nucleosome associates with DNA about every 200 bp, the 
   approximate number of nucleosomes per nucleus is 

   \[ \frac{1 \times 10^{8}\ \text{bp/nucleus}}{2 \times 10^{2}\ \text{bp/nucleosome}} = 5 \times 10^{5}\ \text{nucleosomes/nucleus} \] 

6. Strategy (Answer b): Calculate the number of molecules of H4 in the nucleosomes of an 
   Arabidopsis nucleus. 
   Step: There are 2 H4 molecules per nucleosome, thus $(2)(5 \times 10^{5}) = 10^{6}$, or 
   1 million H4 molecules, per nucleus. 

For more practice, see Problems 4, 7, and 18. 


11.4 Chromatin Compaction Varies 
along the Chromosome 


In the previous section, we described the role of chromatin 
in chromosome compaction. In this section, we discuss 
differences in chromosome compaction along chromosomes, 
consider the consequences of this variability for 
visualizing chromosome structure, and take a first look at 
the functional consequence of variation in chromatin state 
for differential gene transcription. 


Chromosome Shape and Chromosome 
Karyotypes 


During prophase of the cell cycle, chromosome condensation 
prepares the chromosomes for sister chromatid 
segregation. As chromosome condensation reaches its 
zenith in late prophase, the sister chromatids become 
individually visible with the aid of microscopy, and each 
chromosome takes on a characteristic shape. Condensed 
chromosomes are divided by their centromere into segments 
known as chromosome arms that are almost invariably 
of unequal lengths. 


Figure 11.11 Chromosome shape. The position of the 
centromere and the ratio of the lengths of the long arm (q arm) and 
short arm (p arm) at metaphase determine chromosome shape. 


One chromosome arm, called the short arm, also 
known as the p arm, is shorter than the other arm that 
is known as the long arm, or the q arm (Figure 11.11). 
The position of the centromere determines the relative 
lengths of the short and long arms, leading to descrip- 
tive terms for the shapes of metaphase chromosomes. A 
metacentric chromosome has a more or less centrally lo- 
cated centromere and chromosome arms of similar lengths. 
Submetacentric chromosomes have a centromere nearer 
one end, producing one arm that is distinctly shorter than 
the other. The centromere of acrocentric chromosomes 
is nearly at the end of the chromosome. The “short arm” of 
acrocentric chromosomes is often composed of highly re- 
petitive DNA. These repetitive regions are known as “satel- 
lites” in part because secondary chromosome constrictions 
appear to partially pinch off the repetitive segment of the 
short arm. Telocentric chromosomes have a terminal cen- 
tromere and no short arm. 
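
The shape categories above can be expressed as a simple rule on the ratio of arm lengths. The sketch below is illustrative only: the function name and the numerical cutoffs (1.7 and 3.0) are assumptions chosen for the example, not values given in this chapter.

```python
def classify_chromosome(p_arm, q_arm, meta_max=1.7, submeta_max=3.0):
    """Classify chromosome shape from arm lengths (any consistent unit).

    meta_max and submeta_max are hypothetical cutoffs on the
    long-arm/short-arm ratio, used here only for illustration.
    """
    short, long_ = sorted((p_arm, q_arm))
    if short < 0 or long_ <= 0:
        raise ValueError("arm lengths must be non-negative, with one arm > 0")
    if short == 0:
        return "telocentric"      # terminal centromere, no short arm
    ratio = long_ / short
    if ratio <= meta_max:
        return "metacentric"      # arms of similar length
    if ratio <= submeta_max:
        return "submetacentric"   # one arm distinctly shorter
    return "acrocentric"          # centromere nearly at the end


for arms in [(5, 5), (2, 5), (1, 9), (0, 10)]:
    print(arms, classify_chromosome(*arms))
```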

Chromosome number differs among species, but each 
species has a characteristic chromosome number. In eu- 
karyotes, like humans, the karyotype is a visual display of 
chromosomes seen by microscopy. A karyotype displays 
all the chromosomes in a nucleus. In the case of a human 
karyotype, it contains 22 pairs of autosomes and one pair of 
sex chromosomes. The human karyotype is arranged and 
numbered with the largest autosomal pair as chromosome 1 
and the rest of the autosomes following in order of descend- 
ing length. The sex chromosomes are identified separately. 

The chromosomes in a karyotype may be stained with 
various dyes to produce the chromosome banding pattern 
that is distinct for each pair or type of chromosomes in the 
set. A normal human male karyotype contains 22 pairs of 
autosomes (numbered 1 through 22) and one X and one 
Y chromosome, and a normal human female karyotype 
contains 22 autosomal pairs of chromosomes along with a 
pair of X chromosomes. 


In Situ Hybridization 


The contemporary approach to examining chromosome
number, structure, and genetic content is the use of
in situ hybridization methods. These methods visualize
karyotypes through the use of chromosome-specific
molecular probes that are tagged with fluorescent compounds
to facilitate detection (Figure 11.12). Using microscopy
and computer-enhanced imaging, these methods
allow chromosome inspection with great precision.

Figure 11.12 A human karyotype. With distinct fluorophores
labeling 24 chromosome-specific FISH probes, this normal
human male karyotype displays a different color pattern for each
chromosome. Autosomal pairs are numbered 1 to 22, and the X
and Y chromosomes are labeled.

The karyotype in Figure 11.12 uses molecular probes 
that are specific to each of the 24 chromosomes in a hu- 
man male karyotype (22 autosomes, an X chromosome, 
and a Y chromosome) pictured in the karyotype. Each of 
the chromosome-specific molecular probes is tagged with 
a different fluorescent compound. When excited, each of 
these compounds emits light of a different wavelength, 
allowing a computer-driven photoreceptor to capture the 
emissions and convert them into an image with the differ- 
ent colors seen in the karyotype. 

The hybridization of the chromosome-specific mo- 
lecular probes is similar to the hybridization of the probes 
described in Research Technique 10.2. Rather than being 
specific to a single gene, however, these probes label an en- 
tire chromosome. Furthermore, unlike the preparation for 
gel electrophoresis and Southern blot methods described in 
Chapter 10, chromosomal DNA need not be fragmented for 
a chromosome-specific target sequence to be detected by 
the molecular probe. Instead, the chromosomes are fixed on 
a microscope slide, the DNA is denatured (i.e., separated 
into single strands), and the probe is applied. This technique 
is known as in situ hybridization because, unlike other 
hybridization methods, it labels intact chromosomes. 
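
The pairing step can be illustrated in code: a single-stranded probe binds wherever the denatured target strand carries the complementary sequence. This is a minimal sketch, not a model of real hybridization kinetics; the function names and the `max_mismatches` parameter are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def hybridization_sites(target_strand, probe, max_mismatches=0):
    """Return 0-based positions on target_strand where the probe can pair.

    Both sequences are written 5' to 3'. max_mismatches is an
    illustrative tolerance for imperfectly matched sites.
    """
    site = reverse_complement(probe)  # sequence on the target that pairs with the probe
    hits = []
    for i in range(len(target_strand) - len(site) + 1):
        window = target_strand[i:i + len(site)]
        mismatches = sum(a != b for a, b in zip(window, site))
        if mismatches <= max_mismatches:
            hits.append(i)
    return hits


print(hybridization_sites("TTACGGATCCAT", "ATCCGT"))
```

Raising `max_mismatches` mimics a probe that still binds a similar but not identical target sequence.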

The first generation of in situ hybridization methods
used radioactive nucleic acid probes that produced autoradiographs
when a small piece of photographic film
was placed on top of a chromosome spread on a microscope
slide. Decay of the ³²P radioactive label in the probe
exposed the photographic film, which was then developed
in the same way as an autoradiograph of an electrophoresis
gel. In chromosome autoradiographs, dark regions
corresponded to the chromosome locations of a DNA
sequence hybridized by the probe.

Today, most in situ hybridization applications use fluorescent
compounds, commonly known as fluorophores,
to label molecular probes. This is known as fluorescent
in situ hybridization (FISH). Using FISH, fluorophores
can be attached to chromosome-specific probes that label
certain chromosome sequences but not others or to gene-specific
molecular probes. Figure 11.13a illustrates the use
of two gene-specific FISH probes, one with a fluorophore
producing red color and the other with a fluorophore
producing green color, to identify two genes on the same
human chromosome. Figure 11.13b utilizes multiple FISH
probes to individually label each human chromosome. In
this instance, the probes are segments of chromosomes
that differ in their sequence content. The probes are
labeled with distinct fluorophores, leading to some chromosomes
having multiple colors in the image.

Figure 11.13 Fluorescent in situ hybridization (FISH).
(a) Two FISH probes hybridizing with target sequences on a
human chromosome are detected by production of differently
colored fluorophore emissions. (b) Multiple probes and fluorescent
compounds make each chromosome distinctive.


Imaging Chromosome Territory 
during Interphase 


Early observers of chromosomes in the nucleus, including 
Edmund Wilson, Walter Sutton, and Theodore Boveri, 
hypothesized that chromosomes contained the genetic 
material and noticed that interphase chromosomes are not 
uniformly arrayed within a nucleus. They suggested that 
this variation might be related to chromosome activity. 
Recent research using FISH techniques to study chromo- 
some positioning in the interphase nucleus indicates that 
these early suggestions are valid. 

Cell biologists Thomas Cremer and Christoph Cremer 
have used FISH methods to investigate the arrangement of 
chromosomes in the nucleus during interphase and found 
that chromosomes are partitioned into their own chromosome
territories (see the chapter-opening photo on p. 365
and Figure 11.14). A chromosome territory is a small
region of the nucleus that is the domain of a single chro- 
mosome. It is not bounded by any sort of membrane, nor 
is it demarcated in any distinctive manner. Chromosomes 
do not occupy exactly the same territory in each nucleus 
(the nucleus does not have reserved seating for each chro- 
mosome), but once confined to a territory, a chromosome 
does not stray from it until the initiation of M phase of the 
cell cycle. Chromosomes are, however, dynamically active 
within their territories during interphase and can be seen 
to move, twist, and turn during transcription and DNA 
replication. The chromosomes appear to be anchored 
by their centromeres and perhaps to take positions that 
allow, for each chromosome, characteristic patterns of 
gene expression and other activities during interphase. 

Figure 11.14 Chromosome territories in the eukaryotic
nucleus. Chromosomes occupy discrete territories separated by
interchromosome domains during interphase of the cell cycle.

Adjacent chromosome territories are separated by an
interchromosomal domain that contains no chromatin. 
These domains are channels for the movement of pro- 
teins, enzymes, and RNA molecules within the nucleus 
and among chromosome territories. The distribution of 
chromosome territories places the largest and most gene- 
rich chromosomes toward the center of the nucleus, while 
the territories of smaller chromosomes containing fewer 
genes are located toward the outer edges of the nucleus. 

The positioning of a chromosome within its terri- 
tory corresponds to the activities in which the parts of 
the chromosome are engaged at particular stages of inter- 
phase. For example, chromosome regions that replicate 
early in S phase are generally found further away from the 
nuclear membrane. The regions closer to the center of the 
nucleus are the locales of so-called early-replicating chro- 
mosome segments. In contrast, late-replicating chromo- 
some segments, portions of chromosomes that replicate 
late in S phase, are found nearer to the nuclear mem- 
brane. Also, the most transcriptionally active chromo- 
some regions are found closest to the border between a 
chromosome territory and an interchromosomal domain, 
presumably because of (1) greater access to proteins and 
enzymes needed for transcription and (2) faster dispersal 
of RNA transcripts after transcription is completed. While 
transcription occurs throughout each chromosome terri- 
tory, experimental evidence suggests that transcription is 
most intense bordering on interchromosomal domains. 

Recently, C. Anthony Blau and several colleagues
have extended the Cremers’ findings by developing a
three-dimensional model of the 16 chromosomes in yeast
haploid nuclei (Figure 11.15). Employing a method that
differentially identifies each chromosome, the researchers
were able to precisely map the location of each chromosome
within the nucleus. The resulting three-dimensional
map of chromosome positioning reveals that chromosome
centromeres are clustered together and that the
chromosome arms project away from the centromere
cluster. Knowledge of the positioning of chromosomes
within the nucleus will make it possible to determine how
DNA sequences influence chromosome positioning and,
in turn, how chromosome positioning influences the transcription
and replication of sequences.

Figure 11.15 A three-dimensional model of chromosomes
in the yeast nucleus. Yeast-chromosome centromeres are
clustered toward one end of the nucleus; chromosome arms
radiate from the centromere cluster.

FISH techniques have numerous applications in the 
analysis of chromosomes in humans and other species. 
One important and practical use of these methods is the 
identification of the complex chromosome rearrange- 
ments often found in cancer cells, which we discuss in the 
Case Study that ends the chapter. 


Chromosome Banding 


Chromosome condensation, driven by chromatin compac- 
tion, reaches its maximum at the end of metaphase, when 
chromosomes are in their most condensed state. Using 
chromosome staining methods and microscopy, cytogeneti- 
cists can distinguish each chromosome by its overall size and 
shape and by the patterns of light and dark chromosome 
banding that are produced along the length of chromo- 
somes by treatment with specific dyes and stains. These are 
the methods that were originally used to produce karyotypes,
and their legacy is essential both to basic chromosome
nomenclature and to the foundations of our understanding
of the role of chromatin state in gene expression.

During the late 1960s and early 1970s, several tech- 
niques for chromosome banding were developed, primarily 
by experimentation with human and other mammalian 
chromosomes. Chromosome banding allows cytogeneticists 
to accurately identify each chromosome and chromosome 
segment in a karyotype according to internationally agreed 
upon standard banding patterns for each chromosome. 

Generating a karyotype and banding the chromo- 
somes is a multistep process that begins with the growing 
of cells in culture followed by the use of a chemical treat- 
ment to stop the cell cycle in, or just before, metaphase. 
Chemically induced cell cycle arrest maximizes the num- 
ber of cells in the culture containing well-condensed chro- 
mosomes. Individual cells from the arrested cell culture 
are then dropped onto a microscope slide. This bursts the 
cells and ruptures the nuclear membrane, allowing the 
chromosomes to spill out. After some additional treat- 
ment, any one of several different dyes or stains can be 
used on the chromosomes to reveal regional differences 
in chromatin compaction that produce a series of alter- 
nating chromosome bands. Banded chromosomes can be 
examined using microscopy, and the banded chromosome 
spreads are often photographed for karyotyping. 

The chromosome banding patterns produced by 
different stains and dyes correlate with one another. An 
international symposium in Paris, France, was convened 
in 1971 to agree on the standard banding pattern for each 
human chromosome as well as on a standardized nomencla- 
ture for identifying chromosome banding patterns based on 
karyotypes of metaphase chromosomes. This nomenclature
remains in use today to ensure accuracy in identifying each
chromosome and in describing any chromosome variants 
or abnormalities. The standardized banding is based on the 
highly reproducible patterns of some 300 or so lightly and 
darkly stained bands in chromosome-specific patterns seen 
on human chromosomes. The banding method is known 
as G (Giemsa) banding, and it is named after the staining 
compound called Giemsa stain that is used to generate the 
chromosome bands. 

The standardized G banding nomenclature uses let- 
ters and numbers to identify the major and minor band 
regions of each chromosome. The numbering begins 
at each chromosome centromere and progresses out- 
ward along each arm toward the telomere (Figure 11.16). 
Major regions are subdivided to permit a designation for 
each light- and dark-band region of a chromosome. Each 
band is given a designation that specifies the chromosome
number, chromosome arm, and band location. An
example is 5q2.3.1, which is the dark band on the long
arm of chromosome 5 indicated in Figure 11.16.

Standard banding patterns and landmark designations for human
chromosomes 1 through 5

Figure 11.16 Standardized human chromosome banding
patterns. Human chromosomes 1 to 5 in late prophase.
Heterochromatic regions are shown as gray and black bands,
euchromatic regions as white bands.
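
A band designation such as 5q2.3.1 can be taken apart mechanically into chromosome, arm, and band location. The parser below is a sketch under stated assumptions: the function name, the returned field names, and the treatment of the dotted numbers as successively finer subdivisions are illustrative choices, and the pattern accepts human chromosomes only.

```python
import re

# Chromosome (1-22, X, or Y), arm (p or q), then dotted band numbers.
BAND_RE = re.compile(
    r"^(?P<chrom>[1-9]|1[0-9]|2[0-2]|X|Y)(?P<arm>[pq])(?P<levels>\d+(?:\.\d+)*)$"
)


def parse_band(label):
    """Split a designation like '5q2.3.1' into its parts."""
    match = BAND_RE.match(label)
    if match is None:
        raise ValueError(f"not a recognized band designation: {label!r}")
    return {
        "chromosome": match.group("chrom"),
        "arm": "short" if match.group("arm") == "p" else "long",
        # Numbers are counted outward from the centromere; each further
        # number is assumed to name a finer subdivision.
        "levels": tuple(int(n) for n in match.group("levels").split(".")),
    }


print(parse_band("5q2.3.1"))
```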

Chromosome banding by G banding and other tech- 
niques was at one time limited to chromosomes in meta- 
phase. Recently, however, advanced techniques have 
allowed cytogeneticists to stain chromosomes earlier 
in the cell cycle. Chromosome banding in prometa- 
phase chromosome spreads produces as many as 2000 
chromosome bands. Like the bands seen in metaphase 
chromosomes, these bands are highly reproducible, and 
chromosome-specific prometaphase banding patterns 
are now standardized. We discuss the applications of
chromosome banding further in Chapter 13.


Heterochromatin and Euchromatin 


Each chromosome band, whether in a metaphase chro- 
mosome spread or a prometaphase spread, contains many 
chromatin loops, thus holding between 1 million and 10 
million base pairs of DNA. Multiple genes can be con- 
tained in each chromosome band. 

The basis of chromosome banding is chromatin state. 
Chromatin condensation varies throughout the cell cycle 
but also varies from one part of a chromosome to another. 
G banding and other chromosome banding methods de- 
tect these differences in chromatin compaction by their 
ability to differentially stain regions of greater or lesser 
chromatin compaction. 

There is clear evidence that chromatin state is directly 
related to the ability of transcriptionally active proteins to 
initiate gene transcription. This means that chromosome 
banding patterns are associated with the distribution of 
expressed genes. During interphase, chromosome regions 
containing genes that are actively expressed generally have 
a lesser degree of chromatin condensation than chromo- 
some regions that do not contain expressed genes. These 
regions of active expression are identified as euchromatin, 
or as euchromatic regions. Most expressed genes are 
located in euchromatic regions, where condensation 
is variable during the cell cycle. Euchromatic chromo- 
some regions are lightly staining regions of G-banded 
chromosomes. Conversely, chromosome regions in 
which chromatin is tightly condensed are said to con- 
tain heterochromatin and are called heterochromatic 
regions. Heterochromatic regions contain many fewer 
expressed genes than do euchromatic regions. With fewer 
expressed gene sequences, heterochromatic DNA is more 
likely than euchromatic DNA to contain repetitive DNA 
sequences. Heterochromatin is identified as darkly stain- 
ing chromosome regions in G-banded chromosomes. 

Two distinct classes of heterochromatin are detected. 
Facultative heterochromatin exhibits variable levels of 
condensation. At times, facultative heterochromatin is 
highly condensed, while at other times it is less so. The 
transcription of genes in regions of facultative hetero- 
chromatin usually correlates with periods of less compac- 
tion. Constitutive heterochromatin, on the other hand,
is in a permanent heterochromatic state and contains 
very few expressed genes. Constitutive heterochromatin 
is predominantly composed of repetitive DNA sequences. 
It is particularly prominent in chromosome telomeres and 
in the centromeric regions of chromosomes, and, corre- 
spondingly, neither telomeric nor centromeric constitutive 
heterochromatin contains expressed genes. 

Genetic Analysis 11.2 gives you practice with these 
concepts as you interpret the results of a hypothetical 
experiment involving the use of FISH probes that have 
unknown sequence targets within chromosomes. 


Centromere Structure 


The observation that expressed genes are common in euchro- 
matic regions and uncommon in heterochromatic regions 
suggested to researchers that there might be a connection be- 
tween chromatin state and gene expression. Understanding 
of the connection between chromatin state and gene expres- 
sion came initially from studies of an unusual circumstance 
in the fruit fly Drosophila involving a chromosome transloca- 
tion and heterochromatic DNA near the centromere. 

Centromeres are specialized DNA sequence regions 
that are not found elsewhere in the genome. Centromeres 
bind kinetochore proteins and spindle fiber microtubules 
and in this way play an essential role in the division of ho- 
mologous chromosomes and sister chromatids during cell 
division (see Figures 3.4 and 3.6). 

In the early 1980s, John Carbon and Louise Clarke
described centromeric DNA, or CEN sequences, in the 
yeast Saccharomyces cerevisiae with an analysis of the 
sequences of 16 yeast centromeres. Each centromere was 
found to have a slightly different CEN sequence. Yeast 
CEN sequences span 112 to 120 bp and are divided into 
three domains, designated centromeric DNA elements 
(CDE) I, II, and III. Figure 11.17a shows four examples of 
yeast CEN sequences that illustrate the overall similarity 
but subtle variation in centromere sequences. The cen- 
tromeric consensus sequences revealed in Carbon and 
Clarke’s analysis are shown in Figure 11.17b. That of CDE 
I is an 8-bp sequence RTCACRTG, where R is either of the 
purines adenine or guanine. That of CDE III contains 26 
bp rich in A-T. Between these elements is CDE II, varying 
in length from 78 to 86 bp and having more than 90% of 
its sequence composed of A-T base pairs. A single micro- 
tubule attaches to the kinetochore in yeast, but multiple 
microtubules attach to the kinetochores of other species 
(Figure 11.17c). 
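
The consensus notation and the length range quoted above can be checked with a few lines of code. This is a hedged sketch: the function names are hypothetical, and only the ambiguity codes R (purine) and Y (pyrimidine) are handled.

```python
# Ambiguity codes: R = purine (A or G), Y = pyrimidine (C or T).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "Y": "CT"}

CDE_I_CONSENSUS = "RTCACRTG"


def matches_consensus(seq, consensus):
    """Return True if seq fits the consensus at every position."""
    return len(seq) == len(consensus) and all(
        base in IUPAC[code] for base, code in zip(seq, consensus)
    )


def cen_length(cde2_bp, cde1_bp=8, cde3_bp=26):
    """Total CEN length as the sum of CDE I, CDE II, and CDE III."""
    return cde1_bp + cde2_bp + cde3_bp


# Two CDE I sequences that differ at the two R positions.
for cde1 in ("GTCACATG", "ATCACGTG"):
    print(cde1, matches_consensus(cde1, CDE_I_CONSENSUS))

# CDE II lengths of 78 and 86 bp give the 112 to 120 bp range.
print(cen_length(78), cen_length(86))
```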

The highly repetitive centromeric DNA sequences of 
eukaryotes are a region of constitutive heterochromatin. 
A specialized form of the histone H3 protein known as 
centromere protein A (CENP-A) binds centromeric DNA. 
CENP-A is similar to H3 from its C-terminal end through 
much of its length but has a very different N-terminal tail 
that is much longer than the one found in other H3 mol- 
ecules. The extended CENP-A N-terminal tail is critical 
to the binding of kinetochore proteins. 
Figure 11.17 Conserved nucleotide sequence at the yeast
centromere. (a) Centromeric sequence variation. (b) The
centromeric consensus sequence. (c) Microtubule attachment
to the centromere region. (Abbreviations: R = purine,
Y = pyrimidine.)

(a) Centromere regions of four chromosomes (CEN3, CEN4, CEN6,
and CEN11), each divided into CDE I (8 bp), CDE II, and
CDE III (26 bp)

(b) Consensus sequence: RTCACRTG (CDE I);
TGTTTTTG-TTTCCGAA---AAAAA (CDE III)

(c) Site of microtubule attachment


Position Effect Variegation: Effect of 
Chromatin State on Transcription 


Cell and molecular biologists now know that chromatin 
state is a critical component of the opportunity to tran- 
scribe genes in eukaryotes. Most expressed genes are 
located in euchromatic regions of chromosomes where 
DNA is not as tightly affiliated with histones. In contrast, 
relatively few expressed genes are found in heterochromatic
regions where histones and other proteins tightly
bind DNA. Thus differences in chromatin state play an 
important role in regulating eukaryotic gene expression, 
as we discuss in Chapter 15. 

The constitutive heterochromatin in centromeric re- 
gions is present in all but the S phase of the cell cycle 
when DNA replicates. During S phase, histones and other 
proteins that otherwise bind to DNA release their grip 
to allow replication. We have seen that nucleosome core 
particles dissociate from DNA and partially disassemble 
ahead of the replication fork during S phase and that they 
are then reconstituted after the replication fork passes. 
Once the replication fork passes, new and original histone 
dimers and tetramers reassemble, and heterochromatic 
compaction is reestablished in the centromeric region. 
In the case of replication of centromeres, however, the 
borders for reestablishing boundaries on each arm of 
the chromosome are somewhat variable. The reason for 
the variability is that there are no expressed genes in the 
immediate vicinity of centromeres, and a little more or 
a little less spread of centromeric heterochromatin after 
the completion of replication normally has no impact on 
gene expression. 

Study of the reacquisition of centromeric heterochromatin
following replication provided the circumstances
for the first observation of the role of chromatin state in
controlling gene expression. The first experimental evidence
connecting chromatin structure to gene expression
came from the observation of position effect variegation
(PEV), first seen as a mutation affecting eye color in Drosophila.
During the 1920s and 1930s, in tests of the effect of X-rays
on Drosophila development, Hermann Muller identified
X-ray-exposed fruit flies with a variegated pattern of eye
color. Whereas the wild-type Drosophila eye is red, flies
with variegated eye color had red and white patches of eye
tissue. Furthermore, the variegation differed from one fly
to the next and was even different between the eyes of a
single fly. Muller presumed that the red patches resulted
from expression of the wild-type w⁺ allele for red color.
White patches of fly eyes have no color, as the result of
absence of w⁺ expression.

In Muller’s most important variegation experiments,
he began with flies that were pure-breeding for red eye,
that is, males were w⁺/Y and females were w⁺/w⁺. Recall
that the w⁺ gene is located near the telomere of the X
chromosome (see Figure 5.6). After exposing these flies to
X-rays and producing progeny with variegated eye color,
he noticed that the X chromosomes of flies with variegated
eye color had an abnormal structure. These X chromosomes
had been broken by the damaging effects of X-rays
very near the centromere, and the acentric chromosome
pieces had then rejoined the remainder of the X chromosome,
except that now they were inverted 180 degrees relative
to their normal position. He realized that, as a result
of this X chromosome inversion, w⁺ had moved from its
normal location near the telomere of the X chromosome
to a new position near the centromere of the chromosome.

At the time, Muller speculated that the new position
of w⁺ near the centromere altered its expression. By a
mechanism he could not explain, the new position of w⁺
near the centromere led to the allele being expressed in some
cells but not in others. Those cells in which the allele was
expressed had pigment deposition and were red, and those
in which expression did not occur were white. The pattern
of positioning of w⁺-expressing and w⁺-nonexpressing cells
differed from fly to fly and between the eyes of a single fly;
hence the variegation patterns differed.

Follow-up research has determined that Muller’s
general explanation for PEV was correct: with inversion,
the expression of w⁺ varies from cell to cell. The molecular
basis for PEV was discovered several decades later, and
it is the result of the extent of centromeric heterochromatin
spread following replication in inverted chromosomes.
Figure 11.18 illustrates this occurrence. If centromeric
heterochromatin distribution after replication does not
reach the new location of w⁺, the gene will be in a euchromatic
region and can be actively transcribed. This allows
pigment deposition and can constitute a patch of red eye
color. On the other hand, in cells in which centromeric
heterochromatin spreads across the new location of w⁺,
the allele is in a heterochromatic region and is not expressed.
Pigment is lacking in these cells, which therefore
can constitute a white patch of eye color. The formation
of heterochromatin is usually associated with the methylation
(addition of CH3 groups) of amino acids of histone
proteins. The CH3 groups in Figure 11.18 indicate the
presence of heterochromatin. We discuss this phenomenon
in Chapter 15.

Figure 11.18 Position effect variegation of eye color
in Drosophila. The w⁺ allele is expressed in wild-type
X chromosomes and in inverted X chromosomes when the
latter contain centromeric heterochromatin that does not
spread to cover the gene. If the spread of centromeric
heterochromatin covers the new gene location in inverted
X chromosomes, w⁺ is silenced. The CH3 (methyl) groups
indicate heterochromatin.


GENETIC ANALYSIS 11.2


PROBLEM Suppose Dr. O. Sophila receives three new FISH probes from a colleague with the request that
Dr. Sophila’s laboratory determine the likely hybridization targets of the probes on human chromosomes.
Each FISH probe contains a single specific sequence. Chromosome spreads are prepared, and FISH probes
labeled with distinct fluorophores are added. The following results are obtained: Probe A is
several dozen nucleotides in length, and it labels each chromosome centromere but no other
parts of any chromosome; probe B is about a dozen nucleotides in length, and it labels the
telomeres on every chromosome but no other parts of any chromosome; probe C is about a
dozen nucleotides in length, and it labels a single spot on each copy of chromosome 4 at band
position 4q3.2. Dr. Sophila asks you to interpret these experimental results and to help his colleague by
identifying the likely sequence-binding target of each probe.

BREAK IT DOWN: Review the discussion of FISH on pp. 377-378.


Solution Strategies Solution Steps


Evaluate

1. Identify the topic of this problem and the nature of the requested answer. 1. This problem concerns the interpretation of hybridization results of FISH
(fluorescent in situ hybridization) in human chromosomes.

2. Identify the critical information given in the problem. 2. The answer must identify the likely target sequences detected by each of
the three FISH probes based on the described hybridization patterns.

Deduce

3. Review the meaning and interpretation of probe hybridization to DNA. 3. Centromeres contain a specialized DNA sequence that is bound by
kinetochore proteins rather than histone proteins. Telomeres are composed
of hundreds of copies of short, repetitive DNA sequences generated by
telomerase.

TIP: FISH probes hybridize by complementary base pairing. Probes longer
than about 20 base pairs may hybridize even if there are a few mismatches.

4. Recall the makeup of eukaryotic chromosomes in terms of their content of
protein-coding genes and other types of DNA sequences. 4. Heterochromatic DNA contains few expressed genes, and heterochromatic
DNA sequences are more likely to be repetitive.

Solve

5. Provide an interpretation of the DNA sequence targeted by probe A. 5. By hybridizing exclusively to centromeric regions, probe A is likely to
be targeting the specialized DNA sequences that attract kinetochore
proteins. These sequences are somewhat variable from centromere to
centromere, but they are similar, and probe A is long enough to hybridize
to multiple similar but not identical target sequences.

6. Provide an interpretation of the DNA sequence targeted by probe B. 6. Hybridization exclusively to telomeres indicates that probe B is targeting
the short repetitive DNA sequences of telomeres.

7. Provide an interpretation of the DNA sequence targeted by probe C. 7. Probe C hybridizes to a single location on homologous copies of
chromosome 4 that is most likely to be a protein-coding gene. The band 4q3.2 is a
euchromatic region of the chromosome, where many expressed genes are
located. The identity of the gene cannot be determined, however, without
additional information.


For more practice, see Problems 13 and 25. Visit the Study Area to access study tools.

The key to variegation in this case is the extent of the 
spread of centromeric heterochromatin in X chromosomes
having the inversion that places w⁺ near the centromere.
If centromeric heterochromatin spreads across
the new location of w⁺, the allele is transcriptionally
silenced because transcriptional proteins are unable to
access regulatory DNA sequences that are in a tightly
bound chromatin state. If, on the other hand, centromeric
heterochromatin does not spread as far as the new
location of w⁺, the allele is in a euchromatic region where
DNA is in a less tightly compacted chromatin state, and 
transcription can take place. 
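
The logic of this either/or outcome can be sketched as a toy simulation. Everything numerical here is an assumption made for illustration: the function name, the uniform distribution of spread distances, and the treatment of each cell as independent (in real eyes, patches arise because the chromatin state is passed on to daughter cells).

```python
import random


def simulate_variegation(n_cells, gene_distance, max_spread, seed=None):
    """Return a list of 'red' or 'white' outcomes, one per cell.

    gene_distance: distance from the centromere to the relocated w+ allele.
    max_spread: largest distance heterochromatin may spread (assumed).
    Units are arbitrary but must be the same for both values.
    """
    rng = random.Random(seed)
    cells = []
    for _ in range(n_cells):
        spread = rng.uniform(0, max_spread)
        # Heterochromatin covering the allele silences it: no pigment.
        cells.append("white" if spread >= gene_distance else "red")
    return cells


eye = simulate_variegation(n_cells=20, gene_distance=2.5, max_spread=5.0, seed=1)
print(eye.count("red"), "red cells;", eye.count("white"), "white cells")
```

Moving the allele farther from the centromere (a larger `gene_distance`) makes silencing rarer in this sketch, which parallels the normal, uniformly red eye when w⁺ sits near the telomere.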

Since Muller first described position effect variegation
and since its molecular basis was identified,
geneticists and cell biologists have come to understand
that chromatin structure is a critical component of
gene expression in eukaryotic genomes. Research on
PEV establishing the direct role of chromatin state in
w⁺ expression, and extensive follow-up research establishing
the central role of chromatin state in eukaryotic
gene expression, have led to two central conclusions:
(1) Gene expression can be controlled by the state of
the chromatin in which a gene is located, and (2) gene
expression or gene silencing can be dictated by chromatin
structure that is transmissible from one cell generation
to the next. We discuss these and other topics
related to the regulation of eukaryotic gene expression
in Chapter 15.


11.5 Chromatin Organizes Archaeal 
Chromosomes 


In chapters discussing DNA replication, transcription, 
and translation, we have compared and contrasted im- 
portant functional proteins and activities in archaeal cells 
with similar proteins and activities in bacterial and eu- 
karyotic cells. In this section, we turn our attention to the 
structure of the chromosome in archaea—specifically, to 
the issue of protein-based organization of the chromo- 
some by histone proteins and to the evolutionary im- 
plications of the presence of archaeal histone proteins. 
Through this discussion, we will see the shared ancestry 
of archaea and eukaryotes. 


Archaeal Chromosome and Genome 
Characteristics 


The genetics of bacteria and eukaryotes has been studied
over many decades, in species too numerous to accurately
count. In contrast, the domain Archaea is relatively newly
discovered, having been first identified through the work
of Carl Woese on ribosomal RNA genes in the mid-1970s
(see Chapter 1), a proposal that only achieved wide acceptance
in biology in the 1980s.

Despite the relatively recent start to investigations of 
archaeal species, some general chromosome and genome 
characteristics are clear. For example, archaeal cells, like 
bacterial cells, have no nucleus. Archaea are haploids 
and, like bacteria, have a genome usually consisting of a 
single chromosome that is usually circular. The total size 
of archaeal chromosomes varies over more than a tenfold 
range. The smallest archaeal chromosome sequenced to
date is that of Nanoarchaeum equitans, with 490,885 bp,
and the largest chromosome is in Methanosarcina acetivorans,
with 5,791,492 bp. As in bacterial genomes, a high
percentage of the archaeal genome encodes proteins. On
average, 87% of any archaeal genome sequence consists
of protein-coding sequences. This value is equivalent
to the bacterial genome average and is far greater than
the percentages of protein-coding sequences found in
eukaryotic genomes. Also as in bacterial genomes, some
repetitive DNA sequences, as well as intergenic regions,
are found. In addition, many archaeal
genes share promoters and other transcription-regulating 
DNA sequences, as do many bacterial genes. (We discuss 
the coordinated transcription of multiple bacterial genes 
in Chapter 14.) Lastly, like bacterial cells, archaeal cells 
often contain plasmids as extrachromosomal DNA, and 
there are numerous examples of gene transfer between 
archaeal cells by conjugation. These circumstances are 
described for bacteria in Chapter 6. 
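
The size comparison in this paragraph can be checked with a few lines of arithmetic. The sketch below uses only the chromosome sizes and the 87% average quoted in the text; applying that average to an individual species is an assumption made purely for illustration, and the variable names are hypothetical.

```python
# Chromosome sizes quoted in the text (base pairs).
smallest_bp = 490_885      # smallest archaeal chromosome cited
largest_bp = 5_791_492     # largest archaeal chromosome cited
coding_fraction = 0.87     # average protein-coding share cited

# Ratio of largest to smallest chromosome ("more than a tenfold range").
size_ratio = largest_bp / smallest_bp
print(f"largest / smallest = {size_ratio:.1f}-fold")

# Assumption: the 87% average is applied to each genome for illustration only;
# the true coding share of each species may differ.
for label, size_bp in (("smallest", smallest_bp), ("largest", largest_bp)):
    coding_bp = size_bp * coding_fraction
    print(f"{label}: roughly {coding_bp:,.0f} bp of protein-coding sequence")
```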


Archaeal Histones 


In sharp contrast to the above list of general similarities 
between archaeal genomes and chromosomes and those 
of bacteria, many, perhaps most, archaea have histone 
proteins that are homologous to the histone proteins 
forming nucleosome core particles in eukaryotes. As 
of early 2014, histone protein amino acid sequence 
data were limited to about 90 species, but these data 
indicate that archaeal histone proteins form a family of 
proteins with strong homology to eukaryotic histones. 
On average, archaeal histones contain 65 to 75 amino 
acids. Three-dimensional protein structure studies have 
determined that these histone proteins self-assemble 
into multimeric complexes with other histone proteins 
and that the resulting structures resemble those seen in 
eukaryotes. 

For the most thoroughly studied type of archaeal 
histone protein, strong homology is identified with eu- 
karyotic histones H3 and H4. This homology results in 
identical amino acid sequences in protein segments critical 
to folding. As in eukaryotes, archaeal histone complexes 
affiliate with DNA that wraps the complex. A span of ap- 
proximately 90 bp of archaeal DNA is required to wrap the 
histone protein complex (Figure 11.19). 
Figure 11.19 Archaeal DNA wrapping of histone proteins.
A span of approximately 90 bp of DNA wraps a histone protein
complex in archaea. The archaeal histones shown are homologs
of eukaryotic H3 and H4.


What is the functional role of histones in archaeal
cells? The answer is not currently known, but the question
is under active investigation. At the moment,
the available evidence indicates a role for archaeal
histone proteins in DNA compaction, but there is as
yet little evidence that archaeal histones play a role in
regulating gene transcription. This makes sense in terms
of the single-celled, haploid character of archaea. Like 
bacteria, archaea must be capable of accessing and 
transcribing any gene at any time. The situation is very 
different for multicellular eukaryotes, in which each 
type of specialized cell is incapable of expressing most 
genes and instead expresses only its own specific limited 
number of genes. On one hand, homology of eukaryotic 
and archaeal histone proteins suggests that they have 
similar composition and might share some functional 
similarity. On the other hand, separate evolution of ar- 
chaeal and eukaryotic histones may have led to different 
functional capabilities. 


Phylogenetic Origins of Histone Proteins 


Histone proteins are not found in bacteria, they are
present in all eukaryotes, and they are found in most
archaea. This suggests that histone proteins were not
present in the LUCA (last universal common ancestor)
and arose after the bacterial lineage split off but before
diversification of archaea and eukaryotes. With the
divergence of archaeal and eukaryotic lineages, separate
evolution has shaped the composition and function of
histone proteins in each.

The implications of the evidence from the study of 
histone proteins in archaea and eukaryotes are in keep- 
ing with the evolutionary discussions of earlier chap- 
ters. Three distinct domains have evolved from their 
last universal common ancestor. The bacterial lineage 
was the first to split from the common ancestral root of 
eukaryotes and archaea. The result of the subsequent
archaea-eukarya split is that both domains are seen to share
features with bacteria while having more in common with
one another.
CASE STUDY


Fishing for Chromosome Abnormalities in Cancer Cells 


The genomes of cancer cells are highly abnormal and typically 
contain numerous gene mutations that disrupt many funda- 
mental cell activities, such as cell cycle control, cell-to-cell in- 
teractions and communication, rate of cell division, and DNA 
damage repair. In addition, the chromosomes of cancer cells 
commonly display multiple abnormalities, including deletions 
or duplications of all or parts of chromosomes, and various 
structural abnormalities, such as translocations in which part 
of one chromosome is transferred and attached to a nonho- 
mologous chromosome. 

At one time, G banding was used as a way of identifying
chromosome abnormalities in cancer cells. This process
has been largely replaced by the development of multicolor
FISH techniques and the use of distinct probes and fluorophores
for each chromosome. The new methodology permits
more accurate detection and identification of chromosome
abnormalities. Figure 11.20a shows the chromosomes of
a cancer cell in which FISH has revealed multiple chromosomal
abnormalities. Notice that several chromosomes contain
more than one color. Normal chromosomes would have a
single, solid color. The presence of multiple colors on a
chromosome indicates that the chromosome is actually composed
of pieces from two or more nonhomologous chromosomes.
This occurrence reflects the general instability and high
mutation rate of the genomes of cancer cells.

While these features are common in cancer cells, they
are usually a consequence, not a cause, of cancer. On the
other hand, a few rare cancers appear to be caused by specific
chromosome rearrangements that occur so frequently
in the cancer that they are effectively diagnostic for that
particular type of cancer. Figures 11.20b and 11.20c show two
examples. One, in Figure 11.20b, shows a specific reciprocal
translocation between one copy of chromosome 9 and one
copy of chromosome 22 that is seen in most cases of chronic
myelogenous leukemia (CML). One copy of chromosome 9
and one copy of chromosome 22 undergo chromosome
breaks at the locations indicated by the arrows and exchange
pieces in a reciprocal translocation mutation. The other copies
of chromosome 9 and of chromosome 22 are intact.
The result of the translocation is a dramatic overexpression
of a growth-stimulating protein that triggers the leukemia.
Overexpression occurs because the growth protein gene has
been moved from its normal location to a new location where
a very active promoter overdrives its transcription. This is a
classic dominant gain-of-function mutation (see Chapter 4).
Translocation also results in a characteristically small
chromosome 22 called the Philadelphia chromosome. Since it was
first identified in the 1960s, the Philadelphia chromosome
has been a hallmark of CML.

The second cancer resulting directly from chromosome
rearrangement is Burkitt's lymphoma, shown in Figure 11.20c.
In Burkitt's lymphoma, a reciprocal translocation between
chromosomes 8 and 14 is very frequently observed. As in
CML, chromosome translocation puts a growth-stimulating
gene in a new location where it is overexpressed.

G banding pattern differences between normal chromosomes
and the translocation chromosomes of Burkitt's lymphoma
and CML were the original methods used to identify
these characteristic chromosome rearrangements. In recent
years, the use of FISH, with its chromosome-specific
fluorophores, has made the task of identifying these and other
specific chromosome rearrangements in cancer considerably
simpler. FISH has become an important diagnostic tool in the
identification of other chromosome abnormalities as well.
We describe some of these abnormalities in more detail in
Chapter 13.

Figure 11.20 FISH detection of chromosome rearrangements in human cancer cells. (a) General
chromosome instability leads to the frequent observation of multiple chromosome abnormalities in
cancer cells, abnormalities that are readily observed using FISH methods (right). (b) A reciprocal
translocation between chromosome 9 and chromosome 22 is very common in chronic myelogenous leukemia
(CML). (c) Translocation between chromosome 8 and chromosome 14 is frequently detected in Burkitt's
lymphoma cells.


For activities, animations, and review quizzes, go to the Study Area.


11.1 Viruses Are Infectious Particles Containing
Nucleic Acid Genomes


Viruses are noncellular infectious particles that contain
single- or double-stranded DNA or RNA as their genetic
material.

Viral genomes do not contain the genes required to support
replication of the genetic material or transcription and
translation of viral genes. Viruses are therefore obligate parasites
of host cells.

Viral genomes are contained in protein capsids that in some
viral species are enveloped by host cell cytoplasmic membranes
and in some species are unenveloped.


11.2 Bacterial Chromosomes Are Organized by
Proteins


Bacterial genomes are haploid and usually contain a single,
circular chromosome. The genomes of certain bacterial species
contain more than one chromosome.

Bacterial chromosomes are 1000 or more times longer than
the cells they reside in and are localized to the nucleoid region.

Proteins associate with bacterial chromosomes to aid
compaction.

Supercoiling of circular bacterial chromosomes is the principal
mechanism for compaction of the chromosome into
bacterial cells.


11.3 Eukaryotic Chromosomes Are Organized
into Chromatin


Eukaryotic nuclei contain multiple chromosomes that are
highly compacted.

Eukaryotic chromosomes are composed of chromatin, a
mixture of DNA, histone proteins, and other nonhistone
proteins.

Eight histone protein molecules form nucleosomes around
which 146 bp of DNA wraps to form the 10-nm fiber.

The 10-nm fiber condenses to form the 30-nm fiber.

Nonhistone proteins form the chromosome scaffold that
gives structure to chromatids and aids in additional chromosome
compaction during prophase of the cell cycle.

Chromatin loops form with the aid of proteins that help
form the chromosome scaffold. In each different type of cell,
expressed genes are more distant from anchor points on the
scaffold than unexpressed genes.


11.4 Chromatin Compaction Varies along the
Chromosome


Chromosomes are categorized by structure on the basis of
the centromere position and the ratio of long arm (q arm)
length to short arm (p arm) length.

Specialized molecular probes are used for in situ hybridization
to locate specific genes or chromosome-specific DNA
sequences. These probes often utilize fluorescent labels for
detection.

During interphase, each chromosome inhabits a territory of
its own in the nucleus. Chromosome positioning within the
territory is tied to replication and transcription.

Each chromosome has a distinctive banding pattern created
by applying stains or dyes to condensed chromosome spreads.

Heterochromatic DNA forms darkly staining bands that
contain relatively few expressed genes.

Euchromatic DNA forms lightly staining bands that contain
the majority of expressed genes.

The centromere consists of specialized DNA sequences that
bind kinetochore proteins.

Studies of position effect variegation (PEV) have determined
that the structure of chromatin surrounding a gene directly
influences transcription.


11.5 Chromatin Organizes Archaeal
Chromosomes


Archaea are haploids with a single chromosome that is
associated with histone proteins in most species.

Archaeal histones are homologous to eukaryotic histones
and function to compact the chromosome by wrapping of
DNA around histone protein complexes.

Phylogenetic analysis comparing histone proteins has determined
that histones developed after the branching off of the
bacterial lineage and before the divergence of the eukarya
and archaea lineages.
Chapter Concepts

For answers to selected even-numbered problems, see Appendix: Answers.


1. Describe the structure and composition of a bacterial
chromosome. Describe the same features of a bacterial plasmid.
How are these structures similar, and how do they differ?


2. Biologists typically define bacterial and archaeal genomes
as "haploid," but some bacterial genomes contain more
than one chromosome in the genome, and some archaeal
cells have more than one copy of the chromosome. Does
the term "haploid" conflict with the occurrence of more
than one chromosome in bacterial genomes or of multiple
copies of the chromosome in archaeal genomes? Why or
why not?


3. Bacterial DNA is compacted by two principal mechanisms.
Identify and briefly describe each mechanism.


4. The human genome contains 2.9 × 10⁹ base pairs.
Approximately how many nucleosomes are required to
organize the 10-nm-fiber structure of the human genome?
Show the calculation you use to determine the answer.


5. Give descriptions for the following terms:

a. histone proteins
b. nucleosome core particle
c. CEN sequences
d. G bands
e. euchromatin
f. heterochromatin
g. nucleosome
h. chromosome territory
i. nucleoid


6. Describe the importance of light and dark G bands that
appear along chromosomes.


7. In eukaryotic DNA,

a. Where are you most likely to find histone protein H4?

b. Where are you most likely to find histone protein H1?

c. Along a 6000-bp segment of DNA, approximately how
many molecules of each kind of histone protein do you
expect to find? Explain your answer.

d. How does the role of H1 differ from the role of H3 in
chromatin formation?


8. Describe the relative differences you expect between the
levels of chromosome condensation in interphase and in
metaphase.


9. Human late prophase karyotypes have about 2000 visible G
bands. The human genome contains approximately 22,000
genes. Consider the region 5p1.5 through the end of the
short arm of chromosome 5 that is identified on the late
prophase chromosome in Figure 11.16, and assume the entire
region is deleted. Approximately how many genes will
be lost as a result of the deletion?


10. What are the two or three most essential components of
a bacterial chromosome sequence? Of a eukaryotic chromosome
sequence? Thinking in an evolutionary context,
devise an argument to explain why these components are
present.


11. Explain why viruses are described as "particles" and not as
"cells" and why they are characterized as "obligate parasites"
of host cells.


12. Do bacterial chromosomes have centromeres? Do
they have telomeres? Devise an argument for each answer
to explain why or why not from an evolutionary
perspective.


13. A researcher interested in studying a human gene on
chromosome 21 and another gene on the X chromosome
uses FISH probes to locate each gene. The chromosome 21
probe produces green fluorescent color, and the X chromosome
probe produces red fluorescent color.

a. If the subject studied is female, how many green and
red spots will be detected? Explain your answer.

b. If the subject studied is male, how many green and red
spots will be detected? Explain your answer.


14. Describe how DNA sequence will change with distance
from the telomere.


15. In what way does position effect variegation (PEV) of
Drosophila eye color indicate that chromatin state can affect
gene transcription?


16. What are chromosome territories, and what significance
do these regions have for gene expression?


17. Identify two important differences that distinguish
heterochromatic regions of chromosomes from euchromatic regions.


Application and Integration


18. As a follow-up to Genetic Analysis 11.1, in which you
determined the approximate number of nucleosomes per
nucleus in Arabidopsis thaliana, answer these questions:

a. If the number of nucleosomes given in answer to part
(a) of Genetic Analysis 11.1 is for the nucleus of a cell in
G1 of the cell cycle, how many nucleosomes do you expect
in the nucleus after completion of S phase? Explain
your answer.

b. Are all of the additional nucleosomes that are present
after completion of S phase of the cell cycle composed
of newly synthesized histone proteins? Explain your
answer.


19. A survey of organisms living deep in the ocean reveals
two new species whose DNA is isolated for analysis.
DNA samples from both species are treated to remove
nonhistone proteins. Each DNA sample is then treated
with DNase I that cuts DNA not protected by proteins
but is unable to cut DNA bound by histone proteins.
Following DNase I treatment, DNA samples are subjected
to gel electrophoresis, and the gels are stained with
ethidium bromide to stain all DNA bands in the gel.
The ethidium bromide staining patterns of DNA from
each species are shown in the figure. The number of
base pairs in small DNA fragments is shown at the left
of the gel. Interpret the gel results in terms of chromatin
organization and the spacing of nucleosomes in the
chromatin of each species.


20. A eukaryote with a diploid number of 2n = 6 carries the
chromosomes shown below and labeled A to F.

[Figure: six chromosomes labeled (a) to (f)]

a. Carefully examine and redraw these chromosomes in
any valid metaphase I alignment. Draw and label the
metaphase plate, and label each chromosome by its
assigned letter.

b. Explain how you determined the correct alignment of
homologous chromosomes on opposite sides of the
metaphase plate.


21. The chromosome diagram shown below represents a
eukaryotic chromosome stained by G (Giemsa) banding.
Indicate the heterochromatic and euchromatic regions of
the chromosome, and label the chromosome's centromeric
and telomeric regions.

[Figure: G-banded chromosome diagram with the centromere labeled]

a. What term best describes the shape of this
chromosome?

b. Do you expect the centromeric region to contain
facultative heterochromatin? Why or why not?

c. Describe the features of general sequence composition
and protein binding that differentiate the centromeric
region from other regions of the chromosome.

d. Why are expressed genes not found in the telomeric
region of chromosomes?

e. Are you more likely to find the DNA sequence encoding
the digestive enzyme amylase in a heterochromatic,
euchromatic, centromeric, or telomeric region? Explain
your reasoning.


22. Suppose the genome of a bacterium contains a circular 
chromosome composed of 1.6 x 10° bp. A geometric cal- 
culation tells us that the diameter of the circular chromo- 
some is about 10 times the diameter of the cell. 


a. How is this chromosome packaged inside the cell? 

b. Describe how this chromosome is packaged in the bac- 
terial nucleoid. 

c. Why is this chromosome supercoiled? 


23. DNase I cuts DNA that is not directly associated with
nucleosomes. Markus Noll's treatment of human DNA with
DNase I produced DNA fragments that are consistently
about 200 bp in length. Why does this result indicate that
nucleosomes are evenly spaced on human DNA? What
result would be obtained if nucleosomes were randomly
spaced along DNA?


24. Histone protein H4 isolated from pea plants and cow thy- 
mus glands contains 102 amino acids in both cases. A total 
of 100 of the amino acids are identical between the two 
species. Give an evolutionary explanation for this strong 
amino acid sequence identity based on what you know 
about the functions of histones and nucleosomes. 


25. The molecular probes used in FISH can detect repetitive 
DNA sequences or unique sequences that are parts of 
genes. 


a. How are the binding locations of FISH probes on chro- 
mosomes identified? 

b. Distinguish the detection of a FISH probe from the de- 
tection of a molecular probe in a Southern blot. 


26. Experimental evidence demonstrates that the nucleosomes
present in a cell after the completion of S phase are
composed of some "old" histone dimers and some newly
synthesized histone dimers. Describe the general design
for an experiment that uses a protein label such as ³⁵S to
show that nucleosomes are often a mixture of old and new
histone dimers following DNA replication.


27. DNase I cuts DNA that is not protected by bound proteins
but is unable to cut DNA that is complexed with proteins.
Human DNA is isolated, stripped of its nonhistone
proteins, and mixed with DNase I. Samples are removed
after 30 minutes, 1 hour, and 4 hours and run separately in
gel electrophoresis. The resulting gel is stained with ethidium
bromide, and the results are shown in the figure. DNA
fragment sizes in base pairs (bp) are estimated by the scale
to the left of the gel.

a. Examine the gel results and speculate why longer
DNase I treatment produces different results.

b. Draw a conclusion about the organization of chromatin
in the human genome from this gel.


28. Genomic DNA from the nematode worm Caenorhabditis
elegans is organized by nucleosomes in the manner typical
of eukaryotic genomes, with 145 bp encircling each
nucleosome and approximately 55 bp in linker DNA.
When C. elegans chromatin is carefully isolated, stripped of
nonhistone proteins, and placed in an appropriate buffer,
the chromatin decondenses to the 10-nm fiber structure.
Suppose researchers mix a sample of 10-nm-fiber chromatin
with a large amount of the enzyme DNase I that
randomly cleaves DNA in regions not protected by bound
protein. Next, they remove the nucleosomes, separate the
DNA fragments by gel electrophoresis, and stain the fragments
by ethidium bromide.

a. Approximately what range of DNA fragment sizes do
you expect to see in the stained electrophoresis gel?
How many bands will be visible on the gel?

b. Explain the origin of DNA fragments seen in the gel.

c. How do the expected results support the 10-nm-fiber
model of chromatin?


29. What function do histone proteins perform in archaeal
chromosomes? How is this function accomplished? What
function is performed by histones in eukaryotes that is
apparently not performed by archaeal histones?


30. Based on discussions of specific proteins and structures in
bacteria, archaea, and eukaryotes in this and other chapters,
briefly describe your view of the evolutionary relationship
between the three domains of life.


Gene Mutation, DNA 
Repair, and Homologous 
Recombination 


The baby kangaroo peeking out of its mother’s pouch has autosomal 
recessive albinism, a condition that occurs in about 1 in 20,000 births. 


Mutation can be defined most simply as a heritable
change in DNA sequence, a definition that covers
an enormous range of changes. Mutation is indispensable
in two ways. From an evolutionary perspective, mutations
generate new hereditary variety. Variant alleles can cause
organisms to differ from one another, enabling the organisms
to evolve through any of the four evolutionary processes we
identified in Section 1.4. Mutation is also indispensable from
the perspective of genetic analysis. Whether for studying
the effects of variant alleles on organisms, the processes that
damage or repair DNA, or some other aspect of gene properties
and function, mutation analysis is at the heart of genetics.

Within the cell, mutations can arise from spontaneous
changes or from the action of DNA-damaging
agents. Some changes to DNA that lead to
mutation are the result of spontaneous alterations of
the structure of nucleotide bases. On rare occasions,
errors made during DNA replication can lead to mutation.
Also, damage done to DNA by chemical, physical,
or biological agents can affect DNA nucleotide bases
and lead to mutation. Through whatever mechanism
they occur, however, mutations are random, occurring
in different species at different average rates, and
affecting some genes more often than others as a
consequence of the gene's composition.

In this chapter, we focus on mutation at the level 
of the individual gene—that is, gene mutation. We de- 
scribe spontaneous changes to DNA nucleotide base 
structure and the occasional DNA replication errors 
that can generate gene mutations. We also examine 
the DNA-damaging actions of chemical and physical 
agents and the role this damage plays in producing 
gene mutation. We postpone discussion of the bio- 
logical agents of mutation to the following chapter, 
which describes mutations at the chromosome level. 
(Among the mutations described in connection with 
chromosomes in Chapter 13 are processes involving 
the transposition of mobile elements of DNA that can 
move from place to place in the genome.) 

We end the current chapter with a discussion of 
DNA damage repair mechanisms and the connection 
between mechanisms of DNA double-strand break 
repair and crossing over. In the process, we examine 
bacterial systems of crossing over and also the cross- 
ing over between homologous chromosomes in eu- 
karyotes that is observed during meiosis. 


12.1 Mutations Are Rare and Occur 
at Random 


Gene mutations are random and their occurrence is rare.
The random nature of mutations was first experimentally
demonstrated by Salvador Luria and Max Delbrück
in 1943. This preceded by just a few months the identification
of DNA as the hereditary material by Avery,
MacLeod, and McCarty (see Section 7.1), and it came a
decade before the molecular structure of DNA would be
described. In the 70 years since this observation, the
understanding of the causes, consequences, and occurrence
of mutations has been a staple of genetic research.

The decades of study of gene mutations have pro- 
duced several general conclusions. First, mutation rates 
are low in all genomes, meaning that genome stability is 
paramount and mutations contribute slowly to inherited 
diversity. Second, gene mutations are usually deleterious to 
the organism, meaning that they impair the function of the 
gene or gene product and potentially harm the fitness of the 
organism. Third, despite their typically deleterious nature, 
mutations are essential for the generation of inherited ge- 
netic diversity that fuels evolutionary change. Fourth, gene 
mutation rates differ considerably among organisms, and
mutations are more common in larger genomes than in smaller
genomes. Genomes appear to have different levels of tolerance
for mutations, and mutation repair efficiency may vary
among organisms. Lastly, mutation rates among different 
genes of a single species show variation, suggesting that 
there are intrinsic DNA sequence variables that lead to dif- 
ferent mutation rates among the genes in a genome. 


Mutation Rates 


In bacteria and other haploid microorganisms, the mutation 
rate is measured as the number of times mutation alters 
a particular gene per replication cycle or per generation. 
Mutations in these organisms are most often studied by 
screening for auxotrophic nutritional deficiencies that im- 
pair the organisms’ ability to grow on a minimal medium. 
Mutation rate in sexually reproducing diploids is the 
number of mutational events in a given gene per gen- 
eration. Recessive mutations can be identified particularly 
through the use of genome sequencing analysis and other 
molecular methods that can detect variation at the DNA 
sequence level. Mutations detected at the morphological 
level or affecting enzymes in a metabolic or biochemi- 
cal pathway are more likely to be dominant mutations. 
Dominant mutations are easier to detect, since a single 
copy of a dominant mutant allele will manifest in the 
phenotype. In contrast, a recessive mutation affecting 
morphology or a biochemical pathway will not be detect- 
able if the organism is heterozygous because the recessive 
allele will have its effect masked by the dominant allele. 
Mutation rates differ among organisms, and they differ
between genes carried by a single species. Table 12.1 lists
average mutation rates for selected organisms. Reported
rates span several orders of magnitude, reaching values
as high as 1 × 10⁻⁴. Several biological factors intrinsic to
organisms, including genome size and the organism's life cycle,
influence the average mutation rate in an organism.
Mutation rates are variable among genes in an 
organism’s genome, and gene structure or composition is 
frequently a component of these differences. Factors in- 
cluding the composition of certain genes or genome regions 


Table 12.1 Mutation Rate Ranges for Selected Taxonomic Groups

Organism Range

Bacteria (Escherichia coli) 1 34 Or? foi) Se or? 
Algae (Chlamydomonas reinhardii) 1 X 107 to1 x 1078 
Fungi (Neurospora crassa) E 1077 tol x 110%) 
Plant (Zea mays) 1 < 1@ “tol S10? 
Insect (Drosophila melanogaster) 1 X10 °to1 x 1076 
Mammal (Homo sapiens) 1 < 10 “tol < 102 


make them more likely than other genes to be affected by 
mutation. In the human genome, for example, the average 
mutation rate for the average gene is on the order of 1 to
10 per million gametes, or about 1 × 10⁻⁶ to 1 × 10⁻⁵. But in specific
genes, such as DYS, which produces the human X-linked 
recessive disorder Duchenne muscular dystrophy, and NF1, 
which produces autosomal dominant neurofibromatosis, 
substantially elevated mutation rates are observed. Genes 
like these are identified as being hotspots of mutation, 
individual genes or regions of genomes where mutations 
occur much more often than average. 

DYS and NF1 have mutation rates that are about
1 × 10⁻⁴, which is one to two orders of magnitude greater
than the average human gene. Their high mutation rate
is due to their size. These genes are the two largest genes 
known in the human genome. DYS, spanning approxi- 
mately 2.5 million bp on the X chromosome, is the largest. 
NF1 is also very large, spanning well over 1 million bp. 

Similar gene-to-gene variation in mutation rates is 
observed in other mammals. A 1971 report by Gunther 
Schlager and Margaret Dicke examined long-term data on 
mutation rates of five mouse coat color genes. The data, 
collected over many generations of mouse production at 
a commercial facility, yielded mutation rates that ranged
from 2 to 12 × 10⁻⁶ per gene per generation (Table 12.2).


Determination of Mutation Rate from Genome 
Sequence Analysis 


In methods that detect mutations in multicelled eukaryotes
by analyzing expressed genes, only a relatively small subset
of the genome can be sampled. In contrast, whole-genome
sequencing (described in Chapter 18) allows assessment of
mutation rates throughout the genome. In a 2010 study of
mutation rate and types of mutations in the plant Arabidopsis
thaliana, Michael Lynch and his colleagues reported genome
sequence analysis of five plants derived by 30 generations of
single-seed descent from a common ancestral plant. The
researchers detected a total of 116 mutations: 99 base-pair
substitution mutations and 17 insertion or deletion mutations,
so-called indel mutations. The overall mutation rate for the
genome was 5.9 × 10⁻⁹ per site per generation. Most of the
base-pair substitution mutations were G-C to A-T changes, and
most of the indel mutations were 1- to 3-bp changes in the
number of repeats of AT dinucleotides. The researchers
speculated that the large number of G-C to A-T base-substitution
mutations was due to mutation at hotspots or via DNA damage
induced by ultraviolet light. We discuss these mechanisms
later in the chapter.


Table 12.2 Mutation Rates in Five Mouse Coat Color Genesᵃ

                    Number of        Number of        Mutation
                    Gametes          Mutations        Rate
Gene                Tested           Detected         (× 10⁻⁶)
A (agouti)             67,395             3             44.5
B (brown)             919,699             3              3.3
C (nonagouti)         150,391             5             33.2
D (dilute)            839,447            10             11.9
Ln (leaden)           243,444             4             16.4
Totals and          2,220,376            25             11.2 (average)
average

ᵃ Mutations are wild-type dominant to recessive mutant in germ cells (sperm
and egg). Data adapted from G. Schlager and M. M. Dicke (1971).
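
The rates listed in Table 12.2 follow directly from the counts: each rate is the number of mutations detected divided by the number of gametes tested. The short Python sketch below reproduces that arithmetic. The function name `rate_per_million` and the dictionary layout are illustrative choices made for this example and are not part of the original study.

```python
# Counts from Table 12.2: (gametes tested, mutations detected) for each gene.
COAT_COLOR_DATA = {
    "A": (67_395, 3),
    "B": (919_699, 3),
    "C": (150_391, 5),
    "D": (839_447, 10),
    "Ln": (243_444, 4),
}


def rate_per_million(gametes: int, mutations: int) -> float:
    """Mutation rate in units of 10^-6 per gene per generation."""
    return mutations / gametes * 1e6


# Per-gene rates, rounded to one decimal place as in the table.
per_gene = {
    gene: round(rate_per_million(gametes, mutations), 1)
    for gene, (gametes, mutations) in COAT_COLOR_DATA.items()
}

# Pooled estimate: all mutations divided by all gametes.
total_gametes = sum(gametes for gametes, _ in COAT_COLOR_DATA.values())
total_mutations = sum(mutations for _, mutations in COAT_COLOR_DATA.values())
pooled = rate_per_million(total_gametes, total_mutations)

if __name__ == "__main__":
    for gene, rate in per_gene.items():
        print(f"{gene}: {rate} x 10^-6")
    print(f"pooled: {pooled:.2f} x 10^-6")
```

The pooled estimate (25 mutations in 2,220,376 gametes) comes out close to the average reported in the table, and the per-gene values show how widely rates can differ among genes of one species.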

Mutation rate data on the human genome have also re- 
cently been published. In 2011, a large research group led by 
Philip Awadalla examined the human genome for evidence 
of mutation rate variation within and among families. Their 
data are based on assessment of the genome sequences of 
two parent-child trios, each consisting of a child and both 
parents. After complete genome sequencing and compari- 
son of sequences, Awadalla and his colleagues calculated 
a mutation rate of 1.17 × 10⁻⁸ for one parent-child trio
and 0.87 × 10⁻⁸ for the other parent-child trio. The researchers
found that somatic-cell mutations occurred at
a much higher rate than germ-cell (sperm and egg) muta- 
tions. Looking at germ-cell mutations, the researchers were 
able to determine the parent of origin of each mutation. 
They found that for one family, 92% of the mutations were 
paternal in origin, whereas in the other family only 36% of 
mutations were paternal in origin. These findings indicate 
that there may be substantial variation in mutation rates 
in families, and they point to the need for a much more de- 
tailed analysis of mutations to determine how factors such 
as age, genetic background, and environmental exposures 
affect mutation rate in humans. 


12.2 Gene Mutations Modify DNA 
Sequence 


Gene mutations are most often characterized by a change in
DNA sequence that occurs by substituting, adding, or deleting
one or more DNA base pairs. These kinds of localized
mutations occur at a specific or identifiable location in a
gene and are called point mutations. In this section, we
present an overview of gene mutation occurrence and then
describe several types of point mutations that have
characteristic consequences depending on the type of sequence
change and the location of the sequence change in a gene.


Base-Pair Substitution Mutations 


The replacement of one nucleotide base pair by another 
is a base-pair substitution mutation. Two types of base- 
pair substitutions occur: transition mutations, in which 
one purine replaces the other (i.e., A replaces G, or vice
versa) or one pyrimidine replaces the other (i.e., C replaces 
T, or vice versa); and transversion mutations, in which a 
purine is replaced by a pyrimidine, or vice versa. 
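
The transition/transversion distinction can be stated as a simple rule: a substitution is a transition when both bases belong to the same chemical class and a transversion when they do not. The following Python sketch applies that rule; the function name `classify_substitution` is a hypothetical name chosen for this illustration.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T", "U"}  # U is included so mRNA bases can be used as well


def classify_substitution(original: str, replacement: str) -> str:
    """Return 'transition' or 'transversion' for a single-base substitution."""
    original, replacement = original.upper(), replacement.upper()
    valid = PURINES | PYRIMIDINES
    if original not in valid or replacement not in valid:
        raise ValueError("bases must be one of A, C, G, T, U")
    if original == replacement:
        raise ValueError("no substitution: the two bases are identical")
    # Same chemical class (purine <-> purine or pyrimidine <-> pyrimidine)
    # is a transition; a change of class is a transversion.
    pair = {original, replacement}
    same_class = pair <= PURINES or pair <= PYRIMIDINES
    return "transition" if same_class else "transversion"


if __name__ == "__main__":
    print(classify_substitution("A", "G"))
    print(classify_substitution("A", "T"))
```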

When base-pair substitution mutations occur in the 
coding sequence of a gene, they are further categorized at
the molecular level by the manner in which they alter the 
informational content of the gene. Such base-pair muta- 
tions may be silent mutations, missense mutations, or non- 
sense mutations. Table 12.3 summarizes these mutations. 


Silent Mutation A base-pair substitution producing an 
mRNA codon specifying the same amino acid as the wild- 
type mRNA is known as a silent mutation. Figures 12.1a 
and 12.1b illustrate a silent mutation in which an A-T 
to G-C transition mutation changes the wild-type leucine 
codon (5'-UUA-3') to a mutant codon (5'-UUG-3') that also
encodes leucine. Silent mutations are possible because the 
genetic code is redundant, having 2 to 6 codons for most 
amino acids (see Table B inside the front cover). 


Missense Mutation A base-pair substitution that results
in an amino acid change to the protein is a missense
mutation. Figure 12.1c shows a T-A to A-T transversion
mutation that alters the wild-type 5'-UGG-3'
codon to 5'-AGG-3', changing the amino acid from
tryptophan to arginine. Protein function may be altered
by a missense mutation. The specific consequence of the
protein change (i.e., whether it results in complete or
only partial loss of protein function) depends on what
kind of amino acid change takes place and where in the
polypeptide chain the change occurs. The tall versus
short stature of pea plants studied by Mendel is caused
by a missense mutation. See Experimental Insight 12.1 for
a discussion.


Table 12.3 Point Mutations

Type                          Consequence

Coding-Sequence Mutations
Silent                        No amino acid sequence change
Missense                      Changes one amino acid
Nonsense                      Creates stop codon and terminates translation
Frameshift                    Wrong sequence of amino acids

Regulatory Mutations
Promoter                      Changes timing or amount of transcription
Polyadenylation               Alters sequence of mRNA
Splice site                   Improperly retains an intron or excludes exon
DNA replication mutation,     Increases (or less often, decreases)
e.g., triplet-repeat          number of short repeats of DNA
expansion


Figure 12.1 The consequences of base-pair substitutions.
(a) Wild-type sequence: coding strand 5'-TTA TTT AGA TGG TGT-3',
template strand 3'-AAT AAA TCT ACC ACA-5', mRNA
5'-UUA UUU AGA UGG UGU-3', polypeptide Leu-Phe-Arg-Trp-Cys.
(b) Silent mutation. (c) Missense mutation. (d) Nonsense mutation.


Nonsense Mutation A base-pair substitution that 
creates a stop codon in place of a codon specifying an 
amino acid is a nonsense mutation. The GC-to-AT base- 
pair substitution shown in Figure 12.1d that changes the 
UGG (Trp) codon to a UGA (stop) codon is an example of a 
nonsense mutation. 
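
The three categories of coding-sequence substitution can be distinguished mechanically by translating the wild-type and mutant codons and comparing the results. The Python sketch below does this for the three examples discussed in the text (UUA to UUG, UGG to AGG, and UGG to UGA). The compact codon-table construction and the function name `classify_codon_change` are conveniences assumed for this illustration; the table itself is the standard genetic code.

```python
from itertools import product

# Standard genetic code, listed in UCAG order for each codon position.
# "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_codon_change(wild_type: str, mutant: str) -> str:
    """Classify a base-pair substitution within one sense codon of mRNA."""
    wt_aa = CODON_TABLE[wild_type.upper()]
    mut_aa = CODON_TABLE[mutant.upper()]
    if wt_aa == "*":
        raise ValueError("wild-type codon must specify an amino acid")
    if mut_aa == wt_aa:
        return "silent"      # same amino acid: the code is redundant
    if mut_aa == "*":
        return "nonsense"    # amino acid codon replaced by a stop codon
    return "missense"        # one amino acid replaced by another


if __name__ == "__main__":
    for wt, mut in [("UUA", "UUG"), ("UGG", "AGG"), ("UGG", "UGA")]:
        print(wt, "->", mut, ":", classify_codon_change(wt, mut))
```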


Experimental Insight 12.1 


Mendel’s Mutations 


Table 2.6 on page 55 and the accompanying text briefly de- 
scribe the wild-type and mutant alleles of the four genes of 
Mendel that have been identified to date. The three genes 
described in this Experimental Insight result from point muta- 
tions and are described here. The fourth gene of Mendel is 
described in Section 13.7. 


STEM LENGTH: A MISSENSE MUTATION 


The Le gene variation was identified in 1997 by research
groups led by Diane Lester and David Martin, who determined
that the wild-type dominant allele of this gene (Le)
produces gibberellin 3β-hydroxylase, an enzyme active in the
biosynthetic pathway that produces the growth hormone
gibberellin. The effect of the dominant allele is to generate
the wild-type level of growth hormone production, which, in
turn, produces the long stems that characterize tall pea
plants. The recessive mutant allele (le) is unable to produce
a functional enzyme, and this reduces the biosynthesis of the
growth hormone to about 5% of the wild-type level. The result
is poor stem growth and short plants.

The le allele is the result of a missense mutation that 
changes an alanine to a threonine in the polypeptide product 
of the gene. This missense change is brought about by a G-C 
to A-T transition mutation in the le allele’s DNA sequence. 
It is an example of a missense mutation that inactivates the 
function of the allele’s protein product. In this case, the con- 
sequence of the mutation is the significant reduction of the 
synthesis of a growth hormone. 


POD COLOR: AN INSERTION MUTATION 


The 2007 studies of the Sgr ("stay green") gene by research
groups led by Ian Armstead and Sylvain Aubry identified the
molecular basis for the dominant wild-type yellow seed pod
and the recessive mutant green seed pod. The wild-type allele
produces an enzyme that participates in the breakdown
of chlorophyll contained in the seed pod. This breakdown
normally occurs in conjunction with pod maturation, and it
results in mature seed pods that are yellow. The mutant allele
produces a very poorly functioning enzyme, largely disabling
a critical step of chlorophyll breakdown. Consequently,
chlorophyll is retained in mature pods, making them green.

The mutant allele contains a 6-bp insertion that changes
the enzyme product by adding two additional codons to
mRNA and two amino acids to the protein. This insertion
of 6 bp, being a multiple of three nucleotides as found in
a codon, does not change the reading frame. Thus, in the
mutant protein, the amino acid sequence is normal except
for the presence of the two additional amino acids. Since
the mutant protein is largely normal, it is able to retain
partial function, albeit significantly reduced in comparison
to wild-type.


FLOWER COLOR: AN mRNA-SPLICING MUTATION


Purple flower color is dominant in pea plants, and it results
from the production of the pigment anthocyanin. The recessive
mutant phenotype is white flower color, and in these
plants there is no anthocyanin production. A research group
led by Roger Hellens identified the bHLH gene as the source of
the white flower mutation in pea plants. This gene produces a
transcription factor protein that helps activate the
transcription of several genes, including some in the
anthocyanin-production pathway. In the absence of a functioning
protein product from the bHLH gene, anthocyanin production does
not take place.

The mutation in the recessive allele is a G-C to A-T base-pair
substitution that alters the guanine at the 5' splice site of
one of the introns of the allele. Recall that 5' splice sites have
an invariant GU dinucleotide in mRNA. The base substitution
identified by Hellens changes the 5' sequence to an AU
dinucleotide that is not recognized as a splice site. An
alternative splice site (known as a cryptic splice site; see the
text for discussion) is used instead to process the mutant mRNA
transcript. The aberrant splicing elongates the mature mRNA by
eight nucleotides. This addition of mRNA nucleotides results
in a frameshift during translation, and the protein product is
nonfunctional.


Frameshift Mutations


Insertion or deletion of one or more base pairs in the
coding sequence of a gene leads to addition or deletion
of mRNA nucleotides. This can alter the reading frame
of the codon sequence, beginning at the point of mutation.
The result would be a frameshift mutation, in
which the mutant polypeptide contains an altered amino
acid sequence from the point of mutation to the end of
the polypeptide (Figure 12.2). In addition to producing
the wrong amino acids in a portion of the polypeptide,
frameshift mutations also commonly generate premature
stop codons that result in a truncated polypeptide. For
these reasons, frameshift mutations usually result in the
complete loss of protein function and thus produce null
alleles. The yellow versus green seed pod trait studied by
Mendel is caused by an insertion of six base pairs of DNA.
Since the insertion is a multiple of three nucleotides, it
adds two codons to the mutant allele mRNA. Thus, this
particular mutant is not the result of a frameshift mutation,
but the insertion of DNA base pairs is a common
mechanism producing such mutations. See Experimental
Insight 12.1 for a discussion.
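
Whether an insertion or deletion shifts the reading frame depends only on whether its length is a multiple of three. The Python sketch below illustrates this with a short mRNA, assumed here to read 5'-UUA UUU AGA UGG UGU-3' (Leu-Phe-Arg-Trp-Cys). The inserted bases and the helper names `translate` and `shifts_reading_frame` are hypothetical choices made for the example.

```python
from itertools import product

# Standard genetic code in UCAG order; "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna: str) -> str:
    """Translate from the first base, stopping at the first stop codon."""
    peptide = []
    for i in range(0, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":
            break
        peptide.append(aa)
    return "".join(peptide)


def shifts_reading_frame(indel_length: int) -> bool:
    """An indel shifts the frame unless its length is a multiple of three."""
    return indel_length % 3 != 0


WILD_TYPE = "UUAUUUAGAUGGUGU"
# Hypothetical 1-base insertion (A) after the first codon: frame is shifted.
ONE_BASE_INSERTION = WILD_TYPE[:3] + "A" + WILD_TYPE[3:]
# Hypothetical 6-base insertion (two Ala codons): frame is preserved.
SIX_BASE_INSERTION = WILD_TYPE[:3] + "GCUGCU" + WILD_TYPE[3:]

if __name__ == "__main__":
    print(translate(WILD_TYPE))
    print(translate(ONE_BASE_INSERTION))
    print(translate(SIX_BASE_INSERTION))
```

In this sketch the single-base insertion changes the second codon and creates a premature stop codon at the third position, whereas the 6-base insertion leaves every original amino acid in place and adds two new ones, mirroring the in-frame insertion described for the seed pod color allele.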


Regulatory Mutations 


Some point mutations have the effect of reducing or
increasing the amount of wild-type gene transcript and the
amount of wild-type polypeptide without affecting the
transcript and polypeptide sequences. These mutations,
classified as regulatory mutations, occur in noncoding
regions of genes, such as promoters, introns, and
regions coding 5'-UTR and 3'-UTR segments of mRNA.
None of these regions directly encodes amino acids, but
mutations in these regions can lead to the production
of abnormal mRNAs that, in turn, produce mutant proteins.
Three types of regulatory mutations are commonly
recognized: promoter mutations, splicing mutations, and
cryptic splice sites.


Figure 12.2 Frameshift mutation. (a) Wild-type sequence.
(b) Frameshift mutation: insertion of a single base pair.
(c) Frameshift mutation: deletion of a single base pair. In
(b) and (c), the nucleotide sequence is shifted and the amino
acid sequence is altered from the point of mutation onward.


Promoter Mutations Promoter consensus sequences
recognized by RNA polymerase II and its associated
transcription factors direct the efficient initiation of
transcription. Mutations that alter consensus sequence
nucleotides and interfere with efficient transcription
initiation are promoter mutations. The human β-globin
gene offers multiple examples of promoter mutations, with
various consequences for transcription. Figure 12.3a lists
mutations at six positions of the human β-globin gene
promoter that each result in a moderate reduction in the
amount of β-globin gene transcript and in a reduced amount
of β-globin protein. Each of the six promoter mutations
shown here reduces transcription, but none eliminates
transcription entirely. Some promoter mutations of other
genes result in the complete elimination of transcription.


Splicing Mutations The DNA dinucleotide GT, on the
coding strand, occurs invariably at the 5' splice site of the
intron to demarcate the boundary between the 5' intron
end and the 3' end of an exon (the GT of coding strand
DNA corresponds to the GU dinucleotide of mRNA; see
Figure 8.21). In the human β-globin gene, an AG dinucleotide
occurs at the 3' end of exon 1. Each of these dinucleotides
is part of the consensus sequence at which the spliceosome
forms. Mutations of either of these dinucleotide sequences
or of nearby nucleotides in the consensus sequence within
the intron can result in splicing errors that inaccurately
remove intron sequences from pre-mRNA.

In intron 1 of the β-globin gene, two separate
mutations that substitute the guanine of the GT
dinucleotide abolish normal splicing entirely; these
are known as splicing mutations (Figure 12.3b).
Additionally, one base-pair substitution mutation of
position 5 of intron 1 by itself also prevents the production
of normally spliced mRNA. The translation of the
abnormally spliced transcripts does not produce wild-type
β-globin protein. Other base-pair substitution mutations
in intron 1 result in production of a mixture of normally
and abnormally spliced transcript and produce some
wild-type β-globin protein. One of Mendel's traits, the purple
versus white flower phenotype, is caused by a splicing
mutation. See Experimental Insight 12.1 for discussion.


Figure 12.3 Regulatory mutations of the human β-globin
gene. (a) These base-pair substitution mutations in the promoter
(positions -101, -89, -88, -32, -30, and -29) reduce transcription
of the gene. (b) These base-pair substitutions in intron 1 reduce
or eliminate normal pre-mRNA splicing; the amount of normally
spliced transcript is none for three of the mutants shown and
reduced for the other three.


Cryptic Splice Sites Certain base-pair substitution
mutations produce new splice sites that replace or
compete with authentic splice sites during pre-mRNA
processing. These newly formed splice sites are known
as cryptic splice sites. Intron 1 of the human β-globin
gene is 130 nucleotides in length. A base-pair substitution
mutation that changes G to A at position 110 of intron
1 creates an AG dinucleotide that is a cryptic splice site
(Figure 12.4). The cryptic splice site is spliced in about 90%
of the intron 1 3' splicing events. This aberrant splicing
leaves 19 additional nucleotides in the mature mRNA;
these nucleotides have been removed in the other 10% of
mature transcripts, which are spliced at the authentic 3'
splice site for intron 1. In Genetic Analysis 12.1, you can
practice identifying types of mutations by the alterations
they produce in polypeptides.


Polyadenylation Mutations Processing of the 3' end
of eukaryotic mRNAs is initiated by the presence of a
5' AAUAAA 3' polyadenylation signal sequence (see
Section 8.4), and mutation of this sequence can block proper
3' processing of mRNA. One example of this mutation
is found in a rare variant of the human α-globin gene
in which the DNA coding strand sequence is mutated
from 5' AATAAA 3' to 5' AATAAG 3'. The A-T to G-C base
substitution blocks recognition of the polyadenylation signal
sequence, generates abnormal mRNA, and leads to a severe
reduction in the amount of functional α-globin protein.


Forward Mutation and Reversion 


Forward mutation, often identified simply as "mutation,"
converts a wild-type allele to a mutant allele. In contrast,
mutations identified as reverse mutations, or, more
commonly, as reversions, convert a mutation to a wild-type or
near wild-type state. The mechanisms of base-pair substitution
described earlier are examples of processes that create
mutation. Reversions can be caused by similar mechanisms.
In one type of reversion, called a true reversion,
the wild-type DNA sequence is restored to encode its 
original message by a second mutation at the same site or 
within the same codon (Figure 12.5a). Alternatively, rever- 
sion can occur by a second mutation elsewhere in the gene. 
Figure 12.5b illustrates an example of one such reversion— 
an intragenic reversion, which is a reversion that occurs 
through mutation elsewhere in the same gene. Here the 
initial mutation was caused by deletion of two base pairs, 
and the intragenic reversion is a compensatory insertion of 
two base pairs near the site of the initial mutation, restoring 
the allele to a near wild-type form. Figure 12.5c illustrates an 
example of a second-site reversion, produced by mutation 
in a different gene. In this case, the original mutation inacti- 
vates gene A and results in the loss of function of the major 
pigment-transporting protein in a flower. A minor pigment- 
transporting gene, B, remains active, transporting a small 
amount of blue pigment from gene C. The initial mutation 
produces a light-blue flower. The second-site reversion is 
a mutation of gene B that increases gene transcription and 
thus increases production of the pigment-transporting pro- 
tein. The mutation of gene B compensates for the mutation 
of gene A and restores the wild-type dark-blue flower phe- 
notype. Second-site mutations are also known as suppres- 
sor mutations because the second mutation, by restoring 
wild-type appearance, can be said to “suppress” the mutant 
phenotype generated by the first mutation. 


12.3 Gene Mutations May Arise from 
Spontaneous Events 


Spontaneous mutations arise in cells without being 
induced by exposure of DNA to a physical, chemical, or bio- 
logical agent capable of creating DNA damage. Spontaneous 
mutations arise primarily through errors during DNA rep- 
lication and through spontaneous changes in the chemical 
structure of nucleotide bases. 


Figure 12.4 Cryptic splicing. Base-pair substitution of
G-C to A-T at position 110 of intron 1 of the human
β-globin gene.

Figure 12.5 Reversion mutations. (a) This true reversion restores the wild-type amino acid
sequence to the polypeptide. (b) This intragenic reversion reverts a frameshift mutation caused by a
2-bp deletion by insertion of 2 bp at a nearby site in the gene. (c) Second-site reversion restores a near
wild-type phenotype through a compensatory mutation of a second gene.


DNA Replication Errors


DNA replication has extraordinarily high fidelity.
Replication errors resulting in base-pair mismatches
between a template strand and a newly synthesized
strand of DNA occur at an approximate rate of 1 × 10⁻⁹
in wild-type Escherichia coli, and a similar accuracy 
rate is found in eukaryotic DNA replication. The over- 
all efficiency of DNA replication is attributable to the 
proofreading capabilities of DNA polymerases and to 
the operation of DNA base-pair mismatch repair sys- 
tems (see Section 12.5). 

An exception to the general accuracy of replica- 
tion, however, is observed in genomic regions containing 
short repetitive sequences whose number can be either 
increased or decreased by replication errors. Replication 
errors in such regions are another source of hotspots 
of mutation. The repeating DNA sequences are com- 
monly short, end-to-end repeats consisting of repeating 
sequences of the same two nucleotides (dinucleotide 
repeats), of the same three nucleotides (trinucleotide re- 
peats), or of longer repeating units. 

Mutations altering the number of DNA repeats occur 
by a process called strand slippage. In the mid-1960s, 


George Streisinger and his colleagues described the first 
known example of strand slippage, which generated frame- 
shift mutations caused by adding nucleotides in a gene of 
the bacteriophage T4. Streisinger proposed that strand 
slippage occurs when the DNA polymerase of the repli- 
some temporarily dissociates from the template strand 
as it moves across a region of repeating DNA sequence 
(Figure 12.6). He suggested that, during dissociation, a por- 
tion of newly replicated DNA forms a temporary double- 
stranded hairpin structure induced by the complementary 
base pairing of nucleotides in the loop. Reassociation of 
DNA polymerase and resumption of replication leads to 
re-replication of a portion of the repeat region, increasing 
the length of the repeat region in the daughter strand. 

In the past two decades, a number of strand slippage
mutations have been identified as the causes of
various hereditary diseases in humans and other organisms.
The human diseases are classified as trinucleotide
repeat disorders (Table 12.4). The wild-type
alleles of the genes in question normally contain a variable
number of DNA trinucleotide repeats. On rare occasions,
these gene regions undergo mutations through
strand slippage that cause the number of trinucleotide
repeats to increase. For each of these disorders,
expansion of the number of trinucleotide repeats beyond
the wild-type range results in a hereditary disorder.
Most often the mutations block the production of
wild-type mRNA and reduce or eliminate the production
of wild-type protein.


Figure 12.6 Strand slippage during DNA replication. A DNA
segment containing six CAG-triplet repeats undergoes strand
separation. Strand detachment and reattachment occur during
synthesis of the daughter strand, and daughter strand slippage
forms a hairpin loop. Partial re-replication of the template
strand then produces 11 CAG repeats. Conclusion: Strand
slippage in regions of repeating DNA sequence leads to an
altered number of repeat elements.


Table 12.4 Human Trinucleotide Repeat Disorders

                                          OMIM     Repeat     Repeat Range         Principal Disease
Disease                                   Number   Sequence   Normal   Disease     Phenotype

Fragile X syndrome                        309550   CGG        6-50     200-2000    Mental retardation
Friedreich ataxia                         229300   GAA        6-29     200-900     Loss of coordination
Huntington disease                        143100   CAG        10-34    40-200      Uncontrolled movement
Jacobsen syndrome                         147791   CGG        11       100-1000    Growth retardation
Myotonic dystrophy (type I)               160900   CTG        5-37     80-1000     Muscle weakness
Spinal and bulbar muscular atrophy        313200   CAG        14-32    40-55       Muscle wasting
Spinocerebellar ataxia (multiple forms)   271245   CAG        4-44     45-140      Loss of coordination
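
The repeat ranges in Table 12.4 can be applied by counting the longest uninterrupted run of a trinucleotide in a sequence and comparing the count with the normal and disease ranges. The Python sketch below shows one way to do this for a few of the tabulated disorders. The function names, the dictionary layout, and the test sequences are assumptions made for illustration, and the sketch is not a diagnostic procedure.

```python
# A subset of Table 12.4: repeat unit, normal range, disease range.
REPEAT_RANGES = {
    "Fragile X syndrome": ("CGG", (6, 50), (200, 2000)),
    "Friedreich ataxia": ("GAA", (6, 29), (200, 900)),
    "Huntington disease": ("CAG", (10, 34), (40, 200)),
    "Myotonic dystrophy (type I)": ("CTG", (5, 37), (80, 1000)),
}


def longest_tandem_run(sequence: str, unit: str) -> int:
    """Largest number of consecutive copies of `unit` found in `sequence`."""
    best = 0
    step = len(unit)
    for start in range(len(sequence)):
        count = 0
        pos = start
        while sequence.startswith(unit, pos):
            count += 1
            pos += step
        best = max(best, count)
    return best


def classify_repeat_count(disease: str, count: int) -> str:
    """Compare a repeat count with the tabulated normal and disease ranges."""
    _, (normal_lo, normal_hi), (disease_lo, disease_hi) = REPEAT_RANGES[disease]
    if normal_lo <= count <= normal_hi:
        return "normal"
    if disease_lo <= count <= disease_hi:
        return "disease"
    return "outside tabulated ranges"


if __name__ == "__main__":
    # Hypothetical sequences: six CAG repeats before slippage, 11 after.
    before = "TAA" + "CAG" * 6 + "TC"
    after = "TAA" + "CAG" * 11 + "TC"
    print(longest_tandem_run(before, "CAG"), longest_tandem_run(after, "CAG"))
    print(classify_repeat_count("Huntington disease", 45))
```

Counts that fall between the two ranges are reported separately because the table does not assign them to either category.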


GENETIC ANALYSIS 


PROBLEM In a mutant analysis a goal is often to identify the type of mutation that has occurred. 

In this problem, a fragment of a polypeptide with the wild-type amino acid sequence is given: 

BREAK IT DOWN: Use the wild-type amino acid 
sequence to determine the mRNA sequence, including 
all possible redundancies, as the starting point for 
mutant analysis. (Use the genetic code, p. 321; see also 
inside the front cover) 


Met-His—Ala-Trp-Asn-Gly-Glu-His—Arg 


The amino acid sequences of three mutants are shown below. 
For each mutant, identify the type of mutation that has occurred 


and specify how the mRNA sequence has been changed. BREAK IT DOWN: Identification of the 
å . A š mutations requires deducing each mutant 
Mutation 1: Met-His-Ala-Trp-Lys-Gly-Glu-His-Arg mRNA sequence and comparing it to the 


Mutation 2: Met-His-Ala wild-type mRNA sequence. (pp. 394-395) 
Mutation 3: Met-Met-Leu-Gly-Met-Ala-Glu-His-Arg 


Solution Strategies and Solution Steps

Evaluate

1. Strategy: Identify the topic this problem addresses and the nature of the required answer.
   Step: This problem concerns mutations affecting the amino acid sequence encoded by a gene. The type of change causing each mutation must be identified, and the effect of the mutation on mRNA must be described.

2. Strategy: Identify the critical information given in the problem.
   Step: The wild-type amino acid sequence and the corresponding portions of the mutant polypeptides are given.

Deduce

3. Strategy: Determine the sequence of wild-type mRNA.
   TIP: Use N if the position could be occupied by any nucleotide, A/G for the alternative purines, and U/C for the alternative pyrimidines.
   Step: The sequence of wild-type mRNA is
   5'-AUG CA(U/C) GCN UGG AA(U/C) GGN GA(A/G) CA(U/C) (A/C)GN-3'
   TIP: Use the genetic code in Figure 9.13 or in Table B inside the front cover.

Solve

4. Strategy: Compare each mutant sequence to the wild-type polypeptide, and identify the probable types of mutations.
   Step: Comparisons are as follows.
   Mutant 1: This is a missense mutation in which the mutant polypeptide has one amino acid changed from Asn to Lys.
   Mutant 2: This is a nonsense mutation in which a Trp codon is changed to a stop codon.
   Mutant 3: This mutant contains alterations of five consecutive amino acids, beginning with the second amino acid (His to Met). The wild-type sequence is restored beginning with the seventh amino acid (Glu). This mutant results from two compensatory frameshift mutations. The first alters the reading frame, and the second restores it.
5. Strategy: Determine the mRNA change producing the missense mutant.
   Step: The wild-type (Asn) codon is AA(U/C), and the mutant (Lys) codon is AA(A/G). Because a pyrimidine is replaced by a purine at the third position of the codon, this change results from a transversion mutation.

6. Strategy: Determine the mRNA change producing the nonsense mutant.
   Step: The wild-type Trp (UGG) codon is changed to a stop codon. The change is either UGG to UGA or UGG to UAG. In either case, this is a transition mutation.

7. Strategy: Determine the mRNA change producing the frameshift mutant.
   Step: The appearance of Met in position 2 means the second codon of the frameshift mutant is AUG. This change requires deletion of the first C of the wild-type sequence and means that U, not C, is present as the sixth nucleotide of the wild type. Beginning with Glu, the wild-type amino acid sequence is restored. This requires insertion of G immediately after the Ala codon.


For more practice, see Problems 4, 9, and 32.
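The reasoning in this Genetic Analysis can be checked with a short script. The sketch below is illustrative only: the function names are hypothetical, and `wild_type` is just one of the many mRNA sequences consistent with the wild-type polypeptide (chosen so that the sixth nucleotide is U, as the frameshift solution requires). It translates the wild-type and frameshift-mutant mRNAs and lists the single-base changes that convert Asn to Lys and Trp to a stop codon.

```python
# Standard genetic code as 64 letters in UCAG order (first, second, third base).
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
PURINES = {"A", "G"}


def translate(mrna):
    """Translate codon by codon (one-letter code), stopping at the first stop codon."""
    peptide = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        peptide.append(amino_acid)
    return "".join(peptide)


def codons_for(amino_acid):
    """All codons that encode the given one-letter amino acid ('*' means stop)."""
    return [codon for codon, aa in CODON_TABLE.items() if aa == amino_acid]


def substitution_type(old_base, new_base):
    """Purine<->purine or pyrimidine<->pyrimidine is a transition; otherwise a transversion."""
    same_class = (old_base in PURINES) == (new_base in PURINES)
    return "transition" if same_class else "transversion"


def single_base_changes(aa_from, aa_to):
    """List (wild-type codon, mutant codon, substitution type) for one-base changes."""
    changes = []
    for wt in codons_for(aa_from):
        for mut in codons_for(aa_to):
            diffs = [(a, b) for a, b in zip(wt, mut) if a != b]
            if len(diffs) == 1:
                changes.append((wt, mut, substitution_type(*diffs[0])))
    return changes


# Hypothetical wild-type mRNA: AUG CAU GCU UGG AAU GGC GAA CAU CGU
wild_type = "AUGCAUGCUUGGAAUGGCGAACAUCGU"
deleted = wild_type[:3] + wild_type[4:]       # delete the first C (nucleotide 4)
mutant3 = deleted[:18] + "G" + deleted[18:]   # insert G right after the mutant Ala codon

print(translate(wild_type))            # MHAWNGEHR
print(translate(mutant3))              # MMLGMAEHR
print(single_base_changes("N", "K"))   # Asn -> Lys (Mutation 1)
print(single_base_changes("W", "*"))   # Trp -> stop (Mutation 2)
```

The one-letter output for `mutant3` corresponds to Met-Met-Leu-Gly-Met-Ala-Glu-His-Arg, the sequence given for Mutation 3.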


Spontaneous Nucleotide Base Changes

DNA nucleotide bases are organic chemical structures that can occasionally convert, in what are called tautomeric shifts, to alternative structures known as tautomers. Tautomers are structures that have the same composition and general arrangement but a slight difference in bonding and placement of a hydrogen. The generation of a tautomer changes the three-dimensional structure

Figure 12.14 UV photoproducts. UV irradiation forms photoproducts from adjacent pyrimidines, distorting the double helix and potentially blocking replication.


on the template strand, leaving the 3’ OH out of posi- 
tion as DNA polymerase attempts to catalyze the next 
phosphodiester bond. This occurrence activates the 
proofreading function of the DNA polymerase. 

More specifically, when it encounters thymines in a dimer on the template strand, DNA polymerase attempts to add complementary adenines to the nascent DNA strand. But the first adenine fails to form the necessary hydrogen bonds, because the placement of its complementary partner is distorted. In attempting to add the second adenine, DNA polymerase identifies the mispositioned 3' OH of the first adenine, initiates 3'-to-5' proofreading activity, and then attempts to resume synthesis in the thymine dimer region, but with the same negative result. Continued repetition of these unsuccessful attempts to replicate across the thymine dimer causes replication to stall at this point.


How does the replication process overcome the 
blockage caused by the presence of pyrimidine dimers? 
It circumvents the problem. Replication blockage by 
pyrimidine dimers induces reinitiation of DNA synthesis 
at an adjacent RNA primer site. This reinitiation of rep- 
lication potentially leaves gaps spanning dozens to hun- 
dreds of nucleotides in newly synthesized DNA strands, 
but the gaps are subsequently filled by translesion DNA 
synthesis, which is carried out by specialized bypass DNA 
polymerases (one in bacteria and several in eukaryotes) 
that can replicate across the gaps. These specialized DNA 
polymerases are more prone to replication error, however, 
because they lack proofreading ability. In fact, it is the 
absence of proofreading activity that allows these poly- 
merases to carry out replication across pyrimidine dimers. 
Replication can thus proceed, but at the risk of introducing 
mutations. We discuss the process further in Section 12.6. 
Figure 12.15 The Ames test for potential mutagenicity of chemical compounds.


very few mutations to accrue. However, a species' survival depends on maintaining a delicate balance between mutation and repair. Since most mutations are deleterious to the organism, too many mutations may doom an organism and ultimately affect survival of the species. On the other hand, too few mutations will limit the range of genetic variability and may hamper the species' ability to evolve.

Figure 12.16 Mutagenicity of aflatoxin B1 determined by the Ames test. Aflatoxin B1 induces a high rate of reversions in his- bacteria with base-pair substitution mutations (strain TA100), but not in frameshift mutants (strain TA1538).

Organisms must therefore strike a balance between the 
accumulation of mutations and repair of DNA damage be- 
fore mutations accrue. To manage this balance, organisms 
have evolved multiple repair mechanisms, and often these 
are partially redundant with regard to the lesions they identify and repair. In broad terms, these damage repair processes fall into two categories: (1) those that directly repair DNA damage and restore the DNA to its wild-type state; and (2) those that leave the DNA damage in place but allow the organism to circumvent the problems that unrepaired damage causes, such as blocked DNA replication.


Direct Repair of DNA Damage 


We have already encountered the most direct way to repair DNA lesions and to reverse DNA damage before it causes mutation. This mechanism is proofreading by DNA polymerase (see Chapter 7), which identifies a base-pair mismatch, removes the erroneous DNA segment, and resynthesizes the sequence. Several other repair systems also carry out direct repair of DNA damage.


Mismatch Repair The proofreading that accompanies 
DNA replication is an efficient system that helps keep 
the mutation rate low. Still, some mismatched nucleotide 
Figure 12.21 The p53 DNA damage repair pathway.


activation of the apoptotic pathway. Thus, if the p53- 
induced pause in the cell cycle goes on too long, the 
pathway senses that there is a large amount of DNA 
damage that cannot be quickly repaired. The long pause 
allows the apoptotic pathway to go forward, and the cell 
undergoes programmed cell death. 


DNA Damage Repair Disorders 


DNA damage repair disorders, resulting from mutations 
in genes that participate in the repair of DNA damage 
or in the signaling or initiating of damage repair, cause 
an organism to be highly sensitive to chemical mutagens 
and to radiation. Such disorders greatly increase the 
organism’s susceptibility to cancers caused by mutagen 
exposure. We return to this theme in the Case Study that 
concludes this chapter, where we discuss a connection 
between mutations of p53 and the occurrence of cancer 
and the role of transmission of p53 mutation in the hu- 
man familial cancer syndrome known as Li-Fraumeni 
syndrome (OMIM 151623). Table 12.6 lists some human 
mutation repair disorders that are associated with signifi- 
cantly elevated risks of specific types of cancer. 

Research conducted since the 1990s on gene mutations in cancer has combined with cancer genomics (the genome sequence analysis of cancers) to offer a new way to test for the inheritance of gene mutations that may significantly increase a person’s lifetime risk of cancer. Several
individual gene tests are available, but recently a group of 
medical researchers at the University of Washington as- 
sembled and tested a genome sequence-based breast and 
ovarian cancer analysis that examines 24 genes associated 
with the diseases. The test panel, called BROCA, promises 
to offer individuals at risk for breast cancer an unprec- 
edented opportunity to assess their risk. Experimental 
Insight 12.2 describes BROCA. 


Table 12.6 Selected Human Mutation Repair Disorders

Disorder (OMIM Number): Description

Ataxia telangiectasia (208900): Mutation of the ATM gene and absence of ATM protein. Poor coordination (ataxia), red marks on the face (telangiectasia), increased sensitivity to X-rays and other radiation, high cancer risk.

Breast-ovarian cancer (604370): Mutation of BRCA1. Defective DNA repair and increased susceptibility to breast and ovarian cancer.

Li-Fraumeni syndrome (151623): Mutation of p53 and defective p53 pathway. High cancer risk.

Nonpolyposis colon cancer (120435): Defective base-pair mismatch repair caused by mutation of any one of seven different genes. High risk of colon cancer.

Trichothiodystrophy (601675): Mutation of any one of five genes, causing increased sensitivity to oxidative damage. Mental retardation, dwarfism, skin and hair abnormalities, and increased cancer risk.

Xeroderma pigmentosum (278700): Defective excision repair resulting from the mutation of any one of seven UV damage repair genes. Extreme sensitivity to UV-induced damage and high skin cancer risk.
Experimental Insight 12.2


BROCA: Cancer Genomics: Genome Sequencing to Evaluate Cancer Risk 


The average woman reading this textbook has about 
a 12% chance of developing breast cancer in her lifetime. 
Approximately 90% to 95% of the cases that develop will 
be so-called sporadic cases, meaning there were no known 
hereditary factors that increased the person's breast cancer 
risk. The remainder of cases, however, will occur because the 
woman inherited a gene mutation that predisposed her to 
breast cancer. 

Since their discovery in the 1990s, two genes, BRCA1 and BRCA2, have been at the forefront of genetic testing for inherited mutations that predispose women to hereditary breast and ovarian cancer. Certain mutations of BRCA1 and BRCA2 are very strongly associated with an almost 75% lifetime risk of breast and ovarian cancer, whereas other mutations of these genes appear to carry much lower predisposition risks. Commercial testing for mutations of these genes was, until 2013, controlled by a single company and was limited to the most common mutations linked to increased cancer risk. Many women with breast or ovarian cancer who are members of families with a high hereditary breast cancer risk test negative for the BRCA1 or BRCA2 mutations screened by the commercial test, suggesting that either untested BRCA1 or BRCA2 mutations or else mutations of other genes are responsible for increased hereditary risk. A 2013 U.S. Supreme Court decision revoked the patent held by the company on BRCA1 and BRCA2 testing and opened the way for wider application and use of genetic testing for mutations of genes that may increase lifetime cancer risk.

A new genome sequence-based genetic test known as the BROCA test is designed to examine the sequences of all the genes known to contribute to breast cancer risk. The test is named after the French physician, surgeon, and anatomist Pierre Paul Broca (1824-1880), whose contributions to medical science include the first descriptions of families in which there appears to be inherited susceptibility to breast cancer.

At present, mutations of 24 genes, including BRCA1 and BRCA2, that are suspected to contribute to inherited susceptibility to breast cancer are examined in the BROCA test. Mutations of any one of these genes could potentially increase a woman’s lifetime risk of cancer to levels ranging from a few percent higher to several times that of an average woman in the population. Tomas Walsh, Mary-Claire King, and numerous colleagues have collaborated to develop the BROCA test, which uses advanced genome sequencing to fully sequence all 24 genes implicated in increased breast cancer risk. Complete gene sequencing allows detection of all point mutations, all repeat-sequence copy number variants, and all insertion and deletion mutations.

Published reports in 2010 and 2011 and an additional preliminary report in 2013 outline the effectiveness of BROCA in detecting mutations linked to breast cancer risk in high-risk families that contain members who have previously tested negative for one of the commercially tested BRCA1 or BRCA2 mutations. The 2013 report identified 149 mutations of 18 genes in 191 breast cancer families. Through complete gene sequencing, BROCA identified 66 families in which there were BRCA1 or BRCA2 mutations that were not detected by the commercial test. In 125 additional families, mutations of one of the genes other than BRCA1 or BRCA2 were detected.

The comprehensive testing provided by BROCA may prove pivotal in identifying women at increased risk of breast cancer due to hereditary predisposition. Women in this situation can then be offered several options before cancer appears. BROCA is among the first of a coming wave of genome-based genetic tests that herald an era in which certain kinds of medical treatment will be personalized to take into account individual genome sequence differences.


12.6 Proteins Control Translesion DNA Synthesis and the Repair of Double-Strand Breaks


The repair mechanisms described to this point are able to repair DNA damage, but not all DNA damage is repaired in this way. Damage that escapes repair before the initiation of DNA replication has the potential to block replication. Circumventing this potential blockage requires mechanisms that can permit replication to progress despite the presence of damage that is potentially mutagenic. In addition, events that lead to the breakage of one or both DNA strands present unique challenges to organisms. The repair of certain kinds of strand breakage can take place in an error-free manner that does not introduce mutation. Repair of other types of strand breakage, however, is “error prone,” meaning that repair of the damage may itself introduce mutations. It may not immediately be obvious why repair mechanisms that are prone to introducing additional errors have evolved. After all, the point of DNA repair systems is just that: to repair damage so as to maintain the integrity of the genome. This conundrum is explained by the fact that error-prone repair mechanisms are activated only in instances of widespread DNA damage that would otherwise prevent the completion of DNA replication and might cause cell death.


Translesion DNA Synthesis 


In response to widespread DNA damage, molecular ac- 
tivities in the cell may direct the cell to apoptosis. The 
activity of the p53 protein in eukaryotic cells can lead to 
this outcome. E. coli cells that undergo extensive damage 
might also die, but there is a second repair mechanism 
that can be activated in E. coli in response to massive
DNA damage. This repair system, called SOS repair,
has been known for decades but has only recently been 
understood at the molecular level. The system takes its 
name from the maritime phrase “save our ship,” used 
when sinking was imminent. In the past, SOS repair was 
described as a last-ditch effort on the part of a heavily 
damaged bacterial cell to replicate its DNA and divide 
before succumbing to DNA damage. Recent research 
demonstrates that SOS repair is accomplished by activat- 
ing specialized DNA polymerases in a process known as 
translesion DNA synthesis. This short-lived process al- 
lows DNA replication by alternative polymerases across 
lesions that block the action of DNA polymerase III (pol 
III), the main DNA-replicating polymerase in E. coli. 

Translesion DNA synthesis is performed by transle- 
sion DNA polymerases, also called bypass polymerases. 
Bypass polymerases operate differently from pol III in 
several respects. First, bypass polymerases are able to
replicate across DNA lesions that stall pol III. This ability
is accounted for by the second difference distinguishing 
bypass polymerases, the absence of proofreading. In other 
words, bypass polymerases do not have 3'-to-5' exo- 
nuclease capacity and are unable to remove newly added 
nucleotides that fail to hydrogen bond with the tem- 
plate strand nucleotide. Due to their lack of proofread- 
ing capability, the third distinguishing feature of bypass 
polymerases is that they are prone to making replication 
errors. Finally, bypass polymerases synthesize only short 
segments of DNA; they fall off the template strand after 
synthesizing a small number of nucleotides. From these 
distinguishing features, molecular biologists conclude 
that bypass polymerases are used to complete replication 
that would otherwise be blocked. This comes at the price, 
however, of potentially introducing new mutations. 

The SOS system in E. coli operates through a specialized bypass polymerase identified as polymerase V (“polymerase five”), or pol V. When pol III stalls at damaged DNA, RecA protein coats the template strand ahead of the lesion that is already bound by single-stranded binding protein (SSB). Recall that SSB coats the single DNA strands separated ahead of the replication fork (see Figure 7.14). The RecA protein in the DNA-RecA-SSB complex is an active form that also activates transcription of several genes, including those encoding pol V. Pol V displaces polymerase III, synthesizes a short portion of the daughter strand across the DNA lesion, and is then replaced by pol III, which resumes its normal replication activity.

Eukaryotic genomes utilize a similar mechanism for 
translesion DNA synthesis. In eukaryotes, however, by- 
pass polymerases are always present in cells, so the system 
of regulating their access to DNA is quite different. The 
regulatory mechanism guiding the choice of polymerase 
decides which polymerase binds to PCNA, the eukary- 
otic sliding clamp. When eukaryotic replication stalls at 
a DNA lesion, a protein called Rad6 that is always pres- 
ent at the replication fork adds a ubiquitin (Ub) group 
to PCNA. This process, called ubiquitination, normally 


targets a protein for destruction. On PCNA, however, 
ubiquitination merely causes an alteration of conforma- 
tion, giving the bypass polymerase a strong affinity for 
ubiquitinated PCNA. In this process, bypass polymerase 
displaces normal DNA polymerase and carries out trans- 
lesion synthesis of DNA. As in the SOS system, the use 
of bypass polymerases in eukaryotic cells is error prone 
because the enzyme lacks proofreading capability. 


Double-Strand Break Repair 


A common feature of the DNA repair mechanisms we have 
examined is the use of DNA polymerase and a template 
strand of DNA to guide the repair, replacement, or syn- 
thesis of DNA. These repair systems are effective as long 
as one strand of DNA is intact and can serve as a template. 
But what happens if both strands of DNA are damaged in a
manner that does not provide a template strand for strand
repair? Such damage is a frequent consequence of exposure
to X-rays and certain types of oxygen radicals. The damage
caused by these agents breaks both strands of DNA, leaving
lesions that are known as double-strand breaks. Because 
they can cause chromosome instability and incomplete 
replication of the genome, double-strand breaks are poten- 
tially lethal to cells and elevate the risk of cancer and the 
chance of chromosome structural mutations. 

To protect organisms from the unpleasant conse- 
quences of double-strand breaks, two mechanisms have 
evolved to carry out double-strand break repair. The 
first is an error-prone repair process known as nonhomol- 
ogous end joining that repairs double-strand breaks occur- 
ring before DNA replication. The second is an error-free 
process called synthesis-dependent strand annealing that 
repairs double-strand breaks occurring after the comple- 
tion of DNA replication. 


Nonhomologous End Joining If a double-stranded break damages a eukaryotic chromosome during G1 of the cell cycle, replication of the damaged chromosome is blocked. Considering that DNA polymerases, even bypass polymerases, require a template strand to direct synthesis of a daughter strand, it is clear that a double-strand break is incompatible with the completion of replication. One repair alternative that allows cells to reacquire their capacity to fully replicate their genome is nonhomologous end joining (NHEJ), although its four-step process for repairing double-strand breaks inevitably leads to mutation (Figure 12.22).

In the first step, double-strand breaks are recognized by 
a protein complex containing the proteins PKcs, Ku70, and 
Ku80. This complex attaches to each of the broken ends of 
the DNA duplex. The complex then trims back (resects) the 
free ends of each broken strand. Resection leaves blunt ends 
on each side of the break. Finally, the blunt ends are ligated 
by a specialized ligase called ligase IV (“ligase four”). 

Completion of NHEJ produces an intact DNA duplex and allows replication across the repaired region in the upcoming replication cycle, but the repair is often imperfect because resection removes nucleotides that cannot be replaced. For this reason, NHEJ is error prone. Yet, as potentially damaging as this process is, its outcome is superior to the alternatives suffered by cells that are unable to repair double-strand breaks, and it prevents more extensive loss from degradation of unprotected ends. Mutations can be generated, however, when nucleotides are lost from transcribed genes.

Figure 12.22 Nonhomologous end joining. NHEJ is an error-prone system that rejoins DNA strands following a double-stranded break.
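Why resection makes NHEJ error prone can be seen in a toy model. The sketch below is a deliberate simplification, not a description of the enzymology: it represents the duplex by a single strand written 5' to 3', and the function name, sequence, and resection lengths are all hypothetical. Each broken end is trimmed and the blunt ends are then joined, so the trimmed nucleotides are simply absent from the repaired molecule.

```python
def nhej_repair(left_end, right_end, resect_left, resect_right):
    """Toy NHEJ: trim (resect) each broken end, then ligate the blunt ends."""
    trimmed_left = left_end[:len(left_end) - resect_left]
    trimmed_right = right_end[resect_right:]
    return trimmed_left + trimmed_right


# Hypothetical sequence with a double-strand break between positions 8 and 9.
original = "ATGGCCTTAGACGGT"
left_end, right_end = original[:8], original[8:]

repaired = nhej_repair(left_end, right_end, resect_left=2, resect_right=2)
lost = len(original) - len(repaired)

print(repaired)        # ATGGCCACGGT
print(lost)            # 4
print(lost % 3 != 0)   # True: within a coding region this loss would shift the reading frame
```

In this example four nucleotides are lost; had the break fallen within a transcribed gene, a loss that is not a multiple of three would act as a frameshift mutation.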


Synthesis-Dependent Strand Annealing In eukaryotes, 
once DNA replication is complete, each chromosome 
is composed of two identical sister chromatids. Double- 
stranded breaks at this stage can be repaired by exploiting 
the intact sister chromatid to repair the damaged 
chromatid in an error-free repair process known as 
synthesis-dependent strand annealing (SDSA). 

As shown in Figure 12.23, a double-stranded break (DSB) affects one sister chromatid; the other chromatid is undamaged. SDSA begins with trimming of one of the broken strands. This is followed by attachment of the protein Rad51 to the broken region to form a nucleoprotein filament. Rad51 binds to the strands and facilitates the invasion of the intact chromatid by the resected end of a strand from the sister chromatid. This strand invasion process displaces one strand of the duplex and creates a displacement (D) loop. DNA replication within the D loop synthesizes new DNA strands from intact template strands. The sister chromatids are reformed by dissociation and annealing of the nascent strand to the other side of the break. By accomplishing the removal of DNA in the immediate vicinity of a double-stranded break and the replacement of the excised DNA with a duplex identical to that in the sister chromatid, SDSA carries out error-free repair of double-stranded breaks.

Figure 12.23 Synthesis-dependent strand annealing (SDSA).
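The contrast with an error-prone repair can be illustrated with a toy model in which the material trimmed from the broken chromatid is resynthesized using the intact sister chromatid as template. This is only a sketch under strong assumptions: strand invasion, the D loop, and annealing are collapsed into a single copy step, each chromatid is represented by one strand written 5' to 3', and the function name and sequences are hypothetical.

```python
def sdsa_repair(left_end, right_end, sister):
    """Toy SDSA: fill the gap between resected ends by copying the sister chromatid."""
    gap_start = len(left_end)
    gap_end = len(sister) - len(right_end)
    # The resected ends must still match the sister chromatid on either side of the gap.
    if sister[:gap_start] != left_end or sister[gap_end:] != right_end:
        raise ValueError("broken ends do not align with the sister chromatid")
    newly_synthesized = sister[gap_start:gap_end]  # templated by the intact sister
    return left_end + newly_synthesized + right_end


# Hypothetical sister chromatid and the two resected ends of the broken chromatid.
sister = "ATGGCCTTAGACGGT"
left_end, right_end = "ATGGCC", "ACGGT"

repaired = sdsa_repair(left_end, right_end, sister)
print(repaired == sister)   # True: no nucleotides are lost
```

Because the missing segment is copied from an identical template, the repaired chromatid matches its sister exactly.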


12.7 DNA Double-Strand Breaks 
Initiate Homologous Recombination 


Homologous recombination is the exchange of genetic material between homologous molecules of DNA. All organisms undertake homologous recombination. In bacteria, homologous recombination occurs during events such as conjugation and as a consequence of the repair of double-strand breaks. Archaea undertake homologous recombination under circumstances similar to those in bacteria. In eukaryotes, whereas a limited amount of homologous recombination takes place during mitosis, recombination between homologous chromosomes is essential in prophase I of meiosis. In eukaryotes, homologous recombination during meiosis is initiated by controlled double-strand DNA breaks.

Proper chromosome segregation during meiosis de- 
pends on the occurrence of recombination between ho- 
mologous chromosomes. Without it, homolog synapsis 
does not take place, and errors are likely to occur during 
chromosome segregation. This leads to nondisjunction and 
to gametes with the wrong number of chromosomes (we 
discuss the consequences of these events in Chapter 13). 

Cell biologists and geneticists have interpreted and 
understood the genetic consequences of recombination for 
more than a century, but an understanding of homologous 
recombination and meiotic recombination at the molecular 
level has been more elusive. Though initially discovered in 
the early 20th century through the work of Thomas Hunt 
Morgan and his colleagues, who detected recombinant 
chromosomes in gametes, homologous recombination 
could not be studied on a molecular level until the 1950s. In 
the decade following the determination of the double heli- 
cal structure of DNA, numerous researchers attempted to 
construct likely models of homologous recombination. In the
more than 60 years since work began in earnest to describe
the molecular mechanism of homologous recombination, 
many models have been proposed, and modification of 
models has been continuous. Molecular biologists con- 
tinue to adjust models of recombination to match observa- 
tions, but two salient points are now clear. First, meiotic 
recombination is a genetically controlled process initiated 
by enzymes that produce double-stranded DNA breaks; 
and second, the molecular mechanism of homologous 
recombination is closely related to the processes that repair 
double-stranded DNA breaks. 


The Holliday Model 


The first viable molecular model of meiotic recombination 
was proposed by Robin Holliday in 1964 and was based on 
the study of homologous recombination in E. coli. Known 
as the Holliday model, it offered a plausible scheme for 
meiotic recombination by hypothesizing that spontane- 
ously generated single-stranded breaks in one chroma- 
tid led to invasion of a homologous molecule. Holliday’s 
scheme for breaking and rejoining DNA strands suggested 
that some encounters between homologous chromosomes 
would produce crossovers whereas others would not. 

The original Holliday model ultimately proved to be 
too simplistic and has been superseded by more accurate 
models of meiotic recombination. The more recent mod- 
els rely on some of the features of the Holliday model but 
incorporate new knowledge and steps. Perhaps the most 
important features distinguishing the current model of 
meiotic recombination from the original Holliday model 
are, first, that meiotic recombination is now known to 
be initiated by double-stranded DNA breaks and, second, 
that the double-stranded breaks initiating meiotic recom- 
bination are generated in a programmed manner by the 
activity of a specialized enzyme. 


The Bacterial RecBCD Pathway 


Homologous recombination in all organisms shares many 
features in terms of the mechanical processes involved 
as well as the homologies of proteins that are active in 
recombination. The first, and still the most detailed, mo- 
lecular description of homologous recombination per- 
tained to E. coli. This homologous recombination model 
describes the action of several proteins that are critical to 
initiating and completing homologous recombination. 
Known as the RecBCD pathway, the system of 
homologous recombination in bacteria relies on the 
occurrence of DNA double-strand breaks to initiate the 
process. Double-strand DNA breaks attract the protein 
RecA. Bacterial RecA is a homolog of the eukaryotic 
and archaeal protein Rad51, which performs a similar 
function in those organisms. The multiprotein complex 
known as RecBCD then attaches to the region of a bac- 
terial chromosome with bound RecA, and this complex 
promotes single-strand invasion and the formation of D 
loops. The process is highly similar in appearance to the 
strand invasion and D-loop formation we saw in SDSA. 
RecBCD activity is followed by binding of RuvAB and 
RuvC proteins. The Ruv complex completes homologous 
recombination between the bacterial DNA molecules. 


The Double-Stranded Break Model 
of Meiotic Recombination 


The bacterial RecBCD pathway of homologous recombina- 
tion was the starting point for the study of meiotic recom- 
bination in eukaryotes since numerous protein homologies 
have been identified. The outline of the current model 
of meiotic recombination was proposed in 1983 by Jack 
Szostak, Terry Orr-Weaver, Rodney Rothstein, and Franklin 
Stahl. Their model was the first to predict that the creation 
of double-stranded breaks controlled by the activity of a spe- 
cific protein was the foundation of meiotic recombination. 
The accumulated experimental evidence has confirmed this 
view, and the research has added major new details to the 
original proposal by Szostak and his colleagues. 

Among these new findings is the determination that 
the double-strand breaks that precede meiotic recom- 
bination are under precise protein control. This is in 
contrast to a more generalized and diverse process of gen- 
erating double-strand breaks in bacterial DNA. 

The bacterial RecBCD pathway leading to homolo- 
gous recombination is very closely related to the recom- 
bination pathway in archaea and to mitotic and meiotic 
recombination in eukaryotes. Table 12.7 lists several of the 
critical gene homologies between bacteria and eukaryotes 
and archaea. The eukaryotic and archaeal systems appear 
to have stronger homology than do the bacterial and ar- 
chaeal systems. In part for this reason, the eukaryotic and 
archaeal recombination proteins carry the same names. 

In the current model, meiotic recombination is initiated by the protein Spo11 (“Spo eleven”) that was first discovered in yeast and is now known to exist in homologous form in all eukaryotes (Foundation Figure 12.24). Note that bacteria lack a homolog to Spo11 (see Table 12.7), so while homologous recombination in bacteria is tied to repair of double-strand breaks, the breaks apparently occur at random or through the action of non-specific proteins.

Table 12.7 Recombination Protein Homology

Recombination Step                            Bacterial Protein    Eukaryotic/Archaeal Protein (a)
DSB introduction                              Not specific         Spo11
Homologous DNA pairing and strand invasion    RecA                 Rad51 + Dmc1
Strand invasion                               RecBCD               Rad52 and Rad59
Branch migration                              RuvAB                Unknown
Holliday junction resolution                  RuvC                 Rad51 and XRCC

(a) Eukaryotic and archaeal recombination proteins have strong homology and carry the same names.

Spo11 is a dimeric protein that generates slightly asymmetric double-strand cuts in one chromatid. The proteins Mrx and Exo1 associate with Spo11, and after Spo11 degrades, Mrx, assisted by additional proteins, resects the single strands. Mrx and associated proteins are homologs of RecBCD helicase and nuclease. Two RecA homolog proteins, Rad51 and Dmc1, join at the trimmed region. This protein complex helps form a strand-exchange assemblage, facilitating strand invasion and formation of a D loop.

The invading strand pairs with the complementary 
strand in the D loop. Outside the D loop, the two strands 
that appear to cross over one another form a Holliday 
junction, an interim structure proposed in the original 
Holliday model. Notice that there is also a heteroduplex 
region, containing two complementary strands of DNA 
that originated in different homologs. Also identified as 
heteroduplex DNA, these regions are a molecular sig- 
nature of homologous recombination. Because the two 
strands of the heteroduplex DNA originate in different 
homologs, there may be mismatched base pairs between 
them. In other words, if heterozygosity is present in the 
DNA sequences forming a heteroduplex region, one or 
more base pairs will be mismatched in the heteroduplex 
DNA. We discuss the implications of this situation in the 
following section. 
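
The idea that heterozygosity inside a heteroduplex region produces mismatched base pairs can be sketched in a few lines of Python. This is only an illustration: the five-base-pair sequences and the names `heteroduplex_mismatches`, `A1_TOP`, and so on are hypothetical and are not taken from the figures.

```python
# Watson-Crick partners used to test each position of a duplex.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def heteroduplex_mismatches(top_strand, bottom_strand):
    """List (position, top_base, bottom_base) for every non-complementary pair.

    Both strands are written left to right in register (top 5'->3',
    bottom 3'->5'), so position i of one strand faces position i of the other.
    """
    if len(top_strand) != len(bottom_strand):
        raise ValueError("strands must have the same length")
    return [
        (i, top, bottom)
        for i, (top, bottom) in enumerate(zip(top_strand, bottom_strand))
        if COMPLEMENT[top] != bottom
    ]


# Hypothetical alleles that differ only at position 3 (counting from 0).
A1_TOP, A1_BOTTOM = "TAGCC", "ATCGG"  # C-G base pair at position 3
A2_TOP, A2_BOTTOM = "TAGAC", "ATCTG"  # A-T base pair at position 3

# A duplex whose strands come from the same homolog has no mismatches.
print(heteroduplex_mismatches(A1_TOP, A1_BOTTOM))
# Duplexes whose strands come from different homologs are mismatched
# at the heterozygous site.
print(heteroduplex_mismatches(A1_TOP, A2_BOTTOM))
print(heteroduplex_mismatches(A2_TOP, A1_BOTTOM))
```

Where the two homologs carry identical sequence, the same function returns an empty list, which is why heteroduplex DNA is mismatched only at heterozygous sites.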

Extension of the invading strand and DNA synthesis
within the broken strand are guided by intact template
strands (steps 6 and 7) and are assisted by additional
proteins, including Rad52 and Rad59, that are RecBCD homologs.
At this point, a second heteroduplex region has formed.
The 3’ end of the invading strand next connects with
the 5’ end of a strand segment that was initially part of
the invading strand (step 8) to form a second Holliday
junction. Now the nonsister chromatids of the recombining
chromosomes are interconnected by double Holliday
junctions (DHJs): the recombining chromosomes contain
DHJs and two heteroduplex regions.


Holliday Junction Resolution 


The recombinational steps just described take place in 
prophase I of meiosis, and any connections between 
homologous chromosomes must be resolved in pro- 
phase, long before the homologs attach to spindle fibers 
in metaphase. Cutting and reconnecting single strands 
of interconnected homologous chromosomes resolves 
crossing over. In bacteria, this process is accomplished 
by the RuvAB complex and RuvC. In eukaryotes, the
Rad51c-XRCC3 complex, which is homologous to RuvC
and RuvAB, accomplishes resolution of Holliday junction
connections between homologous chromosomes.
Archaea have homologous proteins to accomplish this 
step of recombination. The best current evidence finds 
these archaeal proteins to have closer homology to the 
eukaryotic proteins than to the bacterial proteins. 

Two Holliday junction resolution patterns, called
same sense resolution and opposite sense resolution
(both shown in Foundation Figure 12.24), complete crossing
over and disengage homologs so they can be separated during
anaphase I. Same sense resolution involves either two
north-south (NS) resolution cuts or two east-west (EW)
resolution cuts of DNA strands to separate the homologs
(see Foundation Figure 12.24). When the connection
between homologs is resolved by two NS or two EW cuts, the
flanking markers (A1 and B1, and A2 and B2) do not recombine.
As a consequence, recombination of those genes is not
produced, although heteroduplex regions are present. This
resolution occurs only rarely. Far more common is opposite
sense resolution, in which one Holliday junction region is
resolved by an NS cut and the other by an EW cut. The resulting
chromosomes are recombinant and carry A1 and B2 or A2 and
B1. These recombinations are detectable among progeny,
where they are counted as recombinants between the genes.
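
The rule relating the two resolution cuts to the fate of the flanking markers can be written as a small lookup. The Python sketch below is a bookkeeping aid only, not a molecular model; the function name `resolve_double_holliday_junction` and the dictionary layout are assumptions made for illustration.

```python
VALID_CUTS = ("NS", "EW")


def resolve_double_holliday_junction(cut_1, cut_2):
    """Report the flanking-marker outcome for one pair of resolution cuts.

    Two cuts of the same orientation (same sense) leave the flanking
    markers in their parental combinations; one NS cut plus one EW cut
    (opposite sense) recombines them.
    """
    if cut_1 not in VALID_CUTS or cut_2 not in VALID_CUTS:
        raise ValueError("each cut must be 'NS' or 'EW'")
    if cut_1 == cut_2:
        return {
            "resolution": "same sense",
            "recombinant": False,
            "chromatids": [("A1", "B1"), ("A2", "B2")],
        }
    return {
        "resolution": "opposite sense",
        "recombinant": True,
        "chromatids": [("A1", "B2"), ("A2", "B1")],
    }


# Enumerate all four combinations of cuts at the two junctions.
for first_cut in VALID_CUTS:
    for second_cut in VALID_CUTS:
        print(first_cut, second_cut,
              resolve_double_holliday_junction(first_cut, second_cut))
```

In either case heteroduplex regions remain, so the absence of flanking-marker recombination does not mean that no strand exchange took place.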


FOUNDATION FIGURE 12.24 Molecular Model of Meiotic Recombination

Meiotic crossing over and double Holliday junction formation
(meiotic recombination is diagrammed between nonsister chromatids of
homologous chromosomes)

(1) Spo11 creates double-strand break in one DNA duplex.
(2) Enzymatic digestion 5'→3' by Mrx creates single-stranded segments.
(3) XRCC3 and Rad51c assemble strand-exchange nucleoprotein filaments.
(4) The strand-exchange filaments promote strand invasion.
(5) Strand invasion creates one D loop and the first heteroduplex region.
    Rad52, Rad59, and other proteins participate.
(6) Strand extension by DNA polymerase displaces D loop DNA, which pairs
    with complementary single-stranded DNA to form the second
    heteroduplex region.
(7) Strand extension and ligation fills the single-stranded gap in the
    strand paired with D loop DNA.
(8) Double Holliday junctions form after the nick is sealed; chromatids
    contain offset heteroduplexes.

Resolution of Holliday junction crossovers

Same sense resolution (two east-west cuts shown) produces offset
heteroduplex regions but no recombination of flanking genes. This
form of resolution occurs infrequently.

Opposite sense resolution is very common. It generates recombination
of flanking genes and creates offset heteroduplex regions.


12.8 Gene Conversion Is Directed
Mismatch Repair in Heteroduplex DNA


Our final topic in this chapter is gene conversion, a process
of so-called directed DNA sequence change that occurs
by base-pair mismatch repair within heteroduplex DNA.
These base-pair mismatches can occur when DNA sequence
is heterozygous in a heteroduplex region created during
meiotic recombination. In gene conversion, the “directed”
change is base-pair mismatch repair that switches the
nucleotide sequence of one allele to that of another allele that
is already present because the organism is heterozygous in
the portion of the genome where heteroduplex DNA forms.
In contrast to mutation, which can change one allele into
any other allele, gene conversion can only switch one allele
to another allele already present in a heterozygous genotype.

Gene conversion is most readily detected in fungi that
form an ascus, a sac of haploid spores that are the products
of meiotic division. For example, we saw that for fungi
with the genotype a+a, the ratio of these alleles in spores in
an eight-cell ascus is expected to be equal (4:4) (see Figure
5.23). Gene conversion changes that ratio by switching one
or more alleles from one form to another: either a+ to a,
or a to a+. The result is an aberrant ratio of spores in an
eight-cell ascus, commonly 5:3 or 6:2 instead of 4:4. Since
gene conversion is strictly limited to conversions from one
allele to the alternative form in a heterozygous genotype, it
is distinct from mutation, in which an allele can be altered
to an almost infinite variety of forms. Similarly, in organisms
producing a four-cell ascus, a 2:2 ratio is expected for a
heterozygote, and any other ratio is an aberrant ratio.
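
As a rough illustration of how an ascus is scored, the Python sketch below counts the two alleles among the spores and compares the result with the expected 2:2 or 4:4 ratio. The function name `classify_ascus` and the allele labels are hypothetical. The check uses allele counts only, so it does not detect patterns in which the counts are normal but the spore order is unusual.

```python
from collections import Counter


def classify_ascus(spores, allele_1="a+", allele_2="a"):
    """Return ((larger_count, smaller_count), is_aberrant) for one ascus.

    `spores` lists the allele carried by each spore of a four-cell or
    eight-cell ascus from a heterozygote. Only allele counts are examined.
    """
    if len(spores) not in (4, 8):
        raise ValueError("expected a four-cell or eight-cell ascus")
    if set(spores) - {allele_1, allele_2}:
        raise ValueError("unexpected allele in ascus")
    counts = Counter(spores)
    ratio = tuple(sorted((counts[allele_1], counts[allele_2]), reverse=True))
    expected = (len(spores) // 2, len(spores) // 2)
    return ratio, ratio != expected


print(classify_ascus(["a+"] * 4 + ["a"] * 4))  # expected 4:4 segregation
print(classify_ascus(["a+"] * 5 + ["a"] * 3))  # one spore pair partly converted
print(classify_ascus(["a+"] * 2 + ["a"] * 6))  # conversion toward a
print(classify_ascus(["a+", "a+", "a+", "a"]))  # four-cell ascus
```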

Figure 12.25 illustrates the formation of heteroduplex
DNA for alleles A1 and A2 that differ by substitution
of one base pair. Allele A1 carries a C-G base pair at
the differing position, and A2 carries an A-T base pair.
Mismatches between G and A, and between C and T, are
highlighted in the heteroduplex regions.

Figure 12.25 Heteroduplex DNA. (a) A segment of allele
A1 contains a C-G base pair, whereas allele A2 contains an
A-T base pair at the same location. Segregation produces a
2:2 ascus. (b) Crossover between homologous chromosomes
generates heteroduplex DNA containing G-A and C-T
base-pair mismatches (in red) between the otherwise
complementary strands.

Figure 12.26 Example mismatch repair and gene conversion
patterns in a four-celled ascus.
Repair option 1: Repairing both base-pair mistakes to
A2 (A-T base pair) creates a 3:1 ascus.
Repair option 2: Repairing both base-pair mistakes to
A1 (C-G base pair) creates a 3:1 ascus.
Repair option 3: Repairing base-pair mistakes in opposite
directions results in an aberrant 2:2 ascus.


In a four-celled ascus, the repair of base-pair mismatches
in heteroduplex DNA results in three aberrant
ratios or patterns of spores (Figure 12.26). In repair option
1, both mismatches repair by converting the sequence
to that of A2. Conversely, in repair option 2, both mismatches
repair to produce A1. In each case, gene conversion
has taken place, and the resulting asci contain an
aberrant 3:1 ratio of alleles. In repair option 3, the pattern
of mismatch repair produces an ascus with an aberrant 2:2
ratio in which A1 and A2 are in alternating order instead
of the like alleles being side by side as expected normally.
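
The three repair options can be enumerated with a toy model. The sketch below assumes, purely for illustration, that the four spores are ordered with an unaffected A1 chromatid first, the two heteroduplex-bearing chromatids in the middle, and an unaffected A2 chromatid last. The names `four_cell_ascus` and `describe_ascus` are hypothetical.

```python
from itertools import product


def four_cell_ascus(repair_left, repair_right):
    """Spore order after each heteroduplex is repaired to 'A1' or 'A2'.

    Assumed layout: [unaffected A1, heteroduplex, heteroduplex, unaffected A2].
    """
    return ["A1", repair_left, repair_right, "A2"]


def describe_ascus(ascus):
    """Name the ratio, separating normal 2:2 from alternating (aberrant) 2:2."""
    n_a1, n_a2 = ascus.count("A1"), ascus.count("A2")
    if n_a1 != n_a2:
        return f"aberrant {max(n_a1, n_a2)}:{min(n_a1, n_a2)}"
    if ascus == ["A1", "A1", "A2", "A2"]:
        return "normal 2:2"
    return "aberrant 2:2"


# Enumerate every combination of repair direction for the two mismatches.
for left, right in product(("A1", "A2"), repeat=2):
    ascus = four_cell_ascus(left, right)
    print(left, right, ascus, describe_ascus(ascus))
```

Under this layout the fourth combination, in which each heteroduplex is repaired back toward its neighboring parental allele, restores an ordinary 2:2 ascus and so would not be scored as gene conversion.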

The pattern of mismatch repair also determines the
aberrant ratios in the eight-celled ascus by gene conversion.
Figure 12.27 shows three options for the repair of
base-pair mismatches. In option 1, both mismatch repairs
favor a single allele (A1 in this case) and produce an ascus
containing an aberrant 6:2 ratio. A similar aberrant
6:2 ratio producing an ascus containing 6 A2 and 2 A1
gametes occurs if both mismatches are repaired in favor
of A2 rather than A1. In repair option 2, just one rather
than both base-pair mismatches are repaired before the
DNA replication cycle, resulting in an ascus containing an
aberrant 5:3 ratio. Two different aberrant 5:3 ratios, 5 A1:3
A2 and 5 A2:3 A1, are possible, depending on the favored
allele in the single mismatch repair. In repair option 3, no
mismatch repair takes place. The spores are arrayed in a
3:1:1:3 pattern, a distribution called an aberrant 4:4 ratio.

Figure 12.27 Mismatch repair and gene conversion in an eight-celled ascus.
Repair option 1 (meiosis, then mitosis; 6:2 ascus): Repairing both
base-pair mistakes from A2 to A1 creates a 6:2 ascus.
Repair option 2 (meiosis, then mitosis; 5:3 ascus): Repairing of one
mistake but no repair of the other creates a 5:3 ratio.
Repair option 3: No repair of base-pair mistakes in the
heteroduplex region creates an aberrant 4:4 ratio.
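
The eight-celled outcomes follow from the fact that an unrepaired heteroduplex carries one strand of each allele, and the postmeiotic mitosis separates those strands into different spores. The sketch below is a simplified model under that assumption; the chromatid layout and the names `eight_cell_ascus`, `allele_ratio`, and `run_lengths` are hypothetical.

```python
from itertools import groupby


def eight_cell_ascus(repair_left=None, repair_right=None):
    """Spore order after meiosis and one postmeiotic mitosis.

    Assumed layout: an unaffected A1 chromatid, two heteroduplex chromatids,
    and an unaffected A2 chromatid, each contributing two spores. A repair
    value of 'A1' or 'A2' converts both strands of that heteroduplex;
    None leaves the mismatch unrepaired, so its strands segregate as A1, A2.
    """
    def strands(repair):
        return ["A1", "A2"] if repair is None else [repair, repair]

    return ["A1", "A1"] + strands(repair_left) + strands(repair_right) + ["A2", "A2"]


def allele_ratio(ascus):
    """Count of (A1, A2) spores."""
    return ascus.count("A1"), ascus.count("A2")


def run_lengths(ascus):
    """Lengths of consecutive runs of identical spores, in ascus order."""
    return [len(list(group)) for _, group in groupby(ascus)]


print(allele_ratio(eight_cell_ascus("A1", "A1")))  # both repaired toward A1
print(allele_ratio(eight_cell_ascus("A1", None)))  # only one mismatch repaired
unrepaired = eight_cell_ascus()
print(allele_ratio(unrepaired), run_lengths(unrepaired))  # no repair
```

The last case shows why an unrepaired ascus still counts as aberrant: the allele counts are 4:4, but the spore order departs from the usual four-and-four arrangement.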


CASE STUDY 


Li-Fraumeni Syndrome Is Caused by Inheritance of Mutations of p53 


Numerous studies of human cancers identify p53 as the most 
commonly mutated gene in cancer cells. From the pivotal 
role p53 and its protein product play in cells, it is easy to see 
why cells lacking p53 function are abnormal. In the absence 
of functional p53 protein, DNA damage goes undetected and 
cells progress through G1 of the cell cycle to S phase with the
DNA damage present. Similarly, homozygous inactivation of 
p53 interferes with the initiation of apoptosis in cases where 
cells have high levels of damage. By itself, homozygous muta- 
tion of p53 does not cause cancer. Other mutations must be 
present to cause the rapid cell proliferation and other abnor- 
malities that characterize cancer. Still, homozygous p53 muta- 
tion can play a pivotal role in the accumulation of additional 
mutations that lead to cancer development. 


In 1969, Frederick Li and Joseph Fraumeni encountered a 
family in which cancer ravaged each generation (Figure 12.28). 
This family stood out for its pattern of cancer that was consistent
with an autosomal dominant mode of transmission and because
many of the cancer cases occurred decades earlier than are
typical for these types of cancer in the general population (e.g.,
breast cancers appeared in the 30s in affected family members as
opposed to the 60s in members of the general population). Both
of these features are hallmarks of cancer-prone families in which 
an inherited germ-line mutation increases individual susceptibil- 
ity to cancer. Interestingly, however, unlike most cancer-prone 
families, in which one or two types of cancers predominate, 
this family studied by Li and Fraumeni had many types of cancer,
including soft tissue sarcomas, breast cancers, brain cancer,
osteosarcoma, adrenocortical cancer, and leukemia. After Li and
Fraumeni’s description, other families with similar patterns of
mixed cancers were identified. This inherited cancer-prone condition
was designated Li-Fraumeni syndrome 1 (LFS1; OMIM
151623). Study of LFS1 sparked revolutionary investigations of
cancer biology and genetics that identified several genes that are
frequently mutated in cancer cells, as well as investigations of the
inheritance of mutations that increase cancer susceptibility.

Figure 12.28 Li-Fraumeni syndrome. Inherited mutations
of p53 greatly increase susceptibility to sarcoma, breast cancer,
brain cancer, leukemia, and other cancers. (Pedigree key includes
other malignant neoplasms, including brain cancer and leukemia.)

In 1997, the first evidence of the molecular defect in LFS1
emerged with the identification of abnormalities of the p53
gene in approximately 70% of LFS1 family members with cancer.
The mutations were discovered in germ-line cells, meaning
that one parent passed a mutant copy of p53 in sperm or
egg. At conception, the fertilized eggs were heterozygous,
and as they developed, all cells carried one mutated and one
wild-type copy of p53. Individual cells of mutation carriers
become homozygous for p53 mutation by the occurrence of a
somatic mutation that alters the wild-type allele. The resulting
homozygous mutant cells do not produce normal p53 protein,
and they are unable to properly regulate the cell cycle or
entry into apoptosis. Cancers develop in individuals without
functioning p53 through the accumulation of somatic mutations
of other genes. The specific types of cancer that develop
depend on which genes are mutated and on the tissues or cell
types in which the mutations occur.

Following identification of the role of germ-line p53 mutations
in LFS1, inherited mutations of other DNA repair genes
have been identified as increasing the susceptibility to certain
cancers in families. For example, mutations of BRCA1 and
BRCA2 that interact with p53 can increase susceptibility to
breast and ovarian cancer. Other inherited mutations of DNA
repair genes that increase susceptibility to cancers include the
disorders listed in Table 12.6.


SUMMARY

MasteringGenetics™ For activities, animations, and review quizzes, go to the Study Area.


12.1 Mutations Are Rare and Occur at Random


Mutations occur at random in genomes.

Mutations result from damage done to DNA.

Mutation frequencies are low in all organisms.

Mutational hotspots are genes or regions where mutations
occur much more often than average.


12.2 Gene Mutations Modify DNA Sequence


Base-pair substitution mutations can be either transitions
or transversions.

Base-pair substitutions can change one amino acid of the
polypeptide, can create a new stop codon, or can leave
the polypeptide unchanged.

Frameshift mutations result from the insertion or deletion of
one or more base pairs that shift the mRNA reading frame
during translation.

Regulatory mutations alter gene transcription or pre-mRNA
splicing.

Forward mutation alters a wild-type allele to mutant form,
and reversion changes a mutant back to wild-type or near
wild-type form.


12.3 Gene Mutations May Arise from
Spontaneous Events


DNA replication errors can substitute base pairs, and strand
slippage can modify the number of repeats of a DNA sequence.

Tautomeric shifts of nucleotide base structure can induce
spontaneous base-pair substitution mutations.

Different kinds of spontaneous changes in nucleotide structure
can result in mutation of DNA sequence by base-pair
mismatching.


12.4 Mutations May Be Induced by Chemicals
or Ionizing Radiation


Mutagenic chemicals interact in characteristic reactions 
with DNA nucleotides and generate specific mutations. 
Chemical compounds may create mutations by acting as nu- 
cleotide base analogs, adding or removing side groups from 
nucleotides, or intercalating into DNA. 

Energy in the ultraviolet range and higher (shorter in wave- 
length) is mutagenic. Ultraviolet radiation induces the for- 
mation of photoproducts that lead to base-pair substitution 
mutations. 

The Ames test identifies mutagenic chemical compounds by 
testing for increased reversion rates in auxotrophic bacteria 
exposed to a test compound in the presence of detoxifying 
enzymes from the eukaryotic liver. 


12.5 Repair Systems Correct Some DNA Damage 


Direct repair of DNA lesions removes damaged nucleotides 
and prevents mutation. 

Mismatched DNA nucleotides, photoproducts induced by 
UV radiation, and modified nucleotide side chains are re- 
moved by direct repair. 

Nucleotide excision repair and UV repair remove segments 
of DNA single strands containing damaged nucleotides and 
direct new synthesis to fill the resulting single-stranded gap. 


Genetically controlled systems monitor the genome and
regulate DNA repair.


12.6 Proteins Control Translesion DNA Synthesis
and the Repair of Double-Strand Breaks


SOS repair, controlled by the RecA protein, is a specialized
process activated during replication in bacteria in response
to widespread DNA damage.

Translesion DNA synthesis uses bypass polymerases to
complete replication when damage is present.

Nonhomologous end joining repairs double-strand DNA
breaks occurring before DNA replication.

Synthesis-dependent strand annealing repairs double-strand
breaks occurring after the completion of replication.


12.7 DNA Double-Strand Breaks Initiate
Homologous Recombination


Homologous recombination is controlled by the RecBCD
pathway in bacteria. In eukaryotes, meiotic recombination
is initiated through the activity of Spo11 that regulates the
production of double-strand breaks.

In meiotic recombination, strand invasion and new DNA
synthesis form heteroduplex DNA in both homologous
chromosomes.

Heteroduplex DNA contains base-pair mismatches if DNA
sequences are heterozygous.

DNA strands forming double Holliday junctions are cut and
rejoined to different homologs before their separation in meiosis.

Resolution of double Holliday junctions generates heteroduplex
DNA and can produce recombinant or nonrecombinant
chromosomes.


12.8 Gene Conversion Is Directed Mismatch
Repair in Heteroduplex DNA

Gene conversion occurs by the repair of base-pair 
mismatches in heteroduplex DNA. 

Gene conversion in a four-celled or eight-celled ascus
generates aberrant ratios of spores that differ from the
expected 2:2 or 4:4 ratios.


Chapter Concepts

For answers to selected even-numbered problems, see Appendix: Answers.


1. Identify two general ways chemical mutagens can alter
DNA. Give examples of these two mechanisms.


2. Nitrous acid and 5-BrdU alter DNA by different mechanisms.
Identify each mechanism and describe how each
compound creates mutation.


3. Using the adenine-thymine base pair in this DNA sequence

...GCTC...
...CGAG...

a. Give the sequence after a transition mutation.
b. Give the sequence after a transversion mutation.


4. The partial amino acid sequence of a wild-type protein is

... Arg-Met-Tyr-Thr-Leu-Cys-Ser ...

The same portion of the protein from a mutant has the
sequence

... Arg-Met-Leu-Tyr-Ala-Leu-Phe ...

a. Identify the type of mutation.

b. Give the sequence of the wild-type DNA template
strand. Use A/G if the nucleotide could be either purine,
T/C if it could be either pyrimidine, N if any nucleotide
could occur at a site, or the alternative nucleotides if a
purine and a pyrimidine are possible.


5. Thymine is usually in its normal, common form. Diagram
the base pair that would result if a tautomeric shift occurs 
just before DNA replication. 


6. Ultraviolet (UV) radiation is mutagenic.

a. What kind of DNA lesion does UV energy cause? 

b. How do UV-induced DNA lesions lead to mutation? 

c. Identify and describe two DNA repair mechanisms that 
remove UV-induced DNA lesions. 


7. Researchers interested in studying mutation and mutation
repair often induce mutations with various agents. What 
kinds of gene mutations are induced by 

a. Chemical mutagens? Give two examples. 

b. Radiation energy? Give two examples. 


8. The effect of base-pair substitution mutations on protein
function varies widely from no detectable effect to the 
complete loss of protein function (null allele). Why do the 
functional consequences of base-pair substitution vary so 
widely? 


9. The two DNA and polypeptide sequences shown are
for alleles at a hypothetical locus that produce different
polypeptides, both five amino acids long. In each case, the
lower DNA strand is the template strand:


allele A1:       5’...ATGCATGTAAGTGCATGA...3’
                 3’...TACGTACATTCACGTACT...5’
A1 polypeptide:  N-Met-His-Val-Ser-Ala-C

allele A2:       5’...ATGCAAGTAAGTGCATGA...3’
                 3’...TACGTTCATTCACGTACT...5’
A2 polypeptide:  N-Met-Gln-Val-Ser-Ala-C


Based on DNA and polypeptide sequences alone, is there 
any way to determine which allele is dominant and which 
is recessive? Why or why not? 


10. In numerous population studies of spontaneous mutation,
two observations are made consistently: (1) most
mutations are recessive, and (2) forward mutation is more 
frequent than reversion. What do you think are the likely 
explanations for these two observations? 


11. Two different mutations are identified in a haploid strain
of yeast. The first prevents the synthesis of adenine by a
nonsense mutation of the ade-1 gene. In this mutation,
a base-pair substitution changes a tryptophan codon
(UGG) to a stop codon (UGA). The second affects one of
several duplicate tRNA genes. This base-pair substitution
mutation changes the anticodon sequence of a tRNA(Trp)
from 3’-ACC-5’ to 3’-ACU-5’.

a. Do you consider the first mutation to be a forward mu- 
tation or a reversion? Why? 

b. Do you consider the second mutation to be a forward 
mutation or a reversion? Why? 

c. Assuming there are no other mutations in the genome, 
will this double-mutant yeast strain be able to grow on 
minimal medium? If growth will occur, characterize the 
nature of growth relative to wild type. 


12. Many human genes are known to have homologs in the
mouse genome. One approach to investigating human 
hereditary disease is to produce mutations of the mouse 
homologs of human genes by methods that can precisely 
target specific nucleotides for mutation. 


a. Numerous studies of mutations of the mouse homo- 
logs of human genes have yielded valuable informa- 
tion about how gene mutations influence the human 
disease process. In general terms, describe how and 
why creating mutations of the mouse homologs can 
give information about human hereditary disease 
processes. 

b. Despite the homologies that exist between human and 
mouse genes, some attempts to study human hereditary 
disease processes by inducing mutations in mouse 
genes indicate there is little to be learned about human 
disease in this way. In general terms, describe how 
and why the study of mouse gene mutations might fail 
to produce useful information about human disease 
processes. 


13. Answer the following questions concerning the accuracy of
DNA polymerase during replication. 


a. What general mechanism do DNA polymerases use 
to check the accuracy of DNA replication and identify 
errors during replication? 

b. If a DNA replication error is detected by DNA polymerase,
how is it corrected?

c. If a replication error escapes detection and correction,
what kind of abnormality is most likely to exist at the 
site of replication error? 

d. Identify two mechanisms that can correct the kind of 
abnormality resulting from the circumstances identified 
in part (c). 

e. If the kind of abnormality identified in part (c) is not
corrected before the next DNA replication cycle, what 
kind of mutation occurs? 

f. DNA mismatch repair can accurately distinguish be- 
tween the template strand and the newly replicated 
strand of a DNA duplex. What characteristic of DNA 
strands is used to make this distinction? 


14. Apert syndrome is a human autosomal dominant condition
that affects development of the head, hands, and feet.
In a survey of 322,182 consecutive births in Ireland, two 
new cases of Apert syndrome were identified. What is the 
mutation rate of this gene per gamete? 


15. Polydactyly is a human autosomal dominant condition that
produces extra fingers and toes. Studies of hundreds of families
with polydactyly have determined that penetrance for the
dominant allele is 70%. Hospital-based surveys of live births
find that 1 in 40,000 infants has a new case of polydactyly. Use
this information to estimate the mutation rate of the gene.


16. The table shown lists the approximate new mutation rates
for three autosomal dominant human diseases.


Trait                                          Mutations per 10^6 Gametes
Retinoblastoma (tumor of the retina)           20
Achondroplasia (statural dwarfism)             80
Neurofibromatosis (tumor of nervous tissue)    220


a. In a series of 50,000 consecutive live births recorded in
a large metropolitan area, how many new cases of each
disease are expected?

b. Identify two possible molecular reasons why the rate
of new mutations causing neurofibromatosis is more
than 10 times greater than the mutation rate causing
retinoblastoma.


17. A 1-mL sample of the bacterium E. coli is exposed to
ultraviolet light. The sample is used to inoculate a 
500-mL flask of complete medium that allows growth of 
all bacterial cells. The 500-mL culture is grown on the 
benchtop, and two equal-size samples are removed and 
plated on identical complete-medium growth plates. Plate 
1 is immediately wrapped in a dark cloth, but plate 2 is 
not covered. Both plates are left at room temperature 
for 36 hours and then examined. Plate 2 is seen to con- 
tain many more growing colonies than plate 1. Thinking 
about DNA repair processes, how do you explain this 
observation? 


18. A strain of E. coli is identified as having a null mutation of
the RecA gene. What biological property do you expect to 
be absent in the mutant strain? What is the molecular basis 
for the missing property? 


19. Define gene conversion and contrast it with gene mutation.


20. Some homologous recombination events produce gene
conversion. Is homologous recombination a mutational
event? Explain why or why not.


21. What is heteroduplex DNA, and why does it form? What
is the relationship between heteroduplex DNA and gene
conversion?


22. Is heteroduplex DNA always an outcome of homologous
recombination? Why or why not?


23. A strain of yeast producing a four-celled ascus is heterozygous
for the wild-type allele Ala-B and the mutant allele
ala-b. The wild-type allele carries an A-T base pair and
the mutant allele a G-C base pair at a site that is part of a
heteroduplex region. Identify the events that produce the
following kinds of asci.

a. 3 Ala-B:1 ala-b

b. 3 ala-b:1 Ala-B

c. 2 ala-b:2 Ala-B


24. Gene conversion is relatively easy to detect in four-cell and
eight-cell asci of fungi, where ratios such as 3A:1a or 5a:3A
indicate that gene conversion has taken place. Why is gene
conversion much more difficult to detect in multicellular
eukaryotes?


25. If homologous recombination did not occur, what consequences
would result?


26. In this chapter, three features of genes or of DNA
sequence that contribute to the occurrence of mutational
hotspots were described. Identify those three features and
briefly describe why they are associated with mutational
hotspots.


27. Briefly compare the production of DNA double-strand
breaks in eukaryotes versus in bacteria.


28. During mismatch repair, why is it necessary to distinguish
between the template strand and the newly made daughter
strand? Describe how this is accomplished.


Application and Integration

For answers to selected even-numbered problems, see Appendix: Answers.


29. Following the spill of a mixture of chemicals into a small pond,
bacteria from the pond are tested and show an unusually high
rate of mutation. A number of mutant cultures are grown
from mutant colonies and treated with known mutagens to
study the rate of reversion. Most of the mutant cultures show
a significantly higher reversion rate when exposed to base
analogs such as proflavin and 2-aminopurine. What does this
suggest about the nature of the chemicals in the spill?


30. A geneticist searching for mutations uses the restriction
endonucleases SmaI and PvuII to search for mutations
that eliminate restriction sites. SmaI will not cleave DNA
with CpG methylation. It cleaves DNA at the restriction
digestion sequence

       ↓
5'-CCC GGG-3'
3'-GGG CCC-5'
       ↑

PvuII is not sensitive to CpG methylation. It cleaves DNA
at the restriction sequence

       ↓
5'-CAG CTG-3'
3'-GTC GAC-5'
       ↑

a. What common feature do SmaI and PvuII share that
would be useful to a researcher searching for mutations
that disrupt restriction digestion?

b. What process is the researcher intending to detect with
the use of these restriction enzymes?


31. A wild-type culture of haploid yeast is exposed to ethyl
methanesulfonate (EMS). Yeast cells are plated on a complete
medium, and 6 colonies (colonies numbered 1 to 6)
are transferred to a new complete medium plate for further
study. Four replica plates are made from the complete
medium plate to plates containing minimal medium or
minimal medium plus one amino acid (replica plates numbered
1 to 4) with the following results:

[Figure: complete medium plate with colonies 1 to 6, and four replica
plates. Plate 1: minimal; Plate 2: minimal + histidine; Plate 3:
minimal + arginine; Plate 4: minimal + leucine.]


a. Identify the colonies that are prototrophic (wild type). 
What growth information leads to your answer? 

b. Identify the colonies that are auxotrophic (mutant). 
What growth information leads to your answer? 

c. Identify any colonies that are his−, arg−, leu−.

d. For colonies 1, 3, and 5, write “+” for the wild-type syn- 
thesis and “—” for the mutant synthesis of histidine and 
leucine. 

e. Are there any colonies for which genotype information 
cannot be determined? If so, which colony or colonies? 


32. A fragment of a wild-type polypeptide is sequenced
for seven amino acids. The same polypeptide region is
sequenced in four mutants.


Wild-type polypeptide   N...Thr-His-Ser-Gly-Leu-Lys-Ala...C
Mutant 1                N...Thr-His-Ser-Val-Leu-Lys-Ala...C
Mutant 2                N...Thr-His-Ser-C
Mutant 3                N...Thr-Thr-Leu-Asp-C
Mutant 4                N...Thr-Gln-Leu-Trp-Ile-Glu-Gly...C

a. Use the available information to characterize each mutant. 

b. Determine the wild-type mRNA sequence. 

c. Identify the mutation that produces each mutant 
polypeptide. 


33. Experiments by Charles Yanofsky in the 1950s and 1960s
helped characterize the nature of tryptophan synthesis
in E. coli. In one of Yanofsky’s experiments, he identified
glycine (Gly) as the wild-type amino acid in position 211
of tryptophan synthetase, the product of the trpA gene. He
identified two independent missense mutants with defec- 
tive tryptophan synthetase at these positions that resulted 
from base-pair substitutions. One mutant encoded arginine 
(Arg) and another encoded glutamic acid (Glu). At position 
235, wild-type tryptophan synthetase contains serine (Ser), 
but a base-pair substitution mutant encodes leucine (Leu). 
At position 243, the wild-type polypeptide contains gluta- 
mine, and a base-pair substitution mutant encodes a stop 
codon. Identify the most likely wild-type codons for posi- 
tions 211, 235, and 243. Justify your answer in each case. 


34. Common baker’s yeast (Saccharomyces cerevisiae) is
normally grown at 37°C, but it will grow actively at
temperatures down to approximately 20°C. A haploid
culture of wild-type yeast is mutagenized with EMS.
Cells from the mutagenized culture are spread on a
complete-medium plate and grown at 25°C. Six colonies (1 to
6) are selected from the original complete-medium plate
and transferred to two fresh complete-medium plates.
The new complete plates (shown below) are grown at
25°C and 37°C. Four replica plates are made onto minimal
medium or minimal plus adenine from
the 25°C complete-medium plate. The new plates are
grown at either 25°C or 37°C, as indicated below.


[Figure: complete-medium plates with colonies 1 to 6 grown at
25°C and at 37°C, and four replica plates: minimal medium and
minimal + adenine, each grown at 25°C and at 37°C.]


a. Which colonies are prototrophic and which are auxo- 
trophic? What growth information is used to make 
these determinations? 

b. Classify the nature of the mutations in colonies 1, 2, and 5. 

c. What can you say about colony 4? 


35. The two gels illustrated below contain dideoxynucleotide 
DNA-sequencing (see Section 7.5) information for a seg- 
ment of wild-type and mutant DNA corresponding to the 
N-terminal end of the protein. The start codon and the 
next five codons are sequenced. 


[Figure: dideoxy sequencing gels for wild-type and mutant DNA,
each with lanes A, T, C, and G.]


a. Write the DNA sequence of both alleles, including 
strand polarity. 
b. Identify the template and nontemplate strands of DNA. 


c. Write out the mRNA sequences encoded by each tem- 
plate strand, and underline the start codons. 

d. Determine the amino acid sequences translated from 
these mRNAs. 

e. What is the cause of the mutation? 


36. Alkaptonuria is a human autosomal recessive disorder
caused by mutation of the HAO gene that encodes the
enzyme homogentisic acid oxidase. Restriction mapping
of the HAO gene region reveals four BamHI restriction
sites (B1 to B4) in the wild-type allele and three
BamHI restriction sites in the mutant allele. BamHI
utilizes the restriction sequence 5'-GGATCC-3'. The
BamHI restriction sequence identified as B3 is altered
to 5'-GGAACC-3' in the mutant allele. The mutation
results in a Ser-to-Thr missense mutation. Restriction
maps of the two alleles are shown below, and the binding
sites of two molecular probes (probe A and probe B) are
identified.


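[Figure: BamHI restriction maps of the wild-type allele (sites B1,
B2, B3, B4) and the mutant allele (sites B1, B2, B4), with fragment
lengths in kb (3.0, 2.5, 4.0) and the binding sites of probe A and
probe B.]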


DNA samples taken from a mother (M), father (F), and two
children (C1 and C2) are analyzed by Southern blotting
of BamHI-digested DNA. The resulting autoradiograph is
illustrated below.

[Figure: Southern blot autoradiograph of BamHI-digested DNA from
M, F, C1, and C2, with band sizes in kb.]


a. Using A to represent the wild-type allele and a for 
the mutant allele, identify the genotype of each fam- 
ily member. Identify any family member who is 
alkaptonuric. 

b. In a separate figure, draw the autoradiograph patterns
for all the genotypes that could be found in children of 
this couple. 

c. Explain how the DNA sequence change results in a 
Ser-to-Thr missense mutation. 


37. In an experiment employing the methods of the Ames test,
two his⁻ strains of Salmonella are used. Strain A contains a
base substitution mutation, and Strain B contains a frameshift
mutation. Four plates are prepared to test the mutagenicity
of the compound ethyl methanesulfonate (EMS). Plate 1 is a
control plate with Strain A and S9 extract but no EMS. Plate 2
is also a control plate and contains Strain B and S9 extract but
no EMS. Plate 3 contains Strain A along with S9 extract and
EMS, and Plate 4 contains Strain B, S9 extract, and EMS.

a. Characterize the expected distribution of colony growth
on the four plates. Defend your growth prediction for
each plate.

b. What event is being detected by growth of a colony on
any of the four plates?

c. Why is the S9 extract added to each of the plates?

d. Suppose the compound being tested was proflavin
instead of EMS. Would this change the Ames test
results? Explain why or why not.


38. Using your knowledge of DNA repair pathways, choose the
pathway that would be used to repair the following types of
DNA damage. Explain your reasoning.

a. A change in DNA sequence caused by a mistake made
by DNA polymerase during replication

b. Heavily damaged bacterial DNA

c. A thymine dimer induced as a result of UV exposure

d. A double-strand break that occurs just after replication
in an actively dividing cell

e. A double-strand break that occurs during G1 and
prevents completion of DNA replication

f. A cytosine that has been deaminated to uracil


39. Ataxia telangiectasia (OMIM 208900) is a human inherited
disorder characterized by poor coordination (ataxia), red
marks on the face (telangiectasia), increased sensitivity to
X-rays and other radiation, and an increased susceptibility
to cancer. Recent studies have shown that this disorder
occurs as a result of mutation of the ATM gene. Propose
a mechanism for how a mutation in the ATM gene leads
to the characteristics associated with the disorder. Be sure
to relate the symptoms of this disorder to functions of the
ATM protein. Further, explain why DNA repair mechanisms
cannot correct this problem.



40. Two haploid strains of fungus are fused to form a diploid
that produces eight-celled asci. Fungus strain A has the
genotype + ade1 his2, and strain B is a + +. The three genes
are linked and occur in the order given.

a. The alleles at the A gene locus are determined in an
ascus, and the order is aaaa++++. Write the genotype
for all three genes that you expect to find most
commonly.

b. One ascus from the diploid is of the following type:

+ ade1 his2
+ ade1 his2
+ ade1 his2
+ ade1 his2
ade1 +
a
a
a
a

Explain the events that produced this ascus.

c. One ascus from the diploid is of the following type:

+ his2
+ his2
+ ade1 his2
+ ade1 his2
- ade1 his2
+ ade1 his2

Explain the events that produced this ascus.


Chromosome Aberrations
and Transposition


Chromosome translocations are mutations that rearrange chromosome
structure. This electron micrograph shows two pairs of homologous
chromosomes that have exchanged segments and must form a tetravalent
structure involving the four chromosomes in order to synapse their
homologous regions during prophase I.


Something interesting is happening to the mice on
Madeira, a tiny island off the western coast of Portugal: 
They are in the process of differentiating into two species! 
Madeira, about 20 miles long and 8 miles wide, has steep 
volcanic mountains running down the middle that form a 
barrier to easy mouse migration. The common house mouse 
(Mus musculus) was introduced to Madeira by sailors in the 
1400s. Today, Madeira has two distinct populations of mice, 
one on either side of the central mountain range. 
In addition to being separated by the mountain range, each
population has also undergone multiple chromosome
fusions that have reduced its diploid number. The
usual chromosome number for Mus musculus is 20 
pairs (2n = 40). On Madeira, however, one popula- 
tion has 2n = 22, and the other has 2n = 24. Because 
each population has a different chromosome num- 
ber, interpopulation hybrids are sterile. Such hybrids 
carry 23 chromosomes (11 from one parent and 
12 from the other) and therefore cannot form viable 
gametes. This is an example of reproductive isolation 
that can lead to speciation based on differences in 
chromosome structure and chromosome number. 
Variation and evolution at the chromosome 
level are genomic in scope—that is, they potentially 
alter the content of the genome, changing interac- 
tions between homologous chromosomes in meiosis 
and limiting the possibility of reproduction between 
organisms with chromosomal differences. This chap- 
ter addresses two distinct categories of chromosome 
change. The first consists of alterations of chromosome 
number and chromosome structure known collectively 
as chromosome aberrations. The second category 
of chromosome change is chromosome alteration by 
transposition, the movement of DNA elements within 
the genomes of organisms. Chromosome aberra- 
tions and transposition are examples of mutation at 
the chromosome level. In addition, transposition is a 
biological source of mutation as well as a source of 
additional DNA sequence that can increase the size of 
genomes. Both chromosome aberrations and trans- 
position contribute to evolution and speciation by 
reorganizing and reshaping the content of genomes. 


13.1 Nondisjunction Leads to Changes 
in Chromosome Number 


In Section 3.2, we discussed the connection between 
Mendel’s two laws of heredity and the disjunction of 
homologous chromosomes and sister chromatids during 
meiosis. In the discussion that follows, we focus on non- 
disjunction (mentioned briefly in Section 3.3) as a process 
of failed chromosome and sister chromatid disjunction 
that can result in abnormalities of chromosome number. 
The changes in chromosome number we describe in the 
following paragraphs exert their effects primarily by adding 
or removing one or more chromosomes from the normal
complement in a nucleus. Such changes are mutations that
add or remove large numbers of genes. In animal species,
and to a lesser extent in plant species, aneuploidy almost
always alters the phenotype; it can disrupt development
and reduce the fertility and viability of the aneuploid organism.


Euploidy and Aneuploidy 


The number of chromosomes contained in a nucleus 
and the relative size and shape of each chromosome are 
species-specific characteristics, but neither parameter is 
directly associated with the complexity of the organism 
(Table 13.1). Chromosome number varies widely among 
species, though closely related species tend to have similar 
numbers. 

With a few unusual exceptions, the number of chromo- 
somes is the same for males and females of a species, and the 
number of chromosomes in nuclei of normal cells is a mul- 
tiple of the haploid number (n), the number in a single set of 
chromosomes. Regardless of whether the total chromosome 
number is 2n (diploid), 3n (triploid), or a higher multiple
of n, it is described as a euploid number of chromosomes 
if it is a whole-number multiple of the haploid number. If 
cells contain a number of chromosomes that is not euploid, 
the chromosome number is aneuploid. Aneuploidy occurs 
when one or more chromosomes are lost or gained relative 
to the normal euploid number. Chromosome nondisjunc- 
tion is a principal cause of aneuploidy. 
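
The definition above reduces to a divisibility check, which a short Python sketch can make concrete. The function name and the output labels are illustrative choices, not notation from the text; the human haploid number n = 23 follows from 2n = 46 in Table 13.1.

```python
def classify_chromosome_number(count: int, haploid_n: int) -> str:
    """Label a chromosome count as euploid or aneuploid relative to n."""
    if count <= 0 or haploid_n <= 0:
        raise ValueError("count and haploid_n must be positive")
    multiple, remainder = divmod(count, haploid_n)
    if remainder == 0:
        # A whole-number multiple of the haploid number is euploid.
        return f"euploid ({multiple}n)"
    # Any other count has gained or lost chromosomes relative to a euploid set.
    return "aneuploid"


# Human examples, assuming n = 23
print(classify_chromosome_number(46, 23))  # euploid (2n)
print(classify_chromosome_number(69, 23))  # euploid (3n)
print(classify_chromosome_number(47, 23))  # aneuploid
```

The check says only whether a count is a whole-number multiple of n; it does not identify which chromosome is extra or missing.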


Table 13.1 Chromosome Number in Selected Animal Species

Species                                Diploid Chromosome Number (2n)
Carp (Cyprinus carpio)                 104
Cat (Felis catus)                      38
Chicken (Gallus domesticus)            78
Chimpanzee (Pan troglodytes)           48
Cow (Bos taurus)                       60
Dog (Canis familiaris)                 78
Frog (Rana pipiens)                    26
Fruit fly (Drosophila melanogaster)    8
Horse (Equus caballus)                 64
Human (Homo sapiens)                   46
Mouse (Mus musculus)                   40
Rat (Rattus norvegicus)                42
Rhesus monkey (Macaca mulatta)         42


Chromosome Nondisjunction


The term chromosome nondisjunction, or simply nondisjunction,
applies to the failure of homologous chromosomes
or sister chromatids to separate as they normally do during
cell division. Nondisjunction can occur in somatic cells or
in germ-line cells, with the result that daughter cells of the
division will have the wrong number of chromosomes. If a
single pair of homologous chromosomes fails to properly
disjoin in a somatic cell during mitotic cell division, one of
the resulting daughter cells carries an extra chromosome
(2n + 1), and the other is missing a chromosome (2n - 1).

In animals, mitotic cells that contain the wrong num- 
ber of chromosomes may suffer reduced viability in com- 
parison to cells that have a normal, diploid number of 
chromosomes. The poor survival of these cells usually 
limits their number in organisms, although cells with ab- 
normal numbers of chromosomes are common in cancer, 
where other genetic changes play a major role in cell sur- 
vival and proliferation. 

In contrast to the limited circumstances under which 
changes to chromosome number may be maintained in 
animal cells, plants apparently have substantially more 
tolerance for changes in chromosome number, and it is 
not unusual to find plant strains with more than two cop- 
ies of each chromosome. We describe this situation in 
more detail in a later section. 

Nondisjunction in germ-line cells produces aneu- 
ploid gametes—reproductive cells that have one or more 
extra or missing chromosomes—which can lead to the 
production of aneuploid fertilized eggs. Meiotic nondis- 
junction can occur in either meiosis I or II and most often 
affects just a single homologous pair or a single pair of 
sister chromatids. Meiosis I nondisjunction is the failure 
of homologous chromosomes to separate. It results in 
both homologs moving to a single pole. One second- 
ary gametocyte contains both chromosomes, and the 
other contains neither chromosome (Figure 13.1). These 
gametocytes contain aneuploid chromosome numbers of
n + 1 and n - 1 (assuming only one chromosome pair is
affected). Meiosis II usually proceeds normally even when
meiosis I is aberrant, and its completion sends the sister
chromatids to different gametes. The four resulting gametes
each contain an aneuploid number of chromosomes.
Union of an aneuploid gamete with a normal haploid
gamete (shown in the figure) results in a fertilized egg
with an aneuploid number of chromosomes that will be
either trisomic (2n + 1), having three of one of the
chromosomes rather than a homologous pair, or monosomic
(2n - 1), having just a single copy of one of the
chromosomes rather than a homologous pair.

Figure 13.1 Meiosis I nondisjunction. Homologous chromosomes fail
to disjoin in meiosis I, and all resulting gametes are aneuploid.
Fertilization by a normal haploid gamete produces fertilized eggs
that are trisomic (2n + 1) or monosomic (2n - 1).

If nondisjunction occurs in meiosis II, it typically
follows a normal meiosis I. As a result, both secondary
gametocytes contain the haploid number of chromosomes
(Figure 13.2). Since these are separate cells, they
independently divide during meiosis II; thus, if nondisjunction
occurs, only one of the secondary gametocytes will be
affected. Among the four resulting gametes, two are normal
because normal disjunction took place during each
meiotic division. The other two gametes are aneuploid:
one contains n + 1 chromosomes and the other n - 1
chromosomes. Trisomic or monosomic fertilized eggs are
produced when one of these aneuploid gametes unites
with a normal gamete at fertilization.
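
The two nondisjunction outcomes can be compared by tracking a single chromosome pair through both meiotic divisions. The sketch below is a simplified bookkeeping model, not a biological simulation: the function name, the labels "A" and "B" for the two homologs, and the choice to place the meiosis II error in the first secondary gametocyte are all illustrative assumptions.

```python
from typing import List, Optional


def meiosis_gamete_copies(nondisjunction: Optional[str] = None) -> List[int]:
    """Return copies of one chromosome in each of the four gametes.

    nondisjunction may be None, "meiosis I", or "meiosis II".
    """
    # After replication, each homolog consists of two sister chromatids.
    homolog_a = ["A", "A"]
    homolog_b = ["B", "B"]

    # Meiosis I: homologs normally go to different secondary gametocytes.
    if nondisjunction == "meiosis I":
        secondary = [[homolog_a, homolog_b], []]  # both homologs to one pole
    else:
        secondary = [[homolog_a], [homolog_b]]

    gametes = []
    for index, cell in enumerate(secondary):
        first, second = [], []
        for sisters in cell:
            if nondisjunction == "meiosis II" and index == 0:
                first.extend(sisters)  # sister chromatids fail to separate
            else:
                first.append(sisters[0])
                second.append(sisters[1])
        gametes.extend([first, second])
    return [len(gamete) for gamete in gametes]


print(meiosis_gamete_copies())              # [1, 1, 1, 1]
print(meiosis_gamete_copies("meiosis I"))   # [2, 2, 0, 0]
print(meiosis_gamete_copies("meiosis II"))  # [2, 0, 1, 1]
```

Adding one copy from a normal gamete to each value gives the fertilized-egg count: 3 is trisomic, 2 is diploid, and 1 is monosomic. Meiosis I nondisjunction therefore yields no normal fertilized eggs, whereas meiosis II nondisjunction yields two normal outcomes out of four.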


Gene Dosage Alteration 


In 1913, at about the same time Calvin Bridges was 
demonstrating the chromosome theory of heredity by 
examining nondisjunction in fruit flies (see Section 3.3), 
Albert Francis Blakeslee and John Belling reported the 
phenotypic consequences of aneuploidy in the diploid 
(2n = 24) jimson weed (Datura stramonium), in which 
12 chromosome pairs are identified as A to L. Blakeslee 
and Belling identified 12 phenotypically distinct lines of 
trisomic Datura, one for each of the chromosome pairs 
(Figure 13.3). 


[Figure 13.1 diagram: primary gametocyte (2n), secondary gametocytes,
gametes (n + 1 and n - 1), and fertilized eggs that are trisomic
(2n + 1) or monosomic (2n - 1).]

Figure 13.2 Meiosis II nondisjunction. Sister chromatid disjunction
fails in meiosis II. Normal fertilization of the resulting gametes
generates trisomy, monosomy, or normal diploidy.



This result suggests that chromosome number is a 
factor in phenotype. In the years that followed Blakeslee 
and Belling’s report, other studies documented that aneu- 
ploidy causes severe phenotypic consequences in nearly all 
animal species and that it affects the phenotype of many 
plant species. The abnormalities associated with aneu- 
ploidy result from changes in gene dosage, the number 
of copies of a gene in the genome. Aneuploidy changes 
the dosage of all the genes on the affected chromosome. 
In a diploid organism where two copies of a gene, on a 
homologous pair of chromosomes, generate 100% of gene 
dosage, a monosomic mutant has just one gene copy and 
just 50% of normal gene dosage for each gene on the chro- 
mosome. In contrast, a trisomic mutant has three copies 
and 150% of normal gene dosage for each of the genes on 
the chromosome. 
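
The dosage percentages quoted above follow from a simple ratio of gene copies to the normal diploid copy number. A minimal Python sketch, with a hypothetical function name and the diploid baseline of two copies taken from the text:

```python
def gene_dosage_percent(copies: int, normal_copies: int = 2) -> float:
    """Gene dosage as a percentage of the normal (diploid) copy number."""
    if normal_copies <= 0:
        raise ValueError("normal_copies must be positive")
    return 100.0 * copies / normal_copies


for label, copies in [("monosomic", 1), ("diploid", 2), ("trisomic", 3)]:
    print(f"{label}: {gene_dosage_percent(copies):.0f}% of normal dosage")
```

The same ratio applies to every gene on the affected chromosome, which is why a single extra or missing chromosome changes the dosage of many genes at once.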

Changes in gene dosage lead to an imbalance of gene
products from the affected chromosome relative to unaffected
chromosomes, and this imbalance is at the heart of
alterations of normal development and the production of
abnormal phenotypes. Most animals are highly sensitive to
changes in gene dosage, and their developmental biology,
especially within the nervous system, does not proceed
normally in the presence of gene dosage imbalance. In
contrast to the potential for developmental disruptions
due to aneuploidy in animals, gene dosage changes are
more easily tolerated in many species of plants, owing in
part to their distinct developmental programs.

Figure 13.3 The appearance of the seed head in wild-type
diploid and in four trisomic lines of jimson weed (Datura
stramonium).


Aneuploidy in Humans 


Humans are enormously sensitive to changes in gene
dosage, and almost all human aneuploidies are incompatible
with life. In theory, there are 24 different kinds of
trisomy in humans (one for each autosome, and one each
for the X and Y chromosomes) and an equal number of
potential monosomies. Yet only
autosomal trisomies of chromosomes 13, 18, and 21,
and no autosomal monosomies, are seen with any measurable
frequency in newborn human infants. Multiple
forms of sex-chromosome trisomy are detected with
some frequency at birth, however, as is one type of
sex-chromosome monosomy (Table 13.2). A wide variety
of other chromosome abnormalities occur in newborn
infants as well. Each of the aneuploidy conditions identified
in Table 13.2, like the other chromosome
abnormalities that occur, results in significant phenotypic
abnormalities in newborn infants.

Table 13.2 Human Aneuploidies and Frequencies at Birth

Aneuploidy    Syndrome                        Frequency at Birth    Syndrome Characteristics

Autosomal Aneuploidy

Trisomy 13    Patau syndrome                  1 in 15,000           Mental retardation and developmental delay,
                                                                    possible deafness, major organ abnormalities,
                                                                    early death
Trisomy 18    Edwards syndrome                1 in 8000             Mental retardation and developmental delay,
                                                                    skull and facial abnormalities, early death
Trisomy 21    Down syndrome                   1 in 1500             Mental retardation and developmental delay,
                                                                    characteristic facial abnormalities,
                                                                    short stature, variable life span

Sex-Chromosome Aneuploidy

47, XXY       Klinefelter syndrome (males)    1 in 1000             Variable secondary sexual characteristics,
                                                                    infertility, frequent breast swelling;
                                                                    no impact on mental capacity
47, XYY       Jacobs syndrome (males)         1 in 1000             Tall stature common; possible reduction
                                                                    but not loss of fertility; no impact on
                                                                    mental capacity
47, XXX       Triple X syndrome (females)     1 in 1000             Tall stature common; possible reduction
                                                                    of fertility; menstrual irregularity;
                                                                    no impact on mental capacity
45, XO        Turner syndrome (females)       1 in 5000             No secondary sexual characteristics;
                                                                    infertility, short stature; webbed neck
                                                                    common; no impact on mental capacity


Human biologists know that trisomies and monosomies
other than those listed in the table occur at conception,
but the resulting zygotes almost never survive to be
born alive. The explanation for this situation is that the
abnormalities of development produced by these other
trisomies and monosomies are so severe that they almost
always lead to spontaneous abortion early in pregnancy,
and sometimes the aneuploidy is so disruptive to early
zygotic mitotic division that implantation in the uterine
wall never occurs.

The best available data on human aneuploidy rates
and survival come from studies that monitor women for
hormone changes associated with conception and the
earliest stages of pregnancy. These studies make two
surprising observations. First, in the first trimester of
pregnancy, about half of all human conceptions spontaneously
abort, and second, more than half of the spontaneously
terminated human pregnancies carry abnormalities of
chromosome number or chromosome structure. These
observations point to a surprisingly high (15% to 25%)
frequency of meiotic nondisjunction in humans. Other
errors producing gametes with abnormal chromosomes
can occur as well.

To ascertain the biological basis for the high rate of
meiotic nondisjunction in humans, trisomy 21 (Down
syndrome), the most common autosomal trisomy at
birth, has been the focus of intense study. Epidemiologic
studies conducted over several decades have linked the
risk of a child having trisomy 21 to the age of the mother
at conception. Table 13.3 illustrates the connection
between maternal age and the risk of trisomy 21.


Molecular and genomic analyses of Down syndrome 
have determined that a small number of genes on chromo- 
some 21 are responsible for the mental retardation and heart 
abnormalities that are principal symptoms of the syndrome. 
The critical portion of chromosome 21 for Down syndrome, 
known as the Down syndrome critical region (DSCR), was 
identified by studying people with partial trisomy of chro- 
mosome 21. These individuals carry two complete copies of 
chromosome 21 and a small additional segment of chromo- 
some 21 on another chromosome. These studies identify re- 
gion 21q22.2 as the DSCR. In other words, Down syndrome 
individuals invariably carry 21q22.2 in three copies. Among 
a handful of candidate genes, DYRK, a homolog of a gene in 
mice and Drosophila that produces dosage-sensitive learn- 
ing defects, makes a major contribution to Down syndrome. 
In mice, increased dosage of the DYRK homolog reduces 
brain size. DSCAM is a second gene whose increased dosage 
is linked to Down syndrome. This gene also has homologs 
in mouse and Drosophila, where its protein product partici- 
pates in the formation of the heart and components of the 
developing nervous system. 

A different kind of change in gene dosage is seen in
humans with Turner syndrome, a monosomy of the X
chromosome in which there is one X chromosome but
no second sex chromosome (see Table 13.2). Despite the
occurrence of random X-inactivation in human female
embryos that leads to one expressed X chromosome and
one inactive X chromosome in each nucleus, two sex
chromosomes are necessary for normal early development.
In female embryos that are XO (Turner syndrome), the
single copy of the gene SHOX, located in pseudoautosomal
region 1 on the short arms of the X chromosome and the
Y chromosome, is insufficient to direct certain aspects of
normal development. The haploinsufficiency of SHOX
appears to play a central role in producing Turner syndrome.


Table 13.3 Risk of Down Syndrome (Trisomy 21) by Maternal Age^a

Maternal Age Range    Total Live Births Studied    Trisomy 21 Births    Rate per 1000 Births
15-19                 30,272                       18                   0.49
20-24                 1117593                      87                   0.73
25-29                 108,746                      96                   0.90
30-34                 49,487                       72                   1.56
35-39                 19,522                       73                   4.19
40-44                 4880                         73                   18.02
45-49                 304                          19                   55.02

^a Data adapted from E. B. Hook and A. Lindsjo, Down syndrome in live
births by single year maternal age interval in a Swedish study:
Comparison with results from a New York State study. Am. J. Hum.
Genet. 30 (1978): 19-27.


Reduced Fertility in Aneuploidy 


The type and extent of developmental abnormalities in an 
aneuploid organism are a consequence of changes in the 
dosage of the genes affected, but aneuploidy also disrupts 
normal patterns of chromosome segregation during meiosis. 
This results in a reduction in the number of normal haploid 
gametes, and it can reduce fertility. 

Two patterns of homologous chromosome synapsis
are possible among the three chromosomes at metaphase I
in trisomy (Figure 13.4): either all three chromosomes form
a trivalent synaptic structure, or two of the chromosomes
form a bivalent synaptic structure and the third is a
univalent that does not synapse with another chromosome. There is
no mechanism to divide three chromosomes equally at 
anaphase I. Thus, two chromosomes move to one pole 
and one chromosome moves to the opposite pole during 
anaphase. On completion of meiosis, half of the gametes 
are haploid, having received one copy of the chromosome, 
but the remaining gametes contain two copies of the 
chromosome. These are n + 1 gametes. This effectively 
reduces the number of viable gametes by approximately 
one-half because the gametes with an extra chromosome 
will produce trisomic progeny that are unlikely to survive. 

This circumstance results in a form of semisterility, 
a reduction—but not complete elimination—of fertility. 
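
The one-half figure can be checked by enumerating every way three chromosomes can split two-to-one at anaphase I. This is a counting sketch under the assumptions stated in the text (a 2:1 split at anaphase I followed by a normal meiosis II); the function name and the chromosome labels c1, c2, and c3 are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def trisomic_gamete_copies(chromosomes=("c1", "c2", "c3")):
    """List gamete copy numbers over every 2:1 split at anaphase I."""
    copy_numbers = []
    for pair in combinations(chromosomes, 2):
        lone = [c for c in chromosomes if c not in pair]
        # Meiosis II separates sister chromatids, so each secondary
        # gametocyte yields two gametes with the same copy number.
        copy_numbers += [len(pair)] * 2 + [len(lone)] * 2
    return copy_numbers


gametes = trisomic_gamete_copies()
haploid_fraction = Fraction(gametes.count(1), len(gametes))
print(haploid_fraction)  # 1/2
```

Whichever chromosome moves alone, the result is the same: two of every four gametes carry one copy and two carry two copies, so the proportion of normal haploid gametes does not depend on the pairing pattern.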


Figure 13.4 Two meiotic patterns of segregation in
trisomics. (a) Three chromosomes form a trivalent structure
at synapsis and produce only two normal haploid gametes
among the four gametes. (b) A bivalent and a univalent
arrangement of three chromosomes also leads to just two
normal haploid gametes.


Mosaicism


Our discussion of random X-inactivation of mammalian
females identified the phenomenon as an example of naturally
occurring mosaicism, in which different cells of the
organism contain differently functioning X chromosomes
(see Section 3.6). Mosaicism refers to the condition in which
an individual is composed of two or more cell types having
different genetic or chromosomal makeup. Mosaicism can
also develop as a consequence of mitotic nondisjunction early
in embryogenesis. Mosaicism derived from early mitotic
nondisjunction is one of the many kinds of chromosome
abnormalities that occur in newborn infants. For example,
25-30% of cases of Turner syndrome, the X-chromosome
monosomy (XO), occur in females having mosaicism in
which some cells are 45, XO and others are 46, XX. Some
individuals with mosaic Turner syndrome carry 47, XXX
cells as well. This kind of mosaicism is usually derived from
mitotic nondisjunction in a 46, XX zygote (Figure 13.5).

In fruit flies, butterflies, and moths, sex-chromosome
mosaicism produces a particular sexually ambiguous phenotype
called gynandromorphy. Gynandromorph sex morphology
is female (“gyn”) on one half of the body and male
(“andro”) on the other half. Gynandromorphy develops as a
consequence of mitotic X chromosome nondisjunction early
in development.

Figure 13.5 Chromosome mosaicism. Mosaicism usually
begins with a normal diploid zygote. Mitotic nondisjunction
produces one or more aneuploid cell lines that persist and are
found in the newborn. Turner syndrome mosaic females contain
46, XX and 45, XO cells, and they may also have cells with 47, XXX.

Figure 13.6 Gynandromorphy in Drosophila. (a) White-eyed,
miniature-winged male (XO) half. (b) Red-eyed, wild-type-winged
female (XX) half. White eye and miniature wing are X-linked
recessive traits present in the hemizygous (XO) male half of the
fly. Heterozygous genotypes for both genes are present in the
wild-type (XX) female half of the fly.


In the example of gynandromorphy shown in 
Figure 13.6, a fly at fertilization is a wild-type female 
heterozygous for alleles for white eye (w) and miniature 
wing (m). Both genes are X-linked, and the genotype
is w+ m+/w m. Normal mitotic division retains both X
chromosomes until mitotic nondisjunction results in
loss of the X chromosome bearing the wild-type alleles.
As a consequence of nondisjunction and continued mitosis,
about half the cells of the adult are w+ m+/w m, and
about half are w m/O. Heterozygous cells in the right-hand
half of the fly lead its structures to develop with
female appearance and wild-type eye color and wing 
form. The left-hand half of the fly is hemizygous w m/O, 
having lost an X chromosome. These alleles direct devel- 
opment of structures that appear to be male with white 
eye and miniature wing. 


Trisomy Rescue and Uniparental Disomy 


A rare abnormality of chromosome content called unipa- 
rental disomy has been identified in humans. Uniparental 
disomy occurs when both copies of a homologous chro- 
mosome pair originate from a single parent. It was first 
identified in connection with two chromosomal condi- 
tions, Angelman syndrome (OMIM 105830) and Prader- 
Willi syndrome (OMIM 176270), that are usually the 
result of a partial deletion of the 15q11-q13 portion of
chromosome 15. 

Uniparental disomy has two mechanisms of origin.
The rarer mechanism requires nondisjunction of the same
chromosome in both the sperm and egg, with the result
that one gamete contributes two copies of the chromosome
and the other does not contribute a copy of the
chromosome. The second mechanism is more common.
It involves nondisjunction in one parent that results in an
aneuploid gamete contributing two copies of chromosome
15. The other gamete is normal and contributes a single
copy of chromosome 15. Gamete union results in trisomy
15 in the fertilized egg, a condition that is invariably
incompatible with survival. By a process known as trisomy
rescue, however, some initially trisomic fertilized eggs
can survive. In trisomy rescue, one of the three copies
of chromosome 15 is ejected in one of the first
mitotic divisions following fertilization. Which of the three
chromosomes is ejected is apparently random. Thus, one
result of trisomy rescue can be a cell with one copy of
chromosome 15 from each parent; embryos with this result
have normal chromosome content. Alternatively, trisomy
rescue can leave a cell that retains the two copies of
chromosome 15 from the same parent, and this is uniparental disomy.
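
If the ejected chromosome is chosen at random, the possible outcomes of trisomy rescue can be listed directly. The Python sketch below assumes each of the three copies is equally likely to be lost, which is a modeling assumption layered on the text's statement that the choice is apparently random. The function name and the labels maternal-1, maternal-2, and paternal are hypothetical, and the example takes the mother as the source of the extra copy.

```python
from fractions import Fraction


def trisomy_rescue_outcomes(chromosomes=("maternal-1", "maternal-2", "paternal")):
    """Tally outcomes when each of three copies is ejected in turn."""
    counts = {"biparental disomy": 0, "uniparental disomy": 0}
    for ejected in range(len(chromosomes)):
        remaining = [c for i, c in enumerate(chromosomes) if i != ejected]
        # The parent of origin is the part of the label before any hyphen.
        parents = {c.split("-")[0] for c in remaining}
        if len(parents) == 2:
            counts["biparental disomy"] += 1
        else:
            counts["uniparental disomy"] += 1
    total = sum(counts.values())
    return {outcome: Fraction(n, total) for outcome, n in counts.items()}


for outcome, probability in trisomy_rescue_outcomes().items():
    print(outcome, probability)
```

Under the equal-probability assumption, two of the three possible ejections restore one copy from each parent and one leaves both copies from the same parent, so uniparental disomy would be expected in about one-third of rescued cases.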


13.2 Changes in Euploidy Result in 
Various Kinds of Polyploidy 


Polyploidy is the presence of three or more sets of 
chromosomes in the nucleus of an organism. Polyploidy 
is common, particularly in plant species, and can re- 
sult either from the duplication of euploid chromosome 
sets from a single species or from the combining of 
chromosome sets from different species. Many types of 
polyploidy are possible: triploids (3n), tetraploids (4n),
pentaploids (5n), hexaploids (6n), octaploids (8n), and
so on. Polyploids whose karyotype is composed of
chromosomes derived from a single species are designated
autopolyploids (auto = “self”), and polyploids with 
chromosome sets from two or more species are called 
allopolyploids (allo = “different”). Terms such as auto- 
tetraploid (4n chromosomes that all derive from a single 
species) and allohexaploid (6n with chromosomes from 
two or more species) are used to describe a polyploid 
organism’s genomic content. 


Autopolyploidy and Allopolyploidy 


Three mechanisms lead to autopolyploidy (Figure 13.7). 
The first two of these mechanisms are forms of sexual 
polyploidization. Events tied to meiosis are the basis 
for these polyploid outcomes. The third mechanism is 
asexual polyploidization, in which events taking place in 
mitosis result in polyploidy. 


Figure 13.7 Mechanisms creating triploid and tetraploid
zygotes in plants.


1. Multiple fertilizations. Fertilization of an egg
by more than one haploid pollen grain results in a
zygote that is triploid (3n) or higher. This is generally
a rare event because most sexually reproducing
plants have elaborate mechanisms to prevent fertilization
of an egg by more than a single pollen grain.


2. Meiotic nondisjunction. Meiotic nondisjunction 
affecting all of the chromosomes in a nucleus can 
produce a diploid gamete instead of a haploid gamete. 
This is a common mechanism for polyploidization in 
sexually reproducing plants. After such a doubling of 
chromosomes in a gamete, the union of the resulting
2n gamete and a haploid gamete produces a 3n
zygote. Similarly, the union of two diploid gametes
produces 4n zygotes. 


3. Mitotic nondisjunction. Mitotic nondisjunction
in sex stem cells can result in chromosome doubling.
These cells divide by mitosis before entering meiosis
(thus the process is asexual), and mitotic nondisjunction
doubles the number of chromosomes from 2n to 4n. The
gametes that result from meiotic division of 4n sex stem
cells are 2n. If a 2n gamete unites with a haploid gamete,
the resulting progeny are 3n, and if two 2n gametes
unite, the result is a 4n zygote.


In contrast, the multiple sets of chromosomes that are 
carried by allopolyploids originated in different species. The 
union of a haploid gamete from species 1 (n1)
and a haploid gamete from species 2 (n2) produces a hybrid
organism that may have either an even number or odd 
number of chromosomes, since related species may have 
different diploid numbers. For example, a new species of 
salt grass, Spartina anglica, arose along the English coast- 
line in the late 1800s as a result of interspecific allopoly- 
ploidy. S. anglica has 122 chromosomes. It arose through 
the interspecific hybridization of native salt grass, Spartina 
maritima (2n = 60), with a non-native salt grass, Spartina 
alterniflora (2n = 62) (Figure 13.8). Haploid gametes from 
the two parental species fused to produce an interspecific 
hybrid with 61 chromosomes. The genome of the hybrid 
was stabilized, and fertility was generated by chromosome 
nondisjunction that doubled the chromosome number to 
122. With an even number of chromosomes, balanced gam- 
etes could form. This established the new species that grew 
vigorously and spread its range along the English coast. 
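
The chromosome arithmetic behind the Spartina example can be checked with a few lines of code. This is a minimal sketch; the function name and the dictionary keys are hypothetical, and it simply assumes that each parent contributes a haploid gamete and that the hybrid's chromosome number then doubles, as described above.

```python
def allopolyploid_counts(diploid_1, diploid_2):
    """Chromosome counts for an interspecific hybrid before and after doubling."""
    n1 = diploid_1 // 2           # haploid gamete from species 1
    n2 = diploid_2 // 2           # haploid gamete from species 2
    hybrid = n1 + n2              # interspecific hybrid
    return {
        "hybrid": hybrid,
        "hybrid_is_odd": hybrid % 2 == 1,
        "doubled": 2 * hybrid,    # after nondisjunction doubles the complement
    }


# Spartina maritima (2n = 60) x Spartina alterniflora (2n = 62)
spartina = allopolyploid_counts(60, 62)
print(spartina["hybrid"], spartina["doubled"])  # 61 122
```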


Consequences of Polyploidy 


Allopolyploids of plant species frequently occur naturally 
and are also produced by human manipulation. When 
used for commercial purposes, plant polyploidy gener- 
ates three main consequences. First, fruit and flower size 
is increased. The nuclei and cells of polyploid strains are 
larger than those of diploid strains, and many familiar fruit 
and vegetable varieties benefit from this effect. Apples
(3n = 51), bananas (3n = 33), strawberries (8n = 56), peanuts
(4n = 40), and potatoes (4n = 48) are just a few examples.

Increased fruit and flower size in polyploid plants 
comes at the cost of the second effect, reduced fertility.
The problem is particularly acute for odd-numbered
polyploids (3n, 5n, etc.), in which the odd number of
chromosome sets cannot be evenly divided at the first
meiotic division. The result
is an unequal distribution of chromosomes that makes al- 
most all of the resulting gametes nonviable. This reproduc- 
tive disadvantage is turned into commercial advantage in 
cultivated plants with odd-numbered polyploidy. Certain 
“seedless” fruits and vegetables in the produce aisle of your 
local grocery store are odd-numbered polyploids. 
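
A rough calculation shows why balanced gametes are so rare in triploids. The sketch below rests on a simplifying model that is not taken from the text: each group of three homologs is assumed to segregate two-to-one at random and independently of the others, and a gamete counts as balanced only if it receives exactly one copy, or exactly two copies, of every chromosome. The function name is hypothetical.

```python
from fractions import Fraction


def balanced_gamete_probability(triploid_count):
    """Chance that a triploid produces a fully haploid or fully diploid gamete.

    Assumption: each of the n trivalents sends one chromosome to one pole and
    two to the other, at random and independently of the other trivalents.
    """
    if triploid_count % 3 != 0:
        raise ValueError("expected a triploid (3n) chromosome count")
    n = triploid_count // 3
    # (1/2)^n for an all-haploid gamete plus (1/2)^n for an all-diploid gamete
    return 2 * Fraction(1, 2) ** n


banana = balanced_gamete_probability(33)  # 3n = 33, so n = 11
apple = balanced_gamete_probability(51)   # 3n = 51, so n = 17
print(banana, apple)  # 1/1024 1/65536
```

Under these assumptions, fewer than one banana gamete in a thousand is balanced, which is consistent with the near-complete sterility that makes such fruit seedless.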

While most animals do not tolerate polyploidy, there
are some exceptions among certain fishes and amphibians.
One of these exceptions is the grass carp
(Ctenopharyngodon idella), a weed-eating fish that is
being employed to reduce weed growth in more than 50
countries worldwide. Triploid grass carp are created by
first artificially fertilizing carp eggs and then
heat-shocking the newly fertilized eggs. Heat shock causes
the diploid fertilized eggs to divide unevenly, producing a
triploid cell that goes on to develop into a fish that is
fully viable. The triploid grass carp eat weeds vigorously
and, in doing so, help reduce weed growth in bodies of
water without the use of herbicides. As a consequence of
their triploidy, however, the carp are infertile, so they
are unable to reproduce and don’t invade the habitats into
which they are introduced. The triploid grass carp must be
restocked periodically if their continued presence is
desired to control weed growth.


Figure 13.8 The production of a new species by
allopolyploidy. Two salt grass species, Spartina maritima
(2n = 60) and Spartina alterniflora (2n = 62), produced an
interspecific hybrid (2n = 61) that subsequently doubled
its chromosome number by nondisjunction to produce the new
salt grass species Spartina anglica, an allotetraploid
(4n = 122).

Allopolyploids exhibit a third characteristic of com- 
mercial importance—increase in heterozygosity rela- 
tive to diploids that comes about when inbred lines are 
crossed and is the basis of additional growth vigor. This 
phenomenon is known as hybrid vigor, and it consists 
of more rapid growth, increased production of fruits and 
flowers, and improved resistance to disease among the 
heterozygous (hybrid) progeny of inbred lines. 


Reduced Recessive Homozygosity 


The pattern of single-gene inheritance in polyploids differs 
from that in diploids with respect to the proportions of 
dominant and recessive phenotypes from certain crosses. 
This difference is tied directly to the additional number of 
gene copies in polyploid genomes. A dominant phenotype 
is produced by any genotype containing one or more cop- 
ies of the dominant allele, and the recessive phenotype is 
produced only by the homozygous recessive genotype. In 
the case of a phenotype decided by a single gene with a 
dominant and a recessive allele, the likelihood of producing 
the recessive phenotype in a tetraploid strain is decreased 
compared to the likelihood of producing it in a diploid. 

Taking an autotetraploid with the genotype AAaa 
as an example, let’s determine the probability that
progeny produced by self-fertilization would have the
genotype aaaa. We'll use the designations A1, A2, a3, and
a4 for alleles of the gene. The ratio of dominant to
recessive alleles is 2:2 in the tetraploid, and six diploid
gamete genotypes are produced by homologous disjunction:
A1A2, A1a3, A1a4, A2a3, A2a4, and a3a4. Among
these gametes, only one (one-sixth of the total) con- 
tains two recessive alleles. The probability of union of 
two fully recessive gametes is therefore (1/6)(1/6) = 1/36, 
much less than the 1/4 probability of producing homozy- 
gous recessive offspring from heterozygous diploids with 
the genotype Aa. Genetic Analysis 13.1 guides you through 
an analysis of a genetic cross involving polyploids. 
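
The gamete enumeration above can be reproduced by listing every pair of chromosomes that an autotetraploid can place in a gamete. The following sketch is illustrative: it assumes the four homologs assort into gametes in random pairs (the six combinations listed above), it writes dominant alleles in uppercase and recessive alleles in lowercase, and the function names are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations


def gamete_distribution(genotype):
    """Map the number of dominant alleles in a diploid gamete to its probability.

    genotype: four allele letters, e.g. "AAaa". Every pair of the four
    chromosomes is assumed to be an equally likely gamete.
    """
    pairs = list(combinations(genotype, 2))  # six pairs for a tetraploid
    dist = Counter()
    for pair in pairs:
        dominant = sum(allele.isupper() for allele in pair)
        dist[dominant] += Fraction(1, len(pairs))
    return dist


def selfing_distribution(genotype):
    """Map the number of dominant alleles in the progeny to its probability."""
    gametes = gamete_distribution(genotype)
    progeny = Counter()
    for d1, p1 in gametes.items():
        for d2, p2 in gametes.items():
            progeny[d1 + d2] += p1 * p2
    return progeny


progeny = selfing_distribution("AAaa")
print(progeny[0])  # probability of aaaa: 1/36
```

The same tally gives 8/36, 18/36, 8/36, and 1/36 for progeny carrying one, two, three, and four dominant alleles, so only 1 in 36 progeny shows the recessive phenotype, compared with 1 in 4 from an Aa diploid.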


Polyploidy and Evolution 


The disadvantages in growth and reproduction experienced 
by polyploid organisms can be outweighed by the evolution- 
ary advantages of polyploidy. More than half of all contem- 
porary flowering plant species are derived from ancestors 
that evolved by polyploidy, and many flowering plant
genera include species with different numbers of complete
sets of chromosomes. In the genus Chrysanthemum, for
example, a diploid species has 2n = 18. The chromosome
numbers of other Chrysanthemum species differ from one 
another by 18 chromosomes, with closely related species 
having 36, 54, 72, and 90 chromosomes. 

Evolution by polyploidy is a sudden, dramatic event that 
can lead to the development of a new species over a span of 
just one or two generations as we discuss momentarily for 
modern wheat species (Figure 13.9). The change in chromo- 
some number—say, by doubling of chromosomes—can be 
a reproductive isolation mechanism. For example, mating
between two related plants, plant A with 18 chromosomes
and plant B with 36 chromosomes, could produce hybrid
progeny with 27 chromosomes. A hybrid formed from a gamete
with 9 chromosomes from plant A and a gamete with 18
chromosomes from plant B would have an odd-numbered ploidy,
which dramatically reduces fertility. Viable progeny are
produced by self-fertilization of plant A or plant B or by
mating of either plant with another having an identical
chromosome number.

Figure 13.9 The evolution of modern wheat (Triticum 
aestivum), spelt wheat (T. spelta), durum pasta wheat (T. turgidum), 
and other modern species from crosses of ancestral species. 


Species that have had a quiescent genetic history can 
experience a sudden burst of evolutionary change through 
the development of polyploidy by two mechanisms. First, as 
mentioned above, allopolyploidy can result in the evolution 
of a new species, owing to the fact that the newly polyploid 
progeny are reproductively isolated from their nonpolyploid 
progenitor by chromosomal differences that make hybrid- 
ization between the progenitor and the new species unlikely. 
Second, polyploidy produces gene duplication that relaxes 
natural selection constraints on duplicated copies of genes, 
allowing them to vary and to potentially diversify to generate 
new functions. (We discuss these ideas in Chapter 22.) 

Numerous examples of speciation by polyploidiza- 
tion have been documented in plants, but perhaps no 
common plant species embody the evolutionary impact 
of polyploidy more dramatically than Triticum aestivum,
common bread wheat, and Triticum spelta, spelt wheat. Both
are allohexaploids that developed through the union of
diploid genomes of three ancestral species in two
hybridization events. Modern members of the genus Triticum have
14, 28, and 42 chromosomes. The evolutionary history 
of modern wheat begins about 12,000 years ago with 
the hybridization of two diploid species that contain 14 
chromosomes each. Einkorn wheat (T. monococcum) is a 
cultivated variety of wheat that can still be found around 
the world and is the modern form of wild einkorn wheat 
(T. urartu). Represented by the chromosome designation 
AA, T. urartu hybridized with a wild grass species, either
T. searsii or T. tripsacoides, each with chromosomes
represented as BB, to form an allotetraploid variety called Emmer
wheat (T. dicoccoides). Emmer wheat has 28 chromosomes 
and a chromosome formula AABB and was being culti- 
vated approximately 8000 years ago when it underwent 
a second hybridization event with another wild diploid 
grass species, T. tauschii (chromosome formula DD), to 
form T. aestivum and T. spelta (chromosome formula 
AABBDD), the modern allohexaploid species, which each 
have 42 chromosomes. Modern forms of each of the ances- 
tral wheat species are shown in Figure 13.9. 
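
The chromosome formulas in the wheat lineage can be traced step by step. The sketch below is illustrative and rests on two assumptions: each genome letter stands for a haploid set of 7 chromosomes (inferred from the 14 chromosomes of the diploid species), and the chromosome complement of each hybrid doubled after hybridization, which the formulas AABB and AABBDD imply. The function names are hypothetical.

```python
CHROMOSOMES_PER_SET = 7  # assumed: a diploid such as AA has 14 chromosomes


def chromosome_count(formula):
    """Total chromosomes for a formula such as 'AABB'."""
    return CHROMOSOMES_PER_SET * len(formula)


def hybridize_and_double(parent_1, parent_2):
    """Fuse one gamete from each parent, then double the hybrid's complement."""
    gamete_1 = parent_1[::2]      # "AA" -> "A", "AABB" -> "AB"
    gamete_2 = parent_2[::2]
    hybrid = gamete_1 + gamete_2  # e.g. "AB"
    return "".join(letter * 2 for letter in hybrid)


emmer = hybridize_and_double("AA", "BB")   # first hybridization
bread = hybridize_and_double(emmer, "DD")  # second hybridization
print(emmer, chromosome_count(emmer))  # AABB 28
print(bread, chromosome_count(bread))  # AABBDD 42
```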


13.3 Chromosome Breakage 
Causes Mutation by Loss, Gain, and 
Rearrangement of Chromosomes 


We have seen that particularly for animals the proper bal- 
ance of gene dosage is important for promoting normal 
growth and development and that changes in gene dosage 
can have substantial phenotypic consequences. For this 
reason, mutations that result in the loss or gain of whole 
chromosomes or chromosome segments have the poten- 
tial to produce severe abnormalities. In this section, we 
examine changes to chromosome structure that occur by 
chromosome breakage and other events that lead to the 
loss or gain of chromosomal segments. 


Partial Chromosome Deletion 


When a chromosome breaks, both strands of DNA are 
severed at a location called a chromosome break point. 
The broken chromosome ends at a break point retain their
chromatin structure, and they can adhere to one another,
to other broken chromosome ends, or to the ends of intact
chromosomes. Any part of a broken chromosome that
remains acentric (without a centromere) can be lost
during cell division.

Chromosome breakage can result in partial chromosome
deletion, the loss of a portion of a chromosome.
The size of the deletion and the specific genes
deleted are significant factors in the degree of ensuing 
phenotypic abnormality. Larger chromosome deletions 
are detected by microscopy through the observation of 
altered chromosome banding patterns. In these larger 
deletions, many genes are affected, and the likelihood 
of substantial phenotypic consequences is very high. A 
chromosome break that detaches one arm of a chromo- 
some leads to a terminal deletion (Figure 13.10a). The 
chromosome fragment broken off in terminal deletion 
contains one of the chromosome ends, or termini, con- 
sisting of a telomere and additional genetic material. 
Without a centromere, the acentric fragment lacks a 
kinetochore. It is unable to attach spindle fibers and 
cannot migrate to a pole of the cell during division. 
Acentric chromosome fragments are lost during cell 
division. Organisms carrying one wild-type chromosome 
and a homolog with a terminal deletion are called partial 
deletion heterozygotes. A human condition known as 
cri-du-chat syndrome (OMIM 123450) is an example of 
a chromosome syndrome caused by terminal deletion of 
5p15.2-5p15.3 (Figure 13.10b). The syndrome is named
for the distinctive cat-cry-like sound emitted by infants
with the condition.


Figure 13.10 Chromosome terminal deletion. (a) A
double-stranded DNA break at a chromosome break point in
region H leads to terminal deletion of the acentric
fragment. (b) Terminal deletion of chromosome 5 in
cri-du-chat syndrome.

In contrast to a terminal deletion, which results 
from a single break at one end of a chromosome, an 
interstitial deletion is the loss of an internal segment of 
a chromosome that results from two chromosome breaks. 
Interstitial deletions can be seen in many organisms, 
including humans. WAGR syndrome (OMIM 194072) 
and a closely related condition, WAGRO (OMIM 
612469), both result from an interstitial deletion in 
humans affecting chromosome band 11p13 and the adjoining
band, 11p12. Studies of chromosome 11 structural
abnormalities in patients with WAGR syndrome and
WAGRO syndrome reveal partial chromosome deletions
of various sizes, with the smallest common deletion
region at 11p12 to 11p13 (Figure 13.11). The initials
WAGR stand for Wilms tumor (a type of hereditary kidney
cancer), aniridia (the absence of the iris in the eye),
genitourinary abnormalities, and mental retardation. 
WAGRO has the same four developmental abnormalities 
as WAGR, with the addition of obesity. Patients with the 
largest deletions of 11p12-p13 have all five conditions, 
whereas patients with smaller deletions may have just one 
or two of the disorders. 

WAGR syndrome and WAGRO syndrome result 
from gene dosage imbalance as a consequence of partial 
chromosome deletion. Researchers have identified two 
critical gene deletions in WAGR syndrome and an ad- 
ditional critical gene deletion in WAGRO syndrome. The 
gene PAX6 produces a DNA-binding protein that regulates
transcription in development of the eye. The loss of this
gene produces aniridia. The gene WT1 produces a
transcription-regulating protein that is essential for
genitourinary development, and its loss is also tied to
Wilms tumor and to mental disability. The third critical
gene deleted in WAGRO syndrome is BDNF, which produces a
protein expressed in the brain to protect striatal neurons
from damage and destruction. Deletion of this gene produces
obesity. Other mutant alleles of BDNF are associated with
anorexia, bulimia, memory impairment, and
obsessive-compulsive disorder. BDNF may play a role in the
mental impairment that is part of WAGR syndrome.


Figure 13.11 ... WAGR and WAGRO syndromes. Deletions 5
through 8 result in WAGR, but deletions 1 through 4 and 9
do not. The smallest common deletion region, 11p12-p13,
affects bands 11p13 and 11p12.


Unequal Crossover 


The process of reciprocal recombination achieves the 
recombination of alleles on homologous chromosomes 
without causing a gain or loss of chromosomal material 
that would result in mutation (see Sections 5.2 and 12.6). 
Occasionally, however, crossing over between homologs 
is inaccurate, resulting in chromosome mutations that are 
due to unequal crossover. These mutations result in the 
partial duplication and partial deletion of chromosome 
segments on the resulting recombinant chromosomes. An 
organism carrying one homolog with duplicated material 
is a partial duplication heterozygote, whereas one with 
material deleted from one chromosome is a partial dele- 
tion heterozygote. Both states change the dosage of genes 
carried on the duplicated or deleted chromosome seg- 
ments, and phenotypic abnormalities due to dosage effects 
can occur. 

Unequal crossover is rare and occurs most commonly 
when repetitive regions of homologous chromosomes mis- 
align. The human condition known as Williams-Beuren 
syndrome (WBS; OMIM 194050) is frequently found in 
partial deletion heterozygotes for a segment of
chromosome 7. In wild-type chromosome 7, this region
contains duplicate copies of the gene PMS, designated PMS_A
and PMS_B, that are located near one another and have
17 genes located in between (Figure 13.12a). Misalignment
of the homologous chromosomes results in mispairing of
PMS_A on one chromosome with PMS_B on the homologous
chromosome. A copy of PMS on each chromosome
is looped out from each homolog during misalignment
(Figure 13.12b). Unequal crossing over between the
misaligned chromosomes produces one recombinant chromosome
with a partial deletion of chromosome 7, and this deletion
results in WBS. This chromosome contains a nonfunctional
hybrid PMS_A-PMS_B gene and is missing intact PMS_A
and PMS_B genes as well as the 17 genes normally found
between PMS_A and PMS_B (Figure 13.12c). The partial
duplication chromosome (containing duplicated copies
of the hybrid PMS_A-PMS_B gene and the 17 intervening
genes) does not cause readily identifiable phenotypic
abnormalities.
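
The two products of unequal crossover can be modeled by treating the chromosome region as an ordered list of genes. The sketch below is a simplified illustration, not a description of the actual locus: the gene names other than PMS_A and PMS_B, the flanking labels, and the function name are hypothetical, and the crossover is assumed to fall within the mispaired PMS copies so that each product carries one hybrid gene.

```python
def unequal_crossover(chromosome, repeat_a, repeat_b):
    """Return (deletion, duplication) products of a crossover between
    repeat_a on one homolog and repeat_b on the other homolog."""
    i = chromosome.index(repeat_a)
    j = chromosome.index(repeat_b)
    # Homolog 1 up to repeat_a joined to homolog 2 from repeat_b onward.
    deletion = chromosome[:i] + [f"{repeat_a}-{repeat_b} hybrid"] + chromosome[j + 1:]
    # Homolog 2 up to repeat_b joined to homolog 1 from repeat_a onward.
    duplication = chromosome[:j] + [f"{repeat_b}-{repeat_a} hybrid"] + chromosome[i + 1:]
    return deletion, duplication


intervening = [f"gene{k}" for k in range(1, 18)]  # the 17 genes between the copies
region = ["left_flank", "PMS_A"] + intervening + ["PMS_B", "right_flank"]

deletion, duplication = unequal_crossover(region, "PMS_A", "PMS_B")
print(deletion)                      # flanks plus one hybrid gene
print(duplication.count("gene1"))    # each intervening gene is present twice
```

In this model the deletion product has lost all 17 intervening genes along with the intact PMS copies, while the reciprocal product carries the intervening genes in two copies.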

Figure 13.12 Unequal crossover in creation of 
Williams-Beuren syndrome. 


Detecting Duplication and Deletion 


Large deletions or duplications of chromosome segments 
can be detected by microscopic examination that reveals 
altered chromosome banding patterns resulting from the 
structural change to the chromosome. Such deletions and 
duplications are generally quite large. In human chro- 
mosomes, duplications and deletions of about 100,000 
to 200,000 base pairs are at the lower limit of chromo- 
some banding visualization. Microdeletions and micro- 
duplications are considerably smaller and are generally 
not easily detected by chromosome banding analysis. 
Instead, molecular techniques such as FISH (fluorescent 
in situ hybridization) can be used to detect the absence or 
duplication of a particular gene or chromosome sequence 
(Figure 13.13; also see Section 11.3). 

Irrespective of the mechanism that produced them,
prophase I homologous chromosome synapsis during meiosis
produces a telltale signature of partial chromosome
duplication or deletion. Homologous pairs that are
mismatched because one contains a large duplication or
deletion will form an unpaired loop in synapsis
(Figure 13.14). Along most of the length of the homologous
pair, normal synaptic pairing occurs. But in regions of
structural difference, the extra material present on one
chromosome bulges out to allow synaptic pairing on either
side. The material in the loop is normal genetic material
if one chromosome carries a deletion, and it is duplicated
genetic material if one homolog carries a duplication.


Figure 13.13 Detection of chromosome microdeletion
and microduplication by FISH. (a) Wild-type chromosome:
three FISH probes identify genes A, B, and C.
(b) Microinterstitial deletion: microdeletion of a
chromosome segment containing B prevents probe
hybridization, so no fluorescence is detected from probe B.
(c) Microduplication: hybridization of probe B to
duplicated genes produces two fluorescent spots, indicating
that the target of probe B is duplicated.


Deletion Mapping 


Pseudodominance is a genetic phenomenon that 
occurs when a normally recessive allele is “unmasked” 
and expressed in the phenotype because the dominant 
allele on the homologous chromosome has been deleted. 
Pseudodominance is used to map genes in deleted chromo- 
some regions by a method known as deletion mapping. 
We discussed a version of deletion mapping in 
Section 6.7 in connection with Benzer’s fine-structure 

Figure 13.14 An unpaired loop at synapsis. The partial 
duplication heterozygote shown here has duplicated genetic 
material of bands 5 through 9. The extra material forms an 
unpaired loop at synapsis to allow homologous regions to 
align correctly. 


GENETIC ANALYSIS 


PROBLEM Flower color in an autotetraploid plant is a single-gene character
with two alleles, R1 and R2, at the gene locus. The R1 allele produces color,
but the R2 allele does not. As a consequence, flower-color intensity is
determined by the number of R1 alleles in the genotype. The genotype-phenotype
correspondence is as follows:


Genotype   Phenotype
R1R1R1R1   Dark red
R1R1R1R2   Light red
R1R1R2R2   Pink
R1R2R2R2   Light pink
R2R2R2R2   White


A pink-flowered plant is self-fertilized. What are the expected flower-color
phenotypes, and in what proportions are they expected?


BREAK IT DOWN: The plants are tetraploids (4n), not diploids (2n), thus each
genotype contains four copies of the R gene, accounting for the variation
in flower color (p. 438).


BREAK IT DOWN: Chromosome segregation in meiosis generates multiple
combinations of chromosomes in pollen and eggs. Each pollen or egg cell
contains two copies of the chromosome (p. 439).


Solution Strategies and Solution Steps


Evaluate

1. Strategy: Identify the topic this problem addresses and the nature of the
required answer.
Step: This problem concerns self-fertilization of an autotetraploid. The
answer requires determination of the phenotypes of progeny and the expected
frequency of each phenotype.

2. Strategy: Identify the critical information given in the problem.
Step: The plant is identified as an autotetraploid, and the specific
genotype-phenotype relationships are given.

TIP: Autotetraploids are 4n and carry four homologous chromosomes
derived from a single species.


Deduce

3. Strategy: Identify the genotype of the self-fertilized plant and the
possible gametes it produces.
Step: The genotype of the pink-flowered plant is R1R1R2R2. The gametes will
be diploid. Six random combinations of chromosomes can form during
gametogenesis. The first R1 chromosome can occur in a gamete with the second
R1 or with either of the R2 chromosomes, forming three of the gametes. The
second R1 can occur with either of the R2 chromosomes, forming two more
gametes, or the two R2 chromosomes can form a gamete, making the sixth
combination.

TIP: The gametes of an autotetraploid are diploid. Each gamete contains two
of the chromosomes in the tetraploid genotype.

4. Strategy: Determine the genotype and expected frequency of each possible
gamete.
Step: Each combination of chromosomes in the gametes will form with equal
frequency, meaning that the expected frequency of each gamete is 1/6.
One combination contains both of the R1 chromosomes, and one contains
both of the R2 chromosomes. The remaining gametes are different combinations
with the genotype R1R2, for a combined frequency of 4/6.

TIP: Add the predicted frequencies of the gametes to be sure their sum is 1.0.

Solve

5. Strategy: Describe the possible gamete unions and the production of
progeny by fertilization.
Step: The results of union of the three gamete genotypes are as follows:

TIP: Use a Punnett square to display gamete unions.

              R1R1 (1/6)        R1R2 (4/6)         R2R2 (1/6)
R1R1 (1/6)    R1R1R1R1 (1/36)   R1R1R1R2 (4/36)    R1R1R2R2 (1/36)
R1R2 (4/6)    R1R1R1R2 (4/36)   R1R1R2R2 (16/36)   R1R2R2R2 (4/36)
R2R2 (1/6)    R1R1R2R2 (1/36)   R1R2R2R2 (4/36)    R2R2R2R2 (1/36)

6. Strategy: Summarize the genotypes, phenotypes, and frequencies expected
from this cross.
Step: Self-fertilization of a pink plant with the R1R1R2R2 genotype is
expected to produce the following outcome:

Genotype   Phenotype    Frequency
R1R1R1R1   Dark red     1/36
R1R1R1R2   Light red    8/36
R1R1R2R2   Pink         18/36
R1R2R2R2   Light pink   8/36
R2R2R2R2   White        1/36

For more practice, see Problems 1, 2, and 11. 


Visit the Study Area to access study tools. 


analysis of the genes involved in bacterial lysis by bac- 
teriophage. In that analysis, Benzer focused on whether 
it was possible to form a wild-type lysis recombinant 
between a lysis-deficient phage with a point mutation (a 
revertible mutation) and one with a deletion mutation (a 
nonrevertible mutation). In studies using deletion muta- 
tion analysis in diploid organisms, the unmasking of a 
recessive allele (the observation of pseudodominance) is 
central to gene mapping. 

Figure 13.15 shows deletion mapping using pseu- 
dodominance to map the Notch gene (n) in Drosophila. 
The Notch gene resides on the X chromosome, and its 
location is revealed by the detection of pseudodomi- 
nance in fruit flies that are heterozygous for partial 
X-chromosome deletions. Pseudodominance appears in
females that are heterozygous for the partial deletion,
carry the recessive allele on the intact X chromosome,
and have lost the dominant allele from the other,
partially deleted X chromosome. In the figure, the gray
segments represent chromosome segments present on
the partial deletion X chromosomes of six different
mutants, and color identifies segments that have been
deleted from that chromosome in each mutant. The
first two partial deletions (rJ1 and 258-42) do not lead
to pseudodominance (in other words, the dominant
wild-type phenotype is observed), indicating that the
regions deleted do not contain the Notch gene. Two of the
other partial deletions, 62d18 and N71a, result in
pseudodominance (in other words, the recessive phe- 
notype is observed), indicating that the Notch gene 
locus containing the dominant allele is in the region 
3C5 to 3C9. To home in on the location of Notch, pro- 
gressively smaller partial deletions are used to identify 
the smallest deletion segment common to all deletions 
resulting in pseudodominance. In this instance the 
smallest partial deletion common to genomes express- 
ing pseudodominance for Notch is region 3C-7, which 
is missing from mutant 264-39. This is where the gene 
resides. Genetic Analysis 13.2 guides you through analy- 
sis of deletion mapping. 


Figure 13.15 Deletion mapping of the Drosophila Notch (n) gene. The extent of each partial
deletion of the Drosophila X chromosome is shown by the colored bars for six partial deletion mutants.
The retention of the dominant character or the emergence of notch by pseudodominance is indicated.
The smallest X-chromosome segment missing from all pseudodominant mutants is region 3C-7,
indicating this as the location of the gene.


GENETIC ANALYSIS


PROBLEM In Drosophila, the X-linked recessive mutant traits singed bristle,
lozenge eye, and cut wing are encoded at linked genes. Five strains of Drosophila
produced by the cross of pure-breeding wild-type and pure-breeding mutant flies
(SLC/SLC x slc/slc) are expected to have the trihybrid genotype SLC/slc and
express the wild-type phenotypes. Females of each strain exhibit
pseudodominance for one or more of the traits, however, due to partial deletion
of the X chromosome.

Comparative X-chromosome maps showing the extent of deletions in each
pseudodominant strain (indicated by dashed lines) are given here along with
the pseudodominant phenotypes found in each strain. Use this information to
locate each gene as accurately as possible along the X chromosome.


BREAK IT DOWN: Pseudodominance can emerge in heterozygous organisms when the
dominant allele on one copy of a chromosome pair is deleted, leaving only the
recessive allele on the unaltered chromosome (p. 442).

BREAK IT DOWN: Gene mapping by pseudodominance seeks to identify the smallest
chromosome region that might contain a particular gene (p. 444).


X chromosome map (scale in map units, marked every 2 units up to 20). The
deleted intervals are those read from the map in the solution steps.

Strain     Deleted interval (map units)   Pseudodominant phenotype(s)
Strain 1   8 to 14                        singed
Strain 2   4 to 13                        singed, cut
Strain 3   16 to 20                       lozenge
Strain 4   4 to 12                        singed, cut
Strain 5   3 to 6                         cut


Solution Strategies and Solution Steps


Evaluate

1. Strategy: Identify the topic this problem addresses and the
   nature of the required answer.
   Step: This problem addresses deletion mapping using
   pseudodominance to locate the position of each gene. The answer
   requires construction of a map of gene locations.

2. Strategy: Identify the critical information given in the
   problem.
   Step: The deletion regions on chromosomes and the corresponding
   pseudodominant phenotypes are given.


Deduce

3. Strategy: Review the meaning of pseudodominance and the
   connection between chromosome deletion and pseudodominance.
   Step: Pseudodominance is the appearance of a recessive trait in
   a presumed heterozygous organism due to deletion of a chromosome
   segment carrying the dominant allele. In deletion mapping using
   pseudodominance, the location of a gene maps to the smallest
   common deletion region shared by all organisms expressing the
   pseudodominant trait.


Solve

4. Strategy: Interpret the meaning of the pseudodominant phenotype
   in strain 1.
   Step: Strain 1 is missing chromosome material from the 8th to
   the 14th map unit. The appearance of the pseudodominant
   phenotype singed indicates that the singed gene maps to this
   interval.

5. Strategy: Compare strain 2 to strain 1, and interpret the
   meaning of the new pseudodominant phenotype cut.
   TIP: Compare deletion mutants that share pseudodominance
   phenotypes to see where their deletions overlap.
   Step: Strain 2 has a deletion from map units 4 to 13 that
   includes both singed and cut. This narrows the location of
   singed to the interval between 8 and 13 map units. The cut
   location is between the 4th and 8th map unit, based on its
   appearance with the deletion of this interval.

6. Strategy: Assess pseudodominance of strain 3.
   Step: Co-occurrence of the deletion between map units 16 and 20
   and the appearance of the pseudodominant lozenge phenotype map
   the lozenge gene to this location.

7. Strategy: Assess strains 4 and 5, and refine the locations of
   the genes further where possible.
   TIP: Again, compare deletion mutants that share pseudodominance
   phenotypes to see where their deletions overlap.
   Step: Strain 4 contains a deletion between map units 4 and 12
   and confines the location of singed to the interval between 8
   and 12. This strain provides no additional information about the
   location of cut. The deletion between map units 3 and 6 in
   strain 5 includes cut and refines its location to between map
   units 4 and 6.

8. Strategy: Identify gene locations based on the deletion mapping
   analysis.
   Step: Based on the data for pseudodominance in these five
   strains, cut resides in the interval between units 4 and 6,
   singed lies between 8 and 12, and lozenge is between 16 and 20.
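
The reasoning in this worked example can be sketched as a short Python script. This is only an illustration under stated assumptions: the deletion endpoints and phenotypes are the ones quoted in the solution steps, and the helper name map_gene is hypothetical. For each gene, it intersects the deletions of every strain that shows the pseudodominant trait, then removes any part covered by a deletion in a strain that does not show the trait.

```python
# Deletion data quoted in the worked example:
# (deletion start, deletion end, pseudodominant traits observed).
STRAINS = [
    (8, 14, {"singed"}),          # strain 1
    (4, 13, {"singed", "cut"}),   # strain 2
    (16, 20, {"lozenge"}),        # strain 3
    (4, 12, {"singed", "cut"}),   # strain 4
    (3, 6, {"cut"}),              # strain 5
]


def map_gene(gene, strains):
    """Return candidate map-unit intervals for `gene` (hypothetical helper)."""
    # Step 1: smallest region shared by all deletions that uncover the gene.
    lo, hi = float("-inf"), float("inf")
    for start, end, traits in strains:
        if gene in traits:
            lo, hi = max(lo, start), min(hi, end)
    candidates = [(lo, hi)] if lo < hi else []

    # Step 2: the gene cannot lie inside a deletion that fails to uncover it.
    for start, end, traits in strains:
        if gene in traits:
            continue
        remaining = []
        for a, b in candidates:
            if a < start:
                remaining.append((a, min(b, start)))
            if b > end:
                remaining.append((max(a, end), b))
        candidates = remaining
    return candidates


for gene in ("cut", "singed", "lozenge"):
    print(gene, map_gene(gene, STRAINS))
```

With the five strains above, the script reproduces the intervals reached in step 8.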


For more practice, see Problems 4, 10, and 26. Visit the Study Area to access study tools. 
13.4 Chromosome Breakage Leads
to Inversion and Translocation 
of Chromosomes 


Chromosome breakage involves double-strand DNA 
breaks that sever a chromosome. Breakage that is not 
followed by reattachment of the broken segment leads to 
partial chromosome deletion—but what happens if the 
broken chromosome reassembles but the broken seg- 
ment reattaches in the wrong orientation or if the broken 
segment reattaches to a nonhomologous chromosome? 
The answers are that reattachment in the wrong ori- 
entation produces a chromosome inversion, whereas 
attachment to a nonhomologous chromosome results in 
chromosome translocation. We discuss two types of 
chromosome inversion events and two types of chromo- 
some translocation in this section. A repeating theme 
that will emerge from this discussion is that as long as no 
critical genes or regulatory regions are mutated by chro- 
mosome breakage, and as long as dosage-sensitive genes 
are retained in their proper balance, heterozygous carriers 
of chromosome inversion or chromosome translocation 
may experience no phenotypic abnormalities. However, 
complications during meiosis may affect the efficiency of 
chromosome segregation, and fertility may be affected in 
those individuals. 


Chromosome Inversion 


Chromosome inversions occur as a result of chromosome 
breaks followed by reattachment of the free segment in 
the reverse orientation. Two kinds of chromosome inver- 
sion are observed, depending on whether the centromere 
is part of the inverted segment (Figure 13.16). Paracentric 
inversion results from the inversion of a chromosome 
segment on a single arm and does not involve the centro- 
mere, whereas pericentric inversion reorients a chromo- 
some segment that includes the centromere. 

Inversion most commonly affects just one member 
of a homologous pair, and such organisms are either 
paracentric or pericentric inversion heterozygotes in 
which one chromosome has normal structure and the 
homolog contains an inversion. Inversion heterozygotes 
may experience no genetic or phenotypic abnormalities, 
as long as no critical genes or regulatory DNA sequences 
are disrupted by chromosome breaks. In such cases, the 
180-degree reorientation of inverted segments does not 
change the genetic content or gene expression of the 
affected chromosome. 

Chromosome inversion does, however, cause a dif- 
ference in linear order of genes between the homologs; 
thus, to bring the homologs of an inversion heterozygote 
into synaptic alignment during meiosis requires the formation
of an unusual inversion loop at synapsis. Note,
however, that an organism that is homozygous for an
inversion carries the same order of genes and chromosome
regions on both homologs and therefore will experience
normal chromosome synapsis without the need for
inversion loop formation.

Figure 13.16 Paracentric and pericentric chromosome
inversion. (a) Paracentric inversion: chromosome breakage,
free-segment rotation, paracentric inversion, and the
paracentric inversion heterozygote. (b) Pericentric inversion:
the same stages for an inverted segment that includes the
centromere. The letters represent regions of chromosomes, not
single genes.

In inversion heterozygotes, inversion loop formation 
readily occurs and does not affect subsequent chromo- 
some segregation. Crossing over takes place between the 
homologs, but whereas crossing over that occurs outside 
the region spanned by the inversion loop takes place in 
the normal manner, crossing over inside the region of 
the inversion loop results in duplications and deletions 
among the recombinant chromosomes. 

Figure 13.17 illustrates crossover within the inversion
loop between chromosome regions B and C in a
paracentric inversion heterozygote. Following crossover,
one normal-order chromosome (1 •ABCDEFGHI 1') and
one inverted-order chromosome (3 •ADCBEFGHI 3') are
unchanged by recombination (the dot represents the
centromere). The recombinant chromosomes, however,
are abnormal: One is a dicentric chromosome with two
centromeres (2 •ABCDA• 4), and the other is an acentric
fragment that has no centromere (2' IHGFEDCBEFGHI 4').
At anaphase I, when centromeres on homologous chromosomes
normally migrate toward opposite poles, a
dicentric bridge forms as the dicentric chromosome
is pulled toward both poles of the cell. Eventually the
bridge snaps under the tension, at a random break point.
Both products of the break have a centromere, but both
are also missing genetic material. In contrast, the acentric
fragment, lacking a centromere, has no mechanism
by which to migrate to a pole of the cell and will be
lost during meiosis. The completion of meiosis of this
paracentric inversion heterozygote results in two viable
gametes, one with the normal-order chromosome
(1 •ABCDEFGHI 1') and one with the inverted-order
chromosome (3 •ADCBEFGHI 3'), and two nonviable gametes
with partial deletion chromosomes.

Figure 13.17 The consequences of crossover in the
inversion loop in paracentric inversion heterozygotes.
Crossover in the inversion loop results in two viable gametes
and two nonviable gametes.

Crossover in the inversion loop in a pericentric
inversion heterozygote yields two viable gametes and
two nonviable gametes (Figure 13.18). One viable gamete
contains the normal-order chromosome (1 ABCDE•FGHI 1')
and one contains the inversion-order chromosome
(3 ABCHGF•EDI 3'). Crossover also results in two nonviable
gametes, each having a combination of deletions and
duplications (2 ABCDE•FGHCBA 4 and 4' IDE•FGHI 2').
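
The recombinant chromosomes described for both kinds of inversion heterozygote can be derived mechanically. The sketch below is illustrative only: chromosomes are written as strings with "•" marking the centromere, and the function names and index arguments (i, j for the ends of the inverted segment, k for the crossover point) are assumptions introduced here, not notation from the figures. It models a single crossover between one normal-order and one inverted-order chromatid.

```python
CEN = "•"  # centromere marker used in the chromosome strings


def invert(normal, i, j):
    """Reverse the segment normal[i..j] (inclusive) to build the inverted homolog."""
    return normal[:i] + normal[i:j + 1][::-1] + normal[j + 1:]


def crossover_in_loop(normal, i, j, k):
    """Single crossover between normal[k-1] and normal[k], inside the inversion.

    Returns the two recombinant chromatids as strings.
    """
    if not i < k <= j:
        raise ValueError("crossover point must lie inside the inverted segment")
    inverted = invert(normal, i, j)
    # Position in the inverted homolog that pairs with the same junction.
    cut = i + j - k + 1
    rec1 = normal[:k] + inverted[:cut][::-1]
    rec2 = normal[k:][::-1] + inverted[cut:]
    return rec1, rec2


def describe(chromatid):
    """Classify a chromatid by its number of centromeres."""
    n = chromatid.count(CEN)
    return {0: "acentric", 1: "monocentric", 2: "dicentric"}[n]


# Paracentric case from the text: regions B-D inverted, crossover between B and C.
para = crossover_in_loop("•ABCDEFGHI", 2, 4, 3)
# Pericentric case from the text: regions D-H (including the centromere) inverted.
peri = crossover_in_loop("ABCDE•FGHI", 3, 8, 6)

for rec in para + peri:
    print(rec, describe(rec))
```

In the paracentric case the two recombinants are a dicentric chromosome and an acentric fragment; in the pericentric case both recombinants keep one centromere but carry a duplication and a deletion. The second pericentric product prints as the mirror image of the string in the text, which is the same chromosome read from the other end.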

Three observations about recombination in inversion 
heterozygotes have important genetic implications: 


1. The probability of crossover within the inversion 
loop is linked to the size of the inversion loop. 
Small inversions produce small inversion loops that 
have a low frequency of crossover. On the other 
hand, larger inversions produce loops that span more 
of the chromosome and correlate with a higher prob- 
ability of crossover. 


2. Inversion suppresses the production of
recombinant chromosomes. The viable
gametes produced by inversion heterozygotes
contain either the normal-order chromosome or
the inversion-order chromosome, but no recombinant
chromosomes are viable, due to duplications
and deletions of chromosome segments. The
absence of recombinant chromosomes in progeny
is identified as crossover suppression. In reality,
crossovers do occur between homologous chromosomes
carried by inversion heterozygotes, but
because the recombinant chromosomes contain
duplications and deletions, there is little possibility
of viability for any progeny formed from the
gametes that contain them. Geneticists have taken
advantage of crossover suppression in research to
mark homologous chromosomes with dominant
alleles that aid in the interpretation of genetic
crosses. Experimental Insight 13.1 describes research
by Hermann Muller, who used the so-called
ClB ("see-el-bee") chromosome to identify and
later investigate lethal X-linked mutations induced
in Drosophila by X-ray exposure.


3. Fertility may be altered if an inversion heterozygote
carries a very large inversion. When
an inversion spans all or nearly all the length of a
chromosome, any crossover that occurs will produce
two viable and two nonviable gametes. This means
that approximately half the gametes will be lost in
the specific case of an inversion heterozygote who
carries a very large inversion. No such loss of fertility
is expected for organisms with small inversions.


Figure 13.18 The consequences of crossover in the
inversion loop in pericentric inversion heterozygotes.
Crossover in the inversion loop results in two viable gametes
and two nonviable gametes.
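
The link between inversion size and fertility can be put in numbers with a small sketch. It rests on a simplifying assumption that is not stated in the text: each meiosis has at most one crossover inside the inversion loop, occurring with probability p, and every such crossover yields two viable and two nonviable gametes. The function name is hypothetical.

```python
def viable_gamete_fraction(p_crossover_in_loop):
    """Expected fraction of viable gametes from an inversion heterozygote.

    Assumes at most one crossover in the inversion loop per meiosis.
    """
    if not 0.0 <= p_crossover_in_loop <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    # A meiosis with a loop crossover loses half of its four gametes.
    return 1.0 - 0.5 * p_crossover_in_loop


for p in (0.0, 0.2, 1.0):
    print(p, viable_gamete_fraction(p))
```

A small inversion (p near 0) costs almost nothing, whereas a very large inversion (p near 1) approaches the loss of half the gametes described in point 3.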


Chromosome Translocation 


Chromosome translocation takes place following chromo- 
some breakage and the reattachment of a broken segment 
to a nonhomologous chromosome. If no critical genes are 
severed or have their regulation disrupted by the break- 
age or translocation events, translocation heterozygotes, 
with one normal chromosome and one altered chromo- 
some in each homologous pair, have a normal outward 
phenotype and a normal pattern of gene expression. Even 
if no phenotypic abnormalities are detected, however, 
certain translocation heterozygotes can experience semis- 
terility as a result of abnormalities of chromosome segre- 
gation, as we describe below. 

Three principal types of translocation are observed. 
Unbalanced translocation arises from a chromosome 
break and subsequent reattachment to a nonhomologous 
chromosome in a one-way event; that is, a piece of one 
chromosome is translocated to a nonhomologous chro- 
mosome and there is no reciprocal event (Figure 13.19a). 
Reciprocal balanced translocation is produced when 
breaks occur on two nonhomologous chromosomes and 
the resulting fragments switch places when they are reat- 
tached (Figure 13.19b). Robertsonian translocation, also 
known as chromosome fusion, involves the fusion of 
two nonhomologous chromosomes (Figure 13.19c). One 
consequence of Robertsonian translocation is the reduc- 
tion of chromosome number. Our discussion in this sec- 
tion focuses on reciprocal balanced translocations and on 
Robertsonian translocations. 


Reciprocal Balanced Translocation In reciprocal balanced 
translocation, one member of each homologous pair is 
altered by translocation, and none of the four chromosomes 
has a fully homologous partner. Instead, the translocated 
chromosome segments homologous to the normal member 
of each pair are dispersed on two other chromosomes. 
The absence of complete homology between chromosome 
pairs requires formation of an unusual tetravalent synaptic 
structure, a cross-like configuration made up of the four 
chromosomes related by the translocation, to enable 
homologous regions to synapse during metaphase I, as 
shown in Figure 13.20. The chromosomes in the figure are 
labeled I, II, III, and IV so that we may more easily follow
their progress in meiosis and meiotic outcomes. 

Two patterns of chromosome segregation emerge 
from the tetravalent structures found in translocation 
heterozygotes. Alternate segregation and adjacent-1 seg- 
regation each occur in approximately 50% of meiotic 
divisions, although the actual proportions vary some- 
what among different species. At anaphase I in alternate 
segregation, chromosomes I and IV move to one cell 
pole and chromosomes II and III move to the opposite 
pole. At the completion of meiosis, all gametes are viable 
because each contains a complete set of genetic information
for the two chromosomes. Fertilization of a gamete
containing chromosomes I and IV will produce a normal
zygote, whereas fertilization of a gamete containing
chromosomes II and III will produce a zygote with reciprocal
balanced translocation heterozygosity, like the parent
illustrated in the figure.

Figure 13.19 Unbalanced, reciprocal balanced, and
Robertsonian chromosome translocations. Panel (c) shows
Robertsonian translocation (chromosome fusion).


In anaphase I of adjacent-1 segregation, chromo- 
somes I and III are moved to one cell pole and chromo- 
somes II and IV go to the opposite pole. None of the 
gametes formed by this pattern of segregation is viable 
because of duplications and deletions of genetic informa- 
tion. Gametes containing chromosomes I and III have a 
duplication of the F and G regions, along with deletion 
of the R and S regions. Conversely, gametes containing 
chromosomes II and IV have a duplication of the R and S 
regions and a deletion of regions F and G. 

Occasionally, an unusual pattern of segregation known 
as adjacent-2 segregation takes place. It is rare because it 
requires that chromosomes I and II, which share homolo- 
gous centromeres, move to the same pole of the cell at ana- 
phase I. Correspondingly, chromosomes III and IV, which 
also share homologous centromeres, also move to the same 
cell pole (opposite chromosomes I and II). This is atypical 
of the usual pattern at anaphase I, in which homologous 
chromosomes (that carry homologous centromeres) are 
separated in the reduction division. None of the gametes or 
progeny resulting from adjacent-2 segregation are viable. 

In summary, cell biologists conclude that in balanced 
translocation heterozygotes, only alternate segregation 
produces viable gametes and viable progeny. This pat- 
tern accounts for just one-half of all meiotic events in 
these individuals; thus, the semisterility of translocation 
heterozygotes is due to reduction by about one-half in the 
number of viable gametes that can be produced. 
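
The three segregation patterns can be checked by counting chromosome regions in each gamete. The sketch below is an illustration, not the layout of Figure 13.20: regions F, G, R, and S come from the text, while the remaining region letters (A to E and M to Q) and all function names are placeholders chosen here. Chromosomes I and IV are the normal pair members; II and III carry the exchanged segments.

```python
# Region content of the four chromosomes in the tetravalent.
# Only F, G, R, and S are named in the text; other letters are placeholders.
CHROMS = {
    "I": "ABCDEFG",    # normal, shares its centromere type with II
    "II": "ABCDERS",   # translocated: carries R and S in place of F and G
    "III": "MNOPQFG",  # translocated: carries F and G in place of R and S
    "IV": "MNOPQRS",   # normal, shares its centromere type with III
}
ALL_REGIONS = set("ABCDEFG" + "MNOPQRS")

SEGREGATION = {
    "alternate": [("I", "IV"), ("II", "III")],
    "adjacent-1": [("I", "III"), ("II", "IV")],
    "adjacent-2": [("I", "II"), ("III", "IV")],
}


def gamete_imbalance(pair):
    """Return (duplicated regions, deleted regions) for a two-chromosome gamete."""
    content = "".join(CHROMS[name] for name in pair)
    duplicated = sorted(r for r in ALL_REGIONS if content.count(r) > 1)
    deleted = sorted(r for r in ALL_REGIONS if content.count(r) == 0)
    return duplicated, deleted


def is_viable(pair):
    """A gamete is treated as viable only if every region is present exactly once."""
    duplicated, deleted = gamete_imbalance(pair)
    return not duplicated and not deleted


for pattern, gametes in SEGREGATION.items():
    for pair in gametes:
        print(pattern, pair, gamete_imbalance(pair), is_viable(pair))
```

Only the two alternate-segregation gametes come out balanced, which is the basis for the semisterility described above.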


Robertsonian Translocation In organisms with a 
Robertsonian translocation, also known as chromosome 
fusion, two nonhomologous chromosomes fuse to form 
a single, larger chromosome, resulting in a reduction in 
chromosome number. If two pairs of chromosomes fuse by 
Robertsonian translocation, the number of chromosomes 
in a genome is reduced to 2n - 2. This is a frequently
observed mechanism by which chromosome number 
evolves in related organisms. This mechanism accounts 
for the difference in chromosome number between human 
(2n = 46) and chimpanzee (2n = 48), as discussed in 
the Case Study. If multiple chromosomes undergo 
Robertsonian translocation, as was the case with mice on 
Madeira, larger reductions in chromosome number occur. 

Carriers of a single Robertsonian translocation have 
one chromosome fusion. The homologs of the fused chro- 
mosomes remain separate chromosomes. Figure 13.21 
illustrates this pattern of Robertsonian translocation in 
humans in a condition called familial Down syndrome 
that is the cause of 5-10% of Down syndrome (trisomy 21) 
cases. Familial Down syndrome occurs when one parent is 
a carrier of a Robertsonian translocation of chromosome 
21 to another autosome, most often chromosome 14. The 
translocation-heterozygous parent has a normal diploid 
genotype produced by a complete copy of chromosome 
14, a complete copy of 21, and a 14/21 fusion chromosome.
The fusion chromosome has lost the short arms of
chromosome 14 and chromosome 21, but these contain
no critical genetic information, and so the Robertsonian
translocation carriers have a normal phenotype. Three
possible patterns of segregation of the three chromosomes
are equally likely following formation of the trivalent
complex. Six possible gametes produced by these patterns
are shown in the left column of the figure. When united
with a normal gamete, three of the six possible gamete
types result in nonviable zygotes (categories 4, 5, and 6
in the figure). The other three types of gametes produce
viable zygotes (categories 1, 2, and 3). Two have normal
phenotype and one, category 3, has Down syndrome. This
form of Robertsonian translocation heterozygosity leads to
about a 1 in 3 chance of producing a child with trisomy 21,
and this high risk is present each time a child is conceived.

Figure 13.20 The tetravalent synaptic structure and alternate and adjacent chromosome
segregation in reciprocal balanced translocation heterozygotes.
Alternate segregation separates homologous centromeres and produces normal
gametes. Adjacent-1 segregation separates homologous centromeres and produces
nonviable gametes with duplications and deletions. Adjacent-2 segregation is very
rare because it does not separate homologous centromeres; gametes are nonviable
due to duplications and deletions.
Conclusion: Only alternate segregation produces viable gametes and progeny. This
segregation pattern occurs in about half of meioses and accounts for semisterility
of translocation heterozygotes.


13.5 Transposable Genetic Elements
Move throughout the Genome

Figure 13.21 [...] translocation. For reproduction between a 14/21 Robertsonian
translocation carrier and an individual with a normal karyotype,
three nonviable zygotes (categories 4, 5, and 6) and three viable
zygotes (categories 1, 2, and 3) are possible. Approximately
one-third of the children from such unions (category 3) have
trisomy 21 (Down syndrome).


Transposable genetic elements are DNA sequences
of various lengths and sequence composition that
have evolved the ability to move within the genome
by an enzyme-driven process known as transposition.
Transposition is a mutational event—one that has a bio- 
logical basis, as opposed to the chemical or physical bases 
of mutagenesis we discussed in Chapter 12. Transposable 
elements exist in dozens of forms that range in size from 
50 bp to more than 10 kb. They vary in copy number from 
a few copies up to hundreds of thousands of copies. 
Transposable elements typically create mutations by 
their insertion into wild-type alleles. The insertion of new 
DNA into a functional gene is the equivalent of inserting 
a random string of letters into a sentence. And just as the 
insertion of a random string of letters renders the sentence 
unintelligible, so too the consequence of DNA transposi- 
tion is to render the wild-type allele nonfunctional by mak- 
ing it unable to produce a wild-type gene product. This 
mutational process is known as insertional inactivation. 
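
The sentence analogy can be made concrete with a toy reading-frame example. Everything in this sketch is made up for illustration: the 18-nucleotide "gene", the 5-nucleotide "element", the insertion position, and the helper names are hypothetical, and real transposable elements are far longer. The point is only that an insertion changes how every downstream triplet is read.

```python
def codons(seq):
    """Split a sequence into complete triplets, ignoring any leftover bases."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def insert_element(gene, element, position):
    """Return the gene sequence with `element` inserted at `position`."""
    return gene[:position] + element + gene[position:]


GENE = "ATGGCCGAATTTAAATAG"   # hypothetical coding sequence, ends in stop codon TAG
ELEMENT = "CTAGG"             # hypothetical inserted DNA

mutant = insert_element(GENE, ELEMENT, 6)

print(codons(GENE))    # wild-type reading frame
print(codons(mutant))  # reading frame after insertion
```

The first two codons are unchanged, but every codon after the insertion site differs and the original stop codon is no longer read in frame, so the wild-type product cannot be made.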
Evolutionarily, transposable elements can increase 
genome size. Many transposable elements seem to have 


the sole function of increasing their own copy number. 
As a consequence, organisms carrying certain transpos- 
able elements derive no useful benefit from their pres- 
ence. Alternatively, some transposable elements contain 
expressed genes that may benefit the organism. In this 
and the following two sections, we discuss transposable 
elements in bacterial and eukaryotic genomes, and their 
evolutionary relationships. 


The Discovery of Transposition 


Barbara McClintock discovered transposition in a series 
of studies of a mutant phenotype of kernel color in maize 
(Zea mays) that took place in the 1930s. The C gene for 
kernel color is located on chromosome 9 in corn. At 
this gene a dominant wild-type allele C produces purple 
kernels and a mutant c1 allele produces colorless kernels.
One gene linked to C produces plump (Sh) or shrunken
(sh) kernels, and a second linked gene produces shiny
(Wx) or waxy (wx) kernels (Figure 13.22a). In experiments
with several trihybrid strains of maize with the genotype
C Sh Wx/c1 sh wx, McClintock found a few unusual kernels
that were mostly purple but had colorless sectors that
varied among different kernels. Invariably, however, the
purple regions were plump and shiny, but the colorless
sectors were shrunken and waxy.

Figure 13.22 Mutation producing colorless sectors and
reversion of the unstable colorless mutation in maize by the
transposable genetic elements Ds and Ac.


Hermann Muller and the Drosophila ClB Chromosome Method


Hermann Muller, a student of Thomas Hunt Morgan, made
numerous important contributions to genetics. Among
Muller's accomplishments were his discovery that X-rays
induce mutations by chromosome breakage and his development
of a genetic method to identify lethal X-ray-induced
mutations of the X chromosome in Drosophila.

To identify these mutations, Muller created an X chromosome
called the ClB chromosome ("see-el-bee"): "C" for
crossover suppression, "l" for presence of a recessive lethal
mutation, and "B" for a dominant mutation producing an
abnormal bar-shaped eye. Crossover suppression results from the
presence of multiple inversions that prevent the appearance
of recombinants between inverted and wild-type X chromosomes
in females. Bar eye is a dominant mutant phenotype
that permanently marks the inversion chromosome, since
it cannot be reshuffled by recombination. Potentially lethal
recessive mutations, m(?), are generated on male X chromosomes
by X-ray exposure.

Drosophila males that are hemizygous for ClB (ClB/Y) die
as a result of the lethal mutation (l) on the X chromosome.
Female carriers of ClB (ClB/+) survive and preserve the
chromosome. Muller began his search for lethal X-ray-induced
mutations by exposing male fruit flies to X-rays to induce
mutations in germ-line cells. X-ray-exposed males were then
crossed to a bar-eyed female (ClB/+), in Cross I. Next, bar-eyed
female progeny from Cross I were individually mated to
wild-type males, in Cross II. Cross II would be expected to produce
a 2:1 ratio of females to males if X-ray exposure did not induce
a lethal mutation on the X chromosome. In this case, only
males inheriting the ClB chromosome would die. If on the
other hand a lethal mutation was induced, only female progeny
would be produced by Cross II. Males inheriting the ClB
chromosome would die, but so would males inheriting the X
chromosome with the induced lethal mutation.

Identifying X-ray-induced lethal mutations using the ClB
method is highly accurate: It requires only a determination of
whether males are produced by Cross II. Muller recognized
that when X-ray exposure induced a lethal mutation, he could
study it by means of the Cross II females with normal eyes,
which are heterozygous carriers of the induced lethal mutation.
Muller used the ClB method to demonstrate that X-ray
exposure induces mutations at a rate more than 150 times
greater than the spontaneous mutation rate in Drosophila. His
work led to the characterization of numerous mutations and to
the identification of the linear relationship between the level of
X-ray exposure and the frequency of induced lethal mutations.


MULLER'S ClB METHOD


X-ray-exposed males are mated to bar-eyed females carrying
the ClB chromosome in Cross I. Progeny bar-eyed females that
potentially carry a lethal X-linked mutation, m(?), are crossed to
wild-type males in Cross II. The absence of male progeny from
Cross II identifies the occurrence of an induced lethal mutation.
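
The expected outcome of Cross II in Muller's ClB method (a bar-eyed ClB/m(?) female mated to a wild-type male) can be enumerated with a small Punnett-square sketch. The function name and the string labels for the chromosomes are hypothetical; the genetics follows the description of the method: ClB is lethal in hemizygous males, and the irradiated X chromosome m is lethal in males only if a recessive lethal mutation was induced.

```python
def cross_two_survivors(induced_lethal):
    """Count surviving progeny classes from ClB/m female x +/Y male."""
    maternal_x = ["ClB", "m"]   # X chromosomes the bar-eyed mother can transmit
    paternal = ["+", "Y"]       # wild-type X or Y from the father

    # X chromosomes that kill a hemizygous male.
    lethal_in_males = {"ClB"}
    if induced_lethal:
        lethal_in_males.add("m")

    survivors = {"female": 0, "male": 0}
    for x in maternal_x:
        for p in paternal:
            sex = "male" if p == "Y" else "female"
            # Daughters carry the father's wild-type X, which masks recessive lethals.
            if sex == "male" and x in lethal_in_males:
                continue
            survivors[sex] += 1
    return survivors


print(cross_two_survivors(induced_lethal=False))  # 2 female classes : 1 male class
print(cross_two_survivors(induced_lethal=True))   # females only
```

The two calls give the 2:1 female-to-male ratio expected without an induced lethal and the all-female progeny expected with one.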

Looking at chromosome 9 in nuclei of cells from the 
colorless sectors of kernels, McClintock noticed a termi- 
nal deletion of one chromosome 9 homolog. In contrast, 
both chromosome 9 homologs were intact in cells from 
purple sectors. McClintock concluded that the simultane- 
ous appearance of colorless, shrunken, and waxy resulted 
from pseudodominance due to deletion of the dominant 
alleles from one homolog (Figure 13.22b). Mitotic division 
of an original cell containing the chromosome deletion 
produced the abnormal sectors. 

The frequency of sectored kernels was too high to 
be a result of spontaneous chromosome mutation, and 
more importantly, McClintock saw that break points of 
chromosome 9 occurred in the same place in all affected 
kernels of a given strain. Based on these observations 
she concluded that a genetic element, later named a dis- 
sociation (Ds) element, was located at the site of chro- 
mosome breakage. What puzzled McClintock, however, 
was why Ds generated chromosome breakage in some 
cells but not in others. To explain this, she suggested 
that Ds alone could not generate chromosome breakage. 
Instead, chromosome breakage at Ds was activated by 
an unlinked genetic element she called an activator (Ac) 
element. 

McClintock’s Ds/Ac proposal proved to be the expla- 
nation for another highly unusual observation she made 
in maize. She found occasional colorless maize mutants 
that had an unstable mutant phenotype. These unstable 
mutants had kernels that were mostly colorless but also had 
purple spots. The patterns of purple spotting differed from 
kernel to kernel on the same maize ear, indicating that it de- 
veloped by some sort of reversion in somatic cells that was 
perpetuated by subsequent mitotic division (Figure 13.22c). 
Her investigation led McClintock to conclude that the un- 
stable mutant alleles were produced by the insertion of Ds 
into the C allele to form the mutant c^Ds allele. This allele
is mutated by the insertional inactivation process and as a
result it produces no kernel color. The c^Ds allele is reverted
through the action of Ac that activates the excision of Ds in
individual somatic cells of developing kernels. The reversion
of c^Ds to C in these and descendant cells leads to pigment
production and purple spots. 

McClintock’s transposable genetic element hypothesis 
was that the unstable mutant phenotype was the result of 
a transposable genetic element (Ds) that created a muta- 
tion when it inserted into C and led to reversion when 
the expression of Ac led to its removal (Figure 13.22d). 
McClintock’s hypothesis came at a time when genes were 
first being described, before DNA was known to be the he- 
reditary material, and before DNA structure was described. 


It was difficult for many biologists to understand how ge- 
netic elements could be mobile, and so the transposition 
hypothesis was much debated for years. Eventually, how- 
ever, more examples of transposition emerged in maize, in 
other plant species, in animals, in archaea, and in bacteria. 
Since McClintock’s discovery of transposition in maize, the 
process has been identified in virtually all organisms. For 
her discovery of transposition, McClintock was awarded 
the 1983 Nobel Prize in Physiology or Medicine. 

McClintock's observations of the effects of transposition
were important, but they were not the first example
of a geneticist examining a mutant caused by transposition.
In a bit of genetic irony, the first of Gregor Mendel's
genes to be identified and sequenced, the gene controlling
round versus wrinkled seed shape, turns out to have a 
recessive allele (wrinkled) that results from the insertional 
inactivation of the dominant wild-type (round) allele. 
Experimental Insight 13.2 describes the identification and 
analysis of the alleles of the R gene in peas. 


The Characteristics and Classification of 
Transposable Elements 


The acceptance of McClintock’s proposal of the exis- 
tence and movement through the genome of transpos- 
able elements led to their discovery in all organisms. 
Transposable elements have even been found in bacterio- 
phage genomes. There are many different types of trans- 
posable elements, ranging from the simplest, which have 
just the sequences required for transposition, to much 
more complex transposable elements that carry mul- 
tiple genes; and there are several different mechanisms 
by which transposable elements move about the genome. 
Despite these differences, transposable elements have two 
distinctive sequence features in common that make them 
recognizable in genomes. The transposable element itself 
is flanked by terminal inverted repeats, and the inserted 
transposable element is bracketed by flanking direct 
repeats (Figure 13.23). The presence of terminal inverted 
repeats and flanking direct repeats was instrumental in 
permitting Cathie Martin and her colleagues to confirm 
Figure 13.23 E. coli insertion sequence IS903. The central
region and terminal inverted repeats constitute the transpos-
transposition.
Mendel's Peas Are Shaped by Transposition


Gregor Mendel left good descriptions, data, and analyses 
of the crosses he used for establishing the law of segrega- 
tion and the law of independent assortment, but he did not 
leave any seeds to give geneticists direct access to the genes 
themselves. Experimental Insight 12.1 identifies three of the 
genes studied by Mendel that have now been identified and 
analyzed. Details of the discovery in 1990 of a fourth gene are 
described here. It is the gene responsible for the round and 
wrinkled seed shapes described by Mendel, now known as 
SBE1, the starch branching enzyme 1 gene. 

The gene was identified and shown to be responsible for 
the seed shape variation Mendel reported by a laboratory 
group led by Cathie Martin (Bhattacharyya et al., 1990). In 
its paper, the group reports western blot, northern blot, and 
Southern blot evidence that the recessive mutant allele, r, is 
altered by the insertion of approximately 800 bp of DNA. The 
insertion is of transposable DNA, and its effect is insertional in- 
activation of the ability to produce a starch branching enzyme 
that is the normal gene product. The researchers also provide 
a physiological explanation for the appearance of wrinkled 
seed shape. 


WESTERN BLOT ANALYSIS 


Prior to the start of this study, considerable evidence already 
suggested that seed shape variation was due to differences 
in starch synthesis. Among candidate enzymes known to be 
important in starch synthesis was SBE1. The researchers used 
RR (pure-breeding round) plants as a source of SBE1 to raise 
an antibody against the enzyme. They used protein gel elec- 
trophoresis and western blot analysis to test for reactivity be- 
tween the anti-SBE1 antibody and proteins extracted from RR 
and rr (pure-breeding wrinkled) plants. The antibody detected 
the enzyme in RR plant protein gels but not in rr plant protein 
gels (1). This indicates that RR plants produce SBE1 but that rr
plants do not.


[Panel 1: Western blot]


NORTHERN BLOT ANALYSIS 


The researchers next derived a molecular probe for the SBE1 
gene and tested mRNA from RR and rr plants in northern blot 
analysis. They found that the molecular probe hybridized with 
a 3300-nucleotide mRNA derived from RR plants and with 
a 4100-nucleotide mRNA from rr plants. They also found 
that the larger transcript from rr plants was about tenfold less 
abundant than the smaller transcript from RR plants (2). These 
results indicate that the transcript of SBE1 in rr plants is longer 
than in RR plants and that it accumulates to only a fraction of 
the level present in RR plants. 


(2) Northern blot: lanes RR and rr; bands at 4100 nt and 3300 nt 


SOUTHERN BLOT ANALYSIS 


The SBE1 gene contains several restriction sequences, including 
two for the restriction enzyme EcoRI. The researchers took 
DNA isolated from RR and rr plants, digested it with EcoRI, 
and performed DNA gel electrophoresis and Southern blot 
analysis with the SBE1 molecular probe. They found that the 
probe hybridized to a DNA fragment approximately 3.5 kb in 
length from RR plants and to a fragment of about 4.3 kb from 
rr plants (3). This result could indicate either the insertion of 
approximately 800 bp of DNA into the r allele or the presence 
of a mutation that changes an EcoRI restriction sequence and 
alters the size of the restriction fragment (see Section 10.2). 
Analysis of the DNA sequence of the r allele revealed that the 
larger restriction fragment was created by insertion of DNA 
into one of the exons of the SBE1 gene (4). This event caused 
insertional inactivation of the r allele of SBE1. Additional 
examination of the DNA insert found it to be very similar to the 
Ac transposable genetic element identified by McClintock. 
The transposable DNA element identified by this work is 
named Ips-r (insertion Pisum sativum-r). 
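
The size estimate for the insertion can be checked from the blot data alone. The short sketch below simply subtracts the RR values from the rr values reported above; the variable names are illustrative and not part of the original study.

```python
# Fragment and transcript sizes as reported in the text above.
southern_kb = {"RR": 3.5, "rr": 4.3}    # EcoRI fragments detected by the SBE1 probe
northern_nt = {"RR": 3300, "rr": 4100}  # SBE1 transcripts

# Difference between the mutant (rr) and wild-type (RR) sizes.
dna_difference_bp = round((southern_kb["rr"] - southern_kb["RR"]) * 1000)
rna_difference_nt = northern_nt["rr"] - northern_nt["RR"]

print(dna_difference_bp)  # 800
print(rna_difference_nt)  # 800
```

Both blots point to the same extra length of roughly 800 nucleotides, which is what would be expected if the inserted DNA lies within the transcribed region of the gene.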


(3) Southern blot: lanes RR and rr; 3.5-kb band marked 


WRINKLED SEED DEVELOPMENT 


The physiological explanation of wrinkled seed development 
is tied to the loss of function of SBE1. In mature round peas, 
almost half the dry weight is starch. About 35% of the starch is 
in a simple linear form known as amylose. The remainder is in 
complexly branched forms, most commonly a form known as 
amylopectin. Free molecules of sucrose make up about 5% of 
the dry weight. Amylose is actively converted to amylopectin 
by SBE1 in round seeds. In wrinkled seeds, about 30% of starch 
is amylopectin, and about 70% is amylose. Amylose readily 
loses molecules of free sucrose, and the sugar accounts for 
more than 10% of the dry weight of wrinkled seeds. 

During early seed development, SBE1 is active in immature 
seeds that will become round, but it is inactive due to muta- 
tion in immature seeds that will become wrinkled. In seeds 
that will be wrinkled, the high percentage of free sucrose 
causes cells to import large amounts of water to dilute the 
excess sugar. The extra water results in larger cells and larger 


the insertion of a transposable element as the mutational 
event creating the r allele (see Experimental Insight 13.1). 

Terminal inverted repeats are part of the sequence 
of a transposable element, but flanking direct sequence 
is not. Flanking direct sequence is generated by DNA 
polymerase activity as part of the insertion event. Three 
features characterize all transposition events, and they 
account for the synthesis of flanking direct repeats at 
sites of transposition (Figure 13.24). First, the new tar- 
get site for insertion of a transposable element has both 
strands of DNA cut in a staggered manner that leaves 
short single-stranded overhangs on each end of the cut. 
Second, the transposable element is inserted into its new 
site as double-stranded sequences that are joined to the 
single-strand ends at the new insertion site. Lastly, DNA 
is replicated at the new sites of insertion to fill the single- 
stranded gaps generated by cleavage. This DNA replica- 
tion produces the direct repeats that flank transposable 
elements. 
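
The three steps above can be sketched as a small string model. This is a simplified illustration, not a biochemical simulation: the function name `insert_element`, the parameters `cut` and `stagger`, and the example sequences are all hypothetical. The target site is the short stretch between the two staggered nicks, and after gap filling it appears on both sides of the element.

```python
def insert_element(target: str, element: str, cut: int, stagger: int) -> str:
    """Return the top strand after inserting `element` at a staggered cut.

    target[cut:cut + stagger] is the target site that lies between the two
    staggered nicks. Joining the element to the overhangs and filling the
    single-stranded gaps leaves one copy of that site on each side of the
    element (the flanking direct repeats).
    """
    if stagger <= 0 or cut < 0 or cut + stagger > len(target):
        raise ValueError("staggered cut must lie inside the target sequence")
    site = target[cut:cut + stagger]
    left = target[:cut]
    right = target[cut + stagger:]
    return left + site + element + site + right


# Hypothetical example: a 3-bp target site (CCC) and a lowercase element
# whose ends (ggccaat / attggcc) form terminal inverted repeats.
product = insert_element("AAACCCGGGTTT", "ggccaatNNNNattggcc", cut=3, stagger=3)
print(product)  # AAACCCggccaatNNNNattggccCCCGGGTTT
```

The inverted repeats travel with the element, whereas the duplicated CCC comes from the target, which is why flanking direct repeats differ from one insertion site to the next.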

Transposable elements fall into two categories. DNA 
transposons (also called Class II transposable elements) 
transpose as DNA sequences. Their transposition pro- 
duces flanking direct repeats at the site of insertion. At 
a minimum, all DNA transposons carry the transposase 
gene that produces the transposase enzyme required 
for the movement of the transposon, but many DNA 
transposons carry other genes in addition. DNA trans- 
posons are found in bacterial, archaeal, and eukaryotic 
genomes. Bacterial transposition is exclusively through 
DNA transposition. 

Some DNA transposons, particularly many found in 
bacteria, are simple transposons. This term indicates that 


immature seeds that stretch the seed membrane. As all pea 
seeds mature, they dehydrate to the same level, and this is 
when wrinkling appears in rr seeds. The over-stretched mem- 
branes of those seeds collapse, much like an over-inflated 
balloon that has lost air, causing the seeds to look wrinkled. 
Membranes of RR and Rr seeds have not been stretched by 
extra water importation. They are resilient, and the seeds 
appear round. 


the transposon has terminal inverted repeats surrounding 
the transposase gene with no other genes present. 
Simple transposons in bacteria are identified as insertion 
sequences. In contrast, composite transposons contain 
two insertion sequences and one or more additional 
genes. Composite transposons are in reality composed of 
two insertion sequences. 

The second category of transposable elements con- 
sists of retrotransposons (also called Class I transposable 
elements), which transpose through an RNA intermedi- 
ate. Retrotransposons are composed of DNA, but they 
are transcribed into RNA before transposition, and the 
RNA transcript is then copied back into DNA by the 
specialized enzyme reverse transcriptase. The reverse- 
transcribed DNA is then inserted into a new location, 
where flanking direct repeats are formed. Some, but not 
all, retrotransposons carry the gene for reverse transcriptase, 
an enzyme that copies single-stranded RNA into DNA. 
Retrotransposons carrying the reverse transcriptase gene 
can initiate their own transposition, while those lacking 
the gene must utilize reverse transcriptase synthesized by 
another retrotransposon. Because retrotransposons transpose 
through RNA intermediates, they do not encode 
transposase. Retrotransposons are common in eukaryotes, 
but they are not found in bacteria. None have yet 
been found in archaeal genomes. 

Retrotransposons always generate new copies of 
themselves for transposition. Thus, as transposition by 
retrotransposons takes place in a genome, the number of ret- 
rotransposons increases. Some DNA transposons also trans- 
pose in this manner and increase their number in a genome. 
This process is known as replicative transposition, and it 

Figure 13.24 Transposition of an IS element. (1) The IS 
element is removed from its original insertion site by transposase 
cleavage at the end of each inverted repeat. The new 
target site undergoes double-stranded, staggered cleavage by 
transposase. (2) Ligation joins the IS element to the new target 
site at one end of each strand. (3) Remaining single-stranded 
gaps are filled by DNA replication to create direct repeats that 
flank inserted IS elements. 


can be thought of as a “copy-and-paste” process, whereby 
the original copy of the transposable element remains in 
place and a new copy is transposed to another location. 


Table 13.4 

Alternatively, some DNA transposons undergo nonreplicative 
transposition; this can be thought of as a “cut-and-paste” 
mechanism. In this process, the original copy of 
the transposon is excised, and it is then reinserted into a new 
location. Nonreplicative transposition does not increase the 
number of copies of a transposable element in a genome. 
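
The difference between the two modes can be illustrated by counting element copies before and after a transposition event. The sketch below is a toy model under stated assumptions: the genome is a list of labeled loci, "TE" marks the element, and the function `transpose` is hypothetical.

```python
def transpose(genome: list[str], source: int, dest: int, replicative: bool) -> list[str]:
    """Place a copy of the element at `source` in front of original index `dest`.

    Replicative ("copy-and-paste"): the original copy stays in place.
    Nonreplicative ("cut-and-paste"): the original copy is excised.
    """
    if genome[source] != "TE":
        raise ValueError("no transposable element at the source position")
    result = list(genome)
    result.insert(dest, "TE")  # paste the element at the new site
    if not replicative:
        # The original copy shifts right by one if the paste landed before it.
        original = source + 1 if dest <= source else source
        del result[original]
    return result


genome = ["geneA", "TE", "geneB", "geneC"]
copied = transpose(genome, source=1, dest=3, replicative=True)
moved = transpose(genome, source=1, dest=3, replicative=False)

print(copied, copied.count("TE"))  # ['geneA', 'TE', 'geneB', 'TE', 'geneC'] 2
print(moved, moved.count("TE"))    # ['geneA', 'geneB', 'TE', 'geneC'] 1
```

Only the replicative call raises the copy number, which matches the statement that nonreplicative transposition leaves the number of elements in the genome unchanged.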


13.6 Transposition Modifies Bacterial 
Genomes 


Bacterial genomes, as well as plasmids and viruses, contain 
two types of transposable elements: (1) simple transposons 
known as insertion sequences (ISs), which consist of 
terminal inverted repeats surrounding 
a gene (sometimes two genes) encoding transposase, and 
(2) composite transposons, designated Tn in bacteria, which 
contain transposase plus one or more additional genes. 


Insertion Sequences 


Numerous IS elements are found in bacterial, archaeal, 
and viral genomes and also in plasmids (Table 13.4). 
These are simple DNA sequences that contain only 
the genetic information necessary for their own trans- 
position. Ranging between about 800 and 2000 bp, IS 
elements insert by either replicative or nonreplicative 
transposition. All IS elements have terminal inverted 
repeats surrounding the transposase gene. The inverted 
repeats vary in sequence. The length of inverted repeats 
also varies, as Table 13.4 indicates. Transposition of an 
IS element leads to formation of flanking direct repeats. 
Insertion sequences are designated by “IS” followed by 
a distinguishing number. Thus, IS1, IS2, IS4, and so on, 
identify insertion sequences that differ in total length 
and in the length and sequence of their terminal inverted 
repeats. 

Because IS elements carry only the genetic infor- 
mation needed for their own transposition, they influ- 
ence bacteria only in limited ways. One effect of the 


Characteristics of Insertion Sequence Elements in E. coli 

Element   Length   Inverted Repeat   Direct Repeat   Number in   Integration 
          (bp)     Length (bp)       Length (bp)     E. coli     Target Sequence 
IS1       768      23                9               5-8         Random 
IS2       1327     41                5               5           Hotspots 
IS4       1428     18                11              2           AAAN TTT 
IS5       1195     16                4               Variable    Hotspots 
IS10R     1329     23                9               Variable    NGCTNAGCN 
IS50R     1531     9                 9               Variable    Hotspots 
IS903     1057     18                9               Variable    Random 

N indicates any nucleotide. 


transposition of IS elements is to produce mutation. The 
mutations result from insertion of an IS element into a 
gene or into a regulatory sequence. Typically, insertion 
inactivates the function of the gene or sequence. IS ele- 
ments do have another role as well, as we discussed in 
Section 6.1: IS regions are potential sites of recombina- 
tion between bacterial chromosomes and plasmids form- 
ing Hfr chromosomes. In this role, IS elements promote 
recombination that can lead to gene transfer between 
bacteria. 

The transposable elements identified to date in ar- 
chaeal genomes are all of the IS type, and they have 
sequences that show close homology with bacterial IS 
elements. Genetic Analysis 13.3 guides you through an as- 
sessment of potential terminal inverted repeat sequences 
of IS elements. 


Composite Transposons 


Bacterial composite transposons (Tn) are composed of 
two copies of an IS element, each flanked by its terminal 
inverted repeat sequences, and one or more additional 
genes. Tn elements are considerably longer than IS ele- 
ments, ranging up to about 10,000 bp in length (Table 13.5). 
The additional genes in Tn elements are variable and are 
contained in a central region that is flanked by the two IS 
elements (Figure 13.25a). The genes in the central region 
confer characteristics such as antibiotic resistance and re- 
sistance to the toxic consequences of heavy metal exposure. 
These transposable elements can thus carry genes that may 
confer a growth advantage in certain environments. 

Tn10 has a structure typical of most composite transposons 
(Figure 13.25b). It contains two copies of the IS10 
element, each with its terminal inverted repeats. These 
are designated IS10R on the right (R) side and IS10L on 
the left (L) side, and they flank the central region. Each of 
the IS elements is about 1300 bp in length, and the Tn10 


Table 13.5 Characteristics of Bacterial Composite Transposons 

Transposon   Insertion      Sequence Difference    Transposon    Marker 
             Sequences      between IS Elements    Length (bp)   Gene(a) 
Tn5          IS50L, IS50R   1-bp difference        5700          Kan^R 
Tn9          IS1            None                   2500          Cam^R 
Tn10         IS10L, IS10R   2.5% difference        9300          Tet^R 
Tn903        IS903          None                   3100          Kan^R 

(a) Cam = chloramphenicol, Kan = kanamycin, Tet = tetracycline. 


Figure 13.25 Structure of a composite transposon. (a) General 
structure: a left IS unit and a right IS unit flank a central region 
carrying a marker gene; each IS unit contains a transposase gene 
bounded by IS element inverted repeats. (b) Structure of Tn10: 
IS10L (1329 bp) and IS10R (1329 bp), arranged as inverted IS 
elements, flank a central region of ~6600 bp that carries the 
tetracycline resistance gene (Tet^R). 


central region is about 6600 bp in length. It contains a 
Tet^R gene for resistance to the antibiotic tetracycline. The 
total length of Tn10 is about 9300 bp. The Tn10 transposon 
readily inserts into plasmid DNA, allowing rapid 
dissemination of tetracycline resistance among bacterial 
strains that carry the plasmid. 
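
As a quick consistency check, the total length of Tn10 can be recomputed from its parts. The sketch below uses the IS10 length given in Table 13.4 (1329 bp) and the approximate central-region length stated above; the variable names are illustrative.

```python
is10_bp = 1329            # length of each IS10 element (IS10L and IS10R)
central_region_bp = 6600  # approximate length of the central region

# Two flanking IS elements plus the central region.
tn10_bp = 2 * is10_bp + central_region_bp

print(tn10_bp)             # 9258
print(round(tn10_bp, -2))  # 9300
```

Rounded to the nearest hundred base pairs, the sum agrees with the stated total of about 9300 bp.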

Bacteria can also carry a third type of DNA transpo- 
son known as a noncomposite transposon. These trans- 
posons do not contain insertion sequences but do carry 
additional genes. They transpose in the same manner as 
composite transposons. The noncomposite transposon 
Tn3, for example, carries two 38-bp inverted repeats 
flanking a 4957-bp central region that carries three 
genes: those encoding transposase and resolvase, both of which are 
required for transposition, and the gene for β-lactamase, which provides 
resistance to the antibiotic ampicillin. 


13.7 Transposition Modifies 
Eukaryotic Genomes 


Transposable genetic elements are plentiful and highly 
varied in eukaryotic genomes. Eukaryotic genome se- 
quence analysis finds that substantial proportions of 
many genomes are composed of transposable DNA. For 
example, nearly half of the human genome is composed 
of transposable DNA. Much of this DNA is repetitive 
in sequence, indicating that tens to thousands of cop- 
ies of various transposable elements are present. Many 


GENETIC ANALYSIS 


PROBLEM The following DNA sequences occur on the same strand of DNA and are separated by a 
large number of nucleotides. Which of these sequences might be found flanking an insertion sequence? 
Explain your answer, and identify the relevant parts of your selected sequences. 

a. 5'-TTAGCAC ... CAGGATT-3' 
b. 5'-GGCCAAT ... ATTGGCC-3' 
c. 5'-CCGACCGTA ... CCGACCGTA-3' 
d. 5'-AGTATACCGC ... GCGGTATGGC-3' 


BREAK IT DOWN: Inverted repeat sequences are characteristically 
found at the ends of insertion sequences (p. 456). 


Solution Strategies and Solution Steps 


Evaluate 

1. Strategy: Identify the topic this problem addresses and the nature of the required answer. 
   Step: This problem requires you to recognize DNA sequences that might flank a 
   bacterial insertion sequence. You must identify one or more of the given 
   sequences as a candidate flanking sequence. 

2. Strategy: Identify the critical information given in the problem. 
   Step: We are given single-stranded sequences from the same strand of DNA on 
   opposite sides of potential insertion sequences. 


Deduce 

3. Strategy: Determine the double-stranded sequences for each of the single-stranded 
   sequences listed. 
   Step: The double-stranded sequences are 

   a. 5'-TTAGCAC ... CAGGATT-3' 
      3'-AATCGTG ... GTCCTAA-5' 
   b. 5'-GGCCAAT ... ATTGGCC-3' 
      3'-CCGGTTA ... TAACCGG-5' 
   c. 5'-CCGACCGTA ... CCGACCGTA-3' 
      3'-GGCTGGCAT ... GGCTGGCAT-5' 
   d. 5'-AGTATACCGC ... GCGGTATGGC-3' 
      3'-TCATATGGCG ... CGCCATACCG-5' 

4. Strategy: Review what you know about the sequences flanking insertion elements. 
   Step: The sequences flanking insertion elements are inverted repeat sequences. 


Solve 

5. Strategy: Identify any sequence that might be found flanking an insertion sequence. 
   Step: Sequences b and d in step 3 are the ones most likely to be found flanking 
   insertion sequences, because each sequence forms an inverted repeat 
   sequence in double-stranded DNA. 


For more practice, see Problems 30 and 31. 
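
The test applied in step 5 can be sketched in a few lines: two sequences on the same strand form an inverted repeat when the right-hand sequence equals the reverse complement of the left-hand one. The function names below are hypothetical, and the example uses sequences a through c from the problem.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def classify_flanks(left: str, right: str) -> str:
    """Classify two same-strand sequences as inverted repeat, direct repeat, or neither."""
    if right == reverse_complement(left):
        return "inverted repeat"
    if right == left:
        return "direct repeat"
    return "neither"


print(classify_flanks("TTAGCAC", "CAGGATT"))      # neither
print(classify_flanks("GGCCAAT", "ATTGGCC"))      # inverted repeat
print(classify_flanks("CCGACCGTA", "CCGACCGTA"))  # direct repeat
```

Sequence c is a direct repeat rather than an inverted repeat, which is the pattern expected for the target-site duplication on either side of an inserted element rather than for the element's own ends.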




eukaryotic genomes follow a similar profile, and it seems 
clear that transposition has been a major factor in eukary- 
otic genome evolution. It is equally evident that transpo- 
sition continues to play an active role in the evolution of 
genomes and in mutation. We discuss some of this activ- 
ity later in this section. 

The replicative and nonreplicative mechanisms that 
accomplish transposition in eukaryotes are the same as 
those described earlier for bacteria. DNA transposons in 
eukaryotic genomes are of multiple types. The Ac/Ds ele- 
ments described by McClintock are DNA transposons, for 
example. A prominent Drosophila transposable element 
known as a P element is also a DNA transposon. More 
commonly, however, eukaryotic transposable elements 
are retrotransposons, including those in the human genome. We 
begin our examination of transposition in eukaryotic 
genomes with a look at Drosophila P elements and then 
discuss additional eukaryotic transposable elements. 


Drosophila P Elements 


The genome of Drosophila melanogaster carries several 
dozen copies of a transposable genetic element called a 
P element. These DNA transposons were not part of the 
genome of D. melanogaster collected from the wild before 
about 1960. Today, however, all D. melanogaster collected 
in the wild carry P elements in their genome, suggesting 
that P elements were introduced into D. melanogaster 
about 1960, perhaps by cross-species transfer from a dis- 
tantly related species. Since their introduction to the genome, 
P elements have quickly proliferated. The Drosophila life 
cycle can produce 20 to 25 generations per year; thus, P ele- 
ments have been evolving for about 1000 generations or so in 
D. melanogaster since first being introduced into the genome. 

The P elements exist in multiple forms. Full-length P 
elements encode transposase and are capable of autono- 
mous transposition. These P elements are approximately 


2900 bp in length, and they have a central region con- 
taining a gene for transposase that is encoded in four 
exons and three introns flanked by 31-bp inverted repeats. 
Transcription and translation of the transposase gene in 
full-length P elements produces an 87-kD transposase 
enzyme that activates P element transposition in germ-line 
cells. Several types of nonfunctional P elements are also 
found in the D. melanogaster genome, none producing 
functional transposase and all being shorter than 2900 bp. 

The P elements were discovered in D. melanogaster by 
Margaret Kidwell in 1985 when she identified hybrid 
dysgenesis, a phenomenon in which sterility occurs in 
the F1 progeny of a cross between laboratory-bred female 
flies and males derived from natural populations. In these 
crosses, the female laboratory fly has the so-called M cyto- 
type (M is for “maternal”), and the wild-type male fly has 
the P (“paternal”) cytotype. The P-cytotype male has three 
to four dozen P elements scattered throughout its genome. 
In contrast, the M-cytotype female has no P elements. The 
progeny of this cross between laboratory and wild flies 
are hybrids that have a normal external appearance, but 
they are dysgenic—in other words they are biologically 
deficient. The term hybrid dysgenesis refers to the combination 
of sterility, a high mutation rate, and a propensity for 
chromosomal aberrations and nondisjunction present in 
these flies. Importantly, the mutations found in dysgenic 
flies are unstable, reverting to wild-type or mutating again 
at a high rate. Curiously, the reciprocal cross—a P-cytotype 
female (this genome contains P elements) crossed to an 
M-cytotype male (this genome is P element-free) results 
in normal flies that show no evidence of hybrid dysgenesis. 


[Figure 13.26 panel labels. (a) Parental generation: P cytotype (chromosomes carry P elements) crossed to M cytotype (chromosomes lack P elements and transposition-repressing protein). (b) Parental generation: M cytotype (chromosomes lack P elements) crossed to P cytotype (chromosomes possess P elements and transposition-repressing protein).] 



The current model for hybrid dysgenesis explains 
why the phenotype occurs only when males have the P 
cytotype and females the M cytotype and not in the reciprocal 
cross (Figure 13.26). The key appears to be that 
the transposase genes in P elements are silenced by a repressor 
protein in P-cytotype strains. This inhibits their 
transposition and potential for causing mutations. In matings 
of P-cytotype males and M-cytotype females, sperm 
from P-cytotype males contain chromosomes and 
virtually no cytoplasmic material. The chromosomes 
carry P elements, but because there is no cytoplasmic material, 
sperm do not possess the transposition repressor protein. 
The eggs of M-cytotype females contain abundant 
cytoplasmic material but carry no transposition repressor 
protein because the chromosomes in the M cytotype are 
free of P elements. At fertilization, sperm deliver P element-laden 
chromosomes into an egg lacking transposition-repressing 
protein. Extensive transposition takes place, 
creating multiple mutations by insertion of P elements 
into functional genes or by inducing chromosome breaks 
similar to those observed by McClintock in the maize 
genome. Following embryonic development, the consequence 
of this widespread transpositional activity is widespread 
mutation by insertional inactivation that results 
in hybrid dysgenesis. In contrast, hybrid dysgenesis does 
not occur in the reciprocal cross between females with the 
P cytotype and males of either the P or the M cytotype. In these 
crosses, the chromosomes derived from the P-cytotype 
female carry P elements and the cytoplasm of eggs contains 
the transposition-repressing protein. This blocks 
P element transposition. The F1 receives chromosomes 


Figure 13.26 Hybrid dysgenesis in Drosophila. (a) Male Drosophila of the 
P cytotype crossed to females of the M cytotype 
produce F1 progeny that are largely infertile 
due to mutations resulting from P element 
transposition; these sterile F1 hybrids leave few or no F2 progeny. 
(b) Crosses of P-cytotype females 
to males with either the P or the M cytotype 
yield F1 progeny of normal fertility that produce wild-type offspring. 


that are free of P-elements from the M cytotype male, and 
the germ line of the F1 hybrid progeny remains stable. 

The genomes of laboratory strains of fruit flies 
(M cytotype) are free of P-elements, whereas the genomes 
of natural populations of flies (P cytotype) contain scores to 
hundreds of P-elements. Yet the laboratory strains used 
today derive from natural populations collected by Thomas 
Hunt Morgan and others beginning in the early 1900s. 
Why are laboratory strains and natural flies so different? 
The answer appears to be the introduction and rapid 
evolution of P-elements in natural populations after the 
capture of the ancestors of today’s laboratory strains. The 
origin of P-elements and the mechanism of their spread 
through the natural fruit fly genome are not yet clear, but it 
is known that transposable elements, once introduced into 
a population, can spread rapidly. 


Retrotransposons 


Retrotransposons are the most common transposable 
elements in eukaryotic genomes. They are related to 
RNA-containing retroviruses that reverse transcribe their 
genetic information into DNA in order to parasitize host 
cells. In a similar manner, retrotransposons use reverse 
transcriptase to synthesize a DNA copy of the retrotrans- 
poson transcript for insertion into new genome locations. 

Retroviruses generally encode at least three genes, 
called gag, env, and pol. Gag and env encode proteins 
that form the retroviral particle. New retroviral particles 
are produced within infected cells and perpetuate the 
infection by invading new cells. The pol gene encodes the 
enzyme reverse transcriptase that directs the synthesis of 
double-stranded DNA from single-stranded RNA. 

Figure 13.27 illustrates comparative structures of 
a retrovirus and three retrotransposons. Two constant 
features of retrotransposons are seen. First, all retrotrans- 
posons encode reverse transcriptase (pol) to catalyze 
transposition, and some contain gag, but none contains 
env. Second, the gene or genes carried by retrotranspo- 
sons are flanked by long terminal repeats (LTRs) that 
may be up to several hundred base pairs in length. 


Ty Elements of Yeast Many different forms of Ty ret- 
rotransposons of yeast are found, all sharing the common 
features of retrotransposons. In Ty elements, the central 
element is approximately 6 kb, flanked by LTRs that are 
each about 330 bp in length. Both LTRs contain promoters 
that direct the transcription of different genes in the central 
region. Approximately 50 to 100 copies of Ty elements are 
present in the typical Saccharomyces cerevisiae genome. 
The Ty elements cause mutation in yeast genes by insertion. 


Copia Elements of Drosophila Multiple forms of the 
retrotransposon copia are found in the Drosophila 
genome. Copia elements have a central element of 5 to 
8.5 kb that contains pol and gag genes and is flanked by 
LTRs of 250 to 600 bp each. The word copia comes from 


Figure 13.27 Retrovirus structure and selected eukaryotic 
retrotransposons. (a) Retrovirus: 10,000-20,000 bp. 
(b) Retrotransposons, each drawn with flanking LTRs: copia 
(Drosophila), 5000 bp; Ty (yeast), 5900 bp; L1 (human), 
6500-8000 bp. 


the Latin for “abundance,” and befitting this designation, 
more than 5% of the Drosophila genome is composed of 
copia retrotransposons. This abundance leads to many 
mutations throughout the genome that are usually the 
result of insertion of copia into a wild-type gene. 


LINE and SINE Elements of Humans More than 45% of 
the human genome is composed of transposable DNA. 
Among the functional transposable genetic elements in 
the human genome, LINE (long interspersed nuclear 
elements) and SINE (short interspersed nuclear elements) 
families of elements stand out because of their relative 
abundance and their ability to cause spontaneous human 
gene mutations. LINEs are up to several thousand base 
pairs in length and have an average length of about 
900 bp. SINEs are much shorter and have their sequences 
truncated at one end of the element, likely because the 
reverse transcription process used for their transfer 
terminates before the entire sequence has transposed. 
Almost 1 million copies of LINE sequences are found 
in the human genome. Collectively, these sequences 
constitute a little more than 20% of the total genome 
sequence. Human L1 elements are the most common 
members of the LINE family of elements in the human 
genome. The L1 elements vary in length from about 
6500 bp to 8000 bp. Full-length L1 elements encode a 
protein with nuclease and reverse transcriptase function 
and may also encode a second RNA-binding protein, but 
shortening of the element affects its ability to transpose. 
The human genome contains approximately 600,000 cop- 
ies of L1 alone, constituting more than 17% of the total 
genome. L1 elements actively transpose in the human 
genome and produce mutations. For example, mutations 
of the F8 gene, an X-linked gene whose mutation causes 


an X-linked recessive version of the blood-clotting disor- 
der hemophilia A, are traced to L1 insertion into the gene. 
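
The copy number and genome fraction quoted for L1 can be roughly reconciled with a back-of-the-envelope calculation. The sketch below rests on two assumptions that are not stated in the text: a haploid human genome of about 3.1 billion bp, and that the average LINE length of about 900 bp given above is a fair average for L1 copies as well.

```python
GENOME_BP = 3.1e9       # assumed haploid human genome size (not given in the text)
l1_copies = 600_000     # approximate number of L1 copies, from the text
average_l1_bp = 900     # assumed: the average LINE length quoted above

# Fraction of the genome occupied by L1 sequence under these assumptions.
l1_fraction = l1_copies * average_l1_bp / GENOME_BP

print(f"{l1_fraction:.1%}")  # 17.4%
```

Under these assumptions the estimate lands close to the stated figure of a little more than 17%, and it also shows why the average copy is far shorter than a full-length 6500 to 8000 bp element: most copies are truncated.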

SINE elements, too, are common in the human genome. 
Just over 10% of human genome sequence is composed of 
SINEs. The Alu element is the most common of the human 
SINE sequences. Alu elements vary in length from 100 to 
300 bp and are each flanked by direct repeats of 7 to 20 bp. 
They are so named because each element can be cleaved 
into two segments by the restriction endonuclease AluI 
(Al-LOO-one) that recognizes the 4-bp restriction enzyme 
target sequence 5’-AGCT-3’. The human genome contains 
more than 1 million Alu elements, and they actively gener- 
ate mutations. A comprehensive review of the role of Alu 


CASE STUDY 


Human Chromosome Evolution 


Researchers can trace the evolution of human chromosomes by 
comparing chromosome structure and genetic composition of 
humans to those of other species that share a common ances- 
tor. We describe two such comparative approaches here: One 
compares syntenic clusters of genes (genes on the same chro- 
mosome) in distantly related species, and the second compares 
banding patterns of chromosomes in closely related species. 

Figure 13.28 compares syntenic clusters of genes on 
20 chromosomes (19 autosomes and the X chromosome) in 
the mouse genome and their relation to the same sequences 
on the 23 chromosomes (22 autosomes and the X chromo- 
some) in humans. Published in 2002 by a large research group 
known as the Mouse Genome Sequencing Consortium, this 
study compares 342 syntenic chromosome segments. The 
average size of the syntenic segments is a little less than 
10 million base pairs. Syntenic groups of genes found in 
the human genome are dispersed among several chromo- 
somes in the mouse genome. Interestingly, human chromo- 
somes 17 and 20 each correspond entirely to a portion of 
mouse chromosomes 11 and 2, respectively. In both cases, the 
human chromosome corresponds to a long cluster of contigu- 
ous syntenic groups in the respective mouse chromosome. 
Comparison of X chromosomes of human and mouse reveals 
very strong sequence and genetic similarity. 

Figure 13.28 Evolutionary conservation of chromosome 
synteny between mouse and human chromosomes. Each of 
23 human chromosomes is uniquely colored and its segments 
superimposed on 20 mouse chromosomes. 

elements in human genetic disease by Prescott Deininger 
and Mark Batzer in 1999 found numerous examples of 
new gene mutations caused by Alu insertions. The muta- 
tional mechanisms identified are alterations of gene ex- 
pression by Alu insertion into regulatory DNA sequences 
such as promoters, Alu insertions into exons that alter the 
reading frame (frameshift mutations), disruption of normal 
mRNA splicing following Alu insertion into introns, and 
unequal crossover events between homologous chromo- 
somes involving Alu elements. Overall, Alu elements were 
estimated to transpose in about 1 in 200 people and to be 
directly responsible for about 0.3% of all human hereditary 
disease, much of it due to new mutations. 
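
The naming of Alu elements rests on a simple sequence test: the presence of an AluI target sequence, 5'-AGCT-3', inside the element. The sketch below locates such sites and splits a sequence at them. It assumes the commonly reported AluI cut position between G and C (a blunt cut), which is not stated in the text, and the function name and example sequence are hypothetical.

```python
import re

ALUI_SITE = "AGCT"  # 4-bp AluI target sequence given in the text
CUT_OFFSET = 2      # assumed cut between G and C (AG^CT), producing blunt ends


def alui_fragments(seq: str) -> list[str]:
    """Split a DNA sequence at every AluI target sequence."""
    seq = seq.upper()
    cuts = [m.start() + CUT_OFFSET for m in re.finditer(ALUI_SITE, seq)]
    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]


# Hypothetical sequence with a single internal AGCT site.
print(alui_fragments("GGCCGGAGCTTTGACC"))  # ['GGCCGGAG', 'CTTTGACC']
```

A sequence with one internal site yields two fragments, matching the description of an Alu element being cleaved into two segments.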


This comparison leads to two salient evolutionary con- 
clusions. First, mouse and human share similar syntenic clus- 
ters because their common ancestor carried these clusters. 
Human and mouse chromosomes have diverged from those 
of their common ancestor by numerous rearrangements, in- 
cluding chromosome translocation, chromosome fusion, and 
chromosome inversion, that have changed many attributes of 
chromosome structure, but they also retain large segments of 
genes and sequences as syntenic clusters. Second, for X-linked 
genes specifically, the strong syntenic relationship has been 
maintained by natural selection driven by the requirements 
of embryonic development and the necessity to maintain a 
balance in dosage of X-linked genes by random X-inactivation. 

Figure 13.29 illustrates the banding patterns of chromo- 
somes 1, 2, and 3 of human (H), chimpanzee (C), gorilla (G), and 

Figure 13.29 Human and great ape chromosome evolution. 
Chromosomes 1, 2, and 3 of human (H), chimpanzee (C), gorilla (G), 
and orangutan (O) are compared to determine the events leading 
to different chromosome numbers and structures. Labeled events: 
Robertsonian translocation; inversion; inversion and addition. 

orangutan (O). These four closely related primate species last 
shared a common ancestor between 30 and 35 million years 
ago. In each of the three chromosomes, strong similarity of 
banding patterns directly reflects the strong genetic similar- 
ity between the species. Structural and numerical differences 
between the chromosomes allow reconstruction of the evolu- 
tionary events that shaped the contemporary chromosomes of 
each species. Taking the events from the perspective of human 
chromosomes, we can reconstruct the evolution for each chro- 
mosome as follows. 
Chromosome 1 is very similar in the four primate species,
with the exception of a pericentric inversion and the addition
of a small segment near the centromere of the human
chromosome (1q1.2 to 1q2.1).

Chromosome 2 holds the explanation for the difference
in diploid number between humans (2n = 46) and our
close relatives (2n = 48). The reduction in human diploid
number is the result of a Robertsonian translocation
fusing two small acrocentric chromosomes that belong
to separate chromosome pairs in chimp, gorilla, and
orangutan.

Chromosome 3 shows strong similarity of banding pattern
in the four species with the exception of the orangutan
chromosome, which has undergone a pericentric inversion
that changed the relative arm lengths and altered the position
of the centromere in comparison to the other primate
chromosomes.


SUMMARY

MasteringGenetics™: For activities, animations, and review quizzes, go to the Study Area.


13.1 Nondisjunction Leads to Changes in
Chromosome Number


In euploid nuclei, the number of chromosomes is equal to a
multiple of the haploid number (n), whereas aneuploid nuclei
have additional or missing chromosomes.

Chromosome nondisjunction is the failure of homologous
chromosomes or sister chromatids to separate and is a common
cause of aneuploid gametes.

Aneuploidy alters the phenotype of an organism by changing
the balance of gene dosage of critical genes.

Human aneuploidy manifests as trisomy of particular autosomes
and as trisomy or monosomy of sex chromosomes.

Chromosomal mosaics are organisms containing cells with
two or more genetic or chromosomal constitutions.

Uniparental disomy occurs when both homologous copies of
a chromosome originate in a single parent.


13.2 Changes in Euploidy Result in Various Kinds
of Polyploidy


Polyploids carry three or more haploid sets of chromosomes.

Allopolyploids carry chromosome sets from different
species, whereas autopolyploids have multiple chromosome
sets from a single species.

Polyploidy is common in plant species, where it can increase
fruit and flower size, alter fertility, and produce hybrid vigor.

Polyploids have a reduced frequency of recessive homozygosity
compared to diploid species.


13.3 Chromosome Breakage Causes Mutation by
Loss, Gain, and Rearrangement of Chromosomes


Chromosome breakage can result in terminal deletion or
in interstitial deletion and may alter chromosome banding
patterns.

Heterozygosity for partial deletion or partial duplication
produces phenotypic abnormalities through disturbances of
gene dosage balance.

Homologous chromosome synapsis involving a partial
deletion or partial duplication chromosome produces a
characteristic unpaired loop.

Microdeletions and microduplications too small to be seen
by banding changes are detected by molecular methods.

The detection of pseudodominance provides important
positional indicators for deletion mapping of genes.


13.4 Chromosome Breakage Leads to Inversion 
and Translocation of Chromosomes 


Chromosome breakage can lead to inversion or transloca- 
tion of chromosome segments. 


Chromosome inversion heterozygotes have one chromosome 
with the normal order but have an inversion in the homo- 
log. Homologs in these organisms form an inversion loop at 
synapsis. 

Paracentric inversions have two break points on one arm 
only, and the inversion does not include the centromeric 
region. Pericentric inversions have break points on each arm, 
and the centromeric region is included in the inverted region. 
Chromosome inversion is a crossover-suppression 
mechanism. 


A tetravalent synaptic structure containing chromosomes 
involved in reciprocal translocation leads to two patterns of 
chromosome segregation in meiosis. 

The reduction in the number of viable gametes produced by 
reciprocal balanced translocation heterozygotes results in 
semisterility. 

Robertsonian translocation occurs by the fusion of nonho- 
mologous chromosomes. 
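
The distinction between paracentric and pericentric inversions summarized above can be sketched in a few lines of code. This is only an illustration, assuming hypothetical numeric coordinates for the two break points and the centromere; the function names `classify_inversion` and `invert_segment` are invented for this sketch and do not come from the text.

```python
def classify_inversion(break1: float, break2: float, centromere: float) -> str:
    """Classify an inversion from its two break points.

    All positions are hypothetical coordinates along one chromosome.
    A break point lying exactly on the centromere is not modeled;
    such input is reported as paracentric here.
    """
    if break1 == break2:
        raise ValueError("an inversion needs two distinct break points")
    left, right = sorted((break1, break2))
    # Pericentric: one break point on each arm, so the centromere
    # lies inside the inverted region.
    if left < centromere < right:
        return "pericentric"
    # Paracentric: both break points on the same arm.
    return "paracentric"


def invert_segment(genes: str, start: int, end: int) -> str:
    """Reverse the order of genes[start:end], leaving the flanks unchanged."""
    return genes[:start] + genes[start:end][::-1] + genes[end:]


# Break points on opposite sides of a centromere at position 3.
print(classify_inversion(2, 5, 3))        # pericentric
# Both break points on the same arm.
print(classify_inversion(4, 9, 3))        # paracentric
# Inverting the E-to-J segment of one chromosome arm.
print(invert_segment("DEFGHIJK", 1, 7))   # DJIHGFEK
```

The second function shows why an inversion heterozygote needs an inversion loop at synapsis: the gene order within the inverted segment runs in the opposite direction to that of the normal homolog.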


13.5 Transposable Genetic Elements Move 
throughout the Genome 


Transposition is the process that moves transposable genetic
elements in genomes and was first discovered in maize.

Transposase is the enzyme responsible for transposition, and
it is encoded by many transposable genetic elements.

Transposition produces mutations through insertional
inactivation modifying gene expression and by contributing to
unequal crossing over between homologous chromosomes.

DNA transposons encode transposase and perhaps other
genes and transpose as DNA sequences.

Retrotransposons encode reverse transcriptase and perhaps
other genes and transpose through an RNA intermediate.

Retrotransposons and some DNA transposons transpose by
replicative transposition, a “copy-and-paste” mechanism.

Some DNA transposons transpose by nonreplicative
transposition, a “cut-and-paste” mechanism.


13.6 Transposition Modifies Bacterial Genomes


Bacterial insertion sequences encode transposase and
are flanked by inverted repeat sequences unique to each
insertion sequence.

Composite and noncomposite transposons carry transposase
and additional genes, including those for antibiotic
resistance.


13.7 Transposition Modifies Eukaryotic Genomes


Drosophila P elements are common, transpose actively, and
cause hybrid dysgenesis in certain crosses.

Retrotransposons, including Ty, copia, LINE, SINE, and
Alu, are common in eukaryotic genomes and produce
mutations.

Almost half the human genome is derived from
transposable DNA. LINE, SINE, and Alu sequences are
retrotransposons that predominate in human transposable
DNA.


KEYWORDS


deletion, partial deletion heterozygote,
terminal) (pp. 440, 441, 442)

deletion mapping (p. 442)

dicentric bridge (dicentric chromosome) (p. 447)

dissociation (Ds) element (p. 453)

DNA transposon (p. 455)

duplication (partial duplication, partial
duplication heterozygote) (p. 441)

euploid (p. 431)

flanking direct sequence repeat (p. 453)

gene dosage (p. 433)

gynandromorphy (p. 436)

hybrid dysgenesis (p. 459)

hybrid vigor (p. 438)

insertion sequence (IS) (p. 456)

inversion loop (p. 446)

insertional inactivation (p. 451)

long terminal repeats (LTRs) (p. 460)

microduplications (p. 442)

monosomy (p. 432)

noncomposite transposon (p. 457)

nonreplicative transposition (p. 456)

partial chromosome deletion (p. 440)

P element (p. 458)

polyploidy (allopolyploidy,
autopolyploidy) (p. 437)

pseudodominance (p. 442)

reciprocal translocation (balanced,
unbalanced) (p. 448)

replicative transposition (p. 455)

retrotransposon (p. 455)

reverse transcriptase (p. 455)

Robertsonian translocation (chromosome
fusion) (p. 448)

semisterility (p. 435)

sexual polyploidization (p. 437)

simple transposon (p. 455)

terminal inverted repeat (p. 453)

translocation heterozygote (p. 448)

transposase (p. 455)

transposition (transposable genetic
element) (pp. 450, 451)

trisomy (p. 432)

trisomy rescue (p. 437)

unequal crossover (p. 441)

unpaired loop (p. 442)

uniparental disomy (p. 436)

unstable mutant phenotype (p. 453)


PROBLEMS

Visit MasteringGenetics for instructor-assigned tutorials and problems.


Chapter Concepts

For answers to selected even-numbered problems, see Appendix: Answers.

1. Consider synapsis in prophase I of meiosis for two plant
species that each carry 36 chromosomes. Species A is diploid
and species B is triploid. What characteristics of homologous
chromosome synapsis can be used to distinguish
these two species?


2. For one set of chromosomes carried by a triploid plant
species, assume the chromosomes pair as one bivalent
involving chromosomes C1 and C2, and as one univalent
with chromosome C3. Show the gametes that result from
this synaptic pattern, and identify the frequency and content
of the genetically different gametes produced by the
species.


3. If the haploid number for a plant species is 4, how many
chromosomes are found in a member of the species that
has one of the following characteristics? Explain your
reasoning in each case.

a. diploid
b. pentaploid
c. octaploid
d. trisomic
e. triploid
f. monosomic
g. tetraploid
h. hexaploid

In the list above, which plants are likely to be infertile or to
have reduced fertility?


4. From the following list, identify the types of
chromosome changes you expect to show phenotypic
consequences.

a. pericentric inversion
b. interstitial deletion
c. duplication
d. terminal deletion
e. trisomy
f. reciprocal balanced translocation
g. paracentric inversion
h. monosomy
i. polyploidy


5. Mating between a male donkey (2n = 62) and a female
horse (2n = 64) produces sterile mules. Recently, however,
a very rare event occurred: a female mule gave birth to an
offspring by mating with a horse.

a. Determine how many chromosomes are in the mule
karyotype, and explain why mules are generally sterile.

b. How many chromosomes does the mule-horse offspring
carry?

c. Why is it very unlikely that the offspring will have fully
horse-like genetic characteristics?


6. Studies of hybrid dysgenesis in Drosophila indicate that the
transposition repressor protein produced by P elements
is part of a process that limits the number of P elements
present in a genome. Why is it advantageous to limit the
number of P elements in a genome?


7. What evidence suggests that copia elements of fruit flies and
Ty elements of yeast are related to RNA-containing viruses?


8. What can we conclude about a mutational event that
renders IS1 unable to transpose?


9. In terms of the chromosome content of nuclei, what is
meant by the term mosaic?


10. In Drosophila, an X-linked recessive allele produces yellow
body color. The cross of a yellow female and a male with
wild-type body color usually produces wild-type females
and yellow males. Occasionally, however, a yellow female is
produced. Explain how the unusual female is produced.


Application and Integration


11. The plants in this problem are the same as those described
in Genetic Analysis 13.1, where flower color in the autotetraploid
is a single-gene character determined by alleles R1
and R2 that have an additive relationship. The genotype-phenotype
correspondence is as follows:

Genotype      Phenotype
R1R1R1R1      Dark red
R1R1R1R2      Light red
R1R1R2R2      Pink
R1R2R2R2      Light pink
R2R2R2R2      White

a. Predict the phenotypes and frequencies of progeny produced
by self-fertilization of a light red plant.

b. A light pink and a light red plant are crossed. Predict
the frequencies of phenotypes among the progeny.


12. A normal chromosome and its homolog carrying a paracentric
inversion are given. The dot (•) represents the
centromere.

Normal:     ABC • DEFGHIJK
Inversion:  abc • djihgfek

a. Diagram the alignment of chromosomes during
prophase I.

b. Assume a crossover takes place in the region between
F and G. Identify the gametes that are formed
following this crossover, and indicate which gametes
are viable.

c. Assume a crossover takes place in the region
between A and B. Identify the gametes that are formed
by this crossover event, and indicate which gametes
are viable.


13. A pair of homologous chromosomes in Drosophila
has the following content (single letters represent
genes):

Chromosome 1:  RNMDHBGKWU
Chromosome 2:  RNMDHBDHBGKWU

a. What term best describes this situation?

b. Diagram the pairing of these homologous
chromosomes in prophase I.

c. What term best describes the unusual structure that
forms during pairing of these chromosomes?

d. How does the pairing diagrammed in part (b) differ
from the pairing of chromosomes in an inversion
heterozygote?


14. An animal heterozygous for a reciprocal balanced translocation
has the following chromosomes:

MN•OPQRST
MN•OPQRjkl
cdef•ghijkl
cdef•ghiST

a. Diagram the pairing of these chromosomes in prophase I.

b. Identify the gametes produced by alternate segregation.
Which of these gametes are viable?

c. Identify the gametes produced by adjacent-1 segregation.
Which of these gametes are viable?

d. Identify the gametes produced by adjacent-2 segregation.
Which of these gametes are viable?

e. Among the three segregation patterns, which is least
likely to occur? Why?


15. Dr. Ara B. Dopsis has an idea he thinks will be a boon to
agriculture. He wants to create the “pomato,” a hybrid
between a tomato (Lycopersicon esculentum) that has 12
chromosomes and a potato (Solanum tuberosum) that
has 48 chromosomes. Dr. Dopsis is hoping that his new
pomato will have tuber growth like a potato and the fruit
production of a tomato. He joins a haploid gamete from
each species to form a hybrid and then induces doubling of
chromosome number.


a. How many chromosomes will the hybrid have before 
chromosome doubling? 

b. Will this hybrid be infertile? 

c. How many chromosomes will the polyploid have after 
chromosome doubling? 

d. Can Dr. Dopsis be sure the polyploid will have the char- 
acteristics he wants? Why or why not? 


16. Suppose polymerase chain reaction (PCR) is used to
amplify a single DNA marker on human chromosome 21. 
Further suppose that a couple who have a child with Down 
syndrome (trisomy 21) is examined for this marker. The 
mother has marker alleles of 310 and 380 bp. Her mate 
has marker alleles of 290 and 340 bp. What PCR bands are 
present in their child with Down syndrome if nondisjunc- 
tion occurred in 

a. maternal meiosis I 

b. maternal meiosis II 

c. paternal meiosis I 

d. paternal meiosis II 


17. Chromosome IV in Drosophila is a very small chromosome
and carries a tiny amount of genetic material. Fruit flies
that are trisomic for chromosome IV have no apparent
phenotypic abnormalities, and they retain their fertility.
Among the genes on chromosome IV is one for which
a recessive allele ey produces the “eyeless” phenotype.
A male that is trisomic for chromosome IV and has the
genotype +ey is crossed to a diploid eyeless female with
the genotype eyey.

a. Assuming random segregation of chromosomes takes
place during spermatogenesis and that all sperm are
viable, what sperm genotypes are expected and in what
proportions?

b. If these sperm are united with eggs from the eyeless
female, what is the expected ratio of eyeless to normal-eyed
flies among the progeny?


18. A healthy couple with a history of three previous spontaneous
abortions has just had a child with cri-du-chat
syndrome, a disorder caused by a terminal deletion of
chromosome 5. Their physician orders karyotype analysis
of both parents and of the child. The karyotype results for
chromosomes 5 and 12 are shown here.

a. Are the chromosomes in the child consistent with those
expected in a case of cri-du-chat syndrome? Explain
your reasoning.

b. Which parent has an abnormal karyotype? How can 
you tell? What is the nature of the abnormality? 

c. Why does this parent have a normal phenotype? 

d. Diagram the pairing of the abnormal chromosomes. 

e. What segregation pattern occurred to produce the 
gamete involved in fertilization of the child with 
cri-du-chat syndrome? 

f. What is the approximate probability that the next child 
of this couple will have cri-du-chat syndrome? 

g. Do the karyotypes of the parents help explain the 
occurrence of the three previous spontaneous abortions? 
Explain. 

19. A boy with Down syndrome (trisomy 21) has 46 chromosomes.
His parents and his two older sisters have a normal
phenotype, but each has 45 chromosomes.

a. Explain how this is possible. 

b. How many chromosomes do you expect to see in karyo- 
types of the parents? 

c. What term best describes this kind of chromosome 
abnormality? 

d. What is the probability the next child of this couple will 
have a normal phenotype and have 46 chromosomes? 
Explain your answer. 


20. Human chromosome 5 and the corresponding chromosomes
from chimpanzee, gorilla, and orangutan are shown
on the following page. Describe any structural differences
you see in the other primate chromosomes in relation to
the human chromosome, and propose a mechanism to explain
each difference.


21. A small population of deer living on an isolated island
has been separated for many generations from a mainland deer
population. The populations retain the same number of
chromosomes and are interfertile, but one chromosome
(shown here) has a different banding pattern.

a. Describe how the banding pattern of the island population
chromosome most likely evolved from the mainland
chromosome. What term or terms describe the
difference between these chromosomes?

Mainland        Island
p2.1            p2
p1              p1
Centromere      Centromere
q1              q1
q2              q2.1
q3.1            q2.2
q3.2            q2.3
q4.1            q2.4

b. Draw the synapsis of these homologs during prophase I 
in hybrids produced from the cross of mainland with 
island deer. 

c. In a mainland-island hybrid deer, recombination takes
place in band q1 of the homologous chromosomes. 
Draw the gametes that result from this event. 

d. Suppose that 40% of all meioses in mainland-island 
hybrids involve recombination somewhere in the 
chromosome region between q2.1 and p2. What 
proportion of the gametes of hybrid deer are viable? 
What is the cause of the decreased proportion of 
viable gametes in hybrids relative to the parental 
populations? 


22. In humans that are XX/XO mosaics, the phenotype is
highly variable, ranging from females who have classic 
Turner syndrome symptoms to females who are essentially 
normal. Likewise, XY/XO mosaics have phenotypes that 
range from Turner syndrome females to essentially normal 
males. How can the wide range of phenotypes be explained 
for these sex-chromosome mosaics? 


23. A plant breeder would like to develop a seedless variety of
cucumber from two existing lines. Line A is a tetraploid 
line, and line B is a diploid line. Describe the breeding 
strategy that will produce a seedless line, and support your 
strategy by describing the results of crosses. 


24. In Drosophila, seven partial deletions (1 to 7) shown as
gaps in the following diagram have been mapped on a 
chromosome. This region of the chromosome contains 
genes that express seven recessive mutant phenotypes, 
identified in the following table as a through g. A re- 
searcher wants to determine the location and order of 
genes on the chromosome, so he sets up a series of crosses 
in which flies homozygous for a mutant allele are crossed 
with flies that are homozygous for a partial deletion. The 
progeny are scored to determine whether they have the 
mutant phenotype (“m” in the table) or the wild-type phe- 
notype (“+’” in the table). Use the partial deletion map and 
the table of progeny phenotypes to determine the order of 
genes on the chromosome. 


[Figure: partial deletion map of the chromosome, deletions 1 to 7.
Table: progeny phenotypes (“m” or “+”) for each combination of
deletion (rows 1 to 7) and mutation (columns a to g). The map and
the table entries are illegible in this copy.]


25. Two experimental varieties of strawberry are produced by
crossing a hexaploid line that contains 48 chromosomes 
and a tetraploid line that contains 32 chromosomes. 
Experimental variety 1 contains 40 chromosomes, and 
experimental variety 2 contains 56 chromosomes. 

a. Do you expect both experimental lines to be fertile? 
Why or why not? 

b. How many chromosomes from the hexaploid line are 
contributed to experimental variety 1? To experimental 
variety 2? 

c. How many chromosomes from the tetraploid lines are 
contributed to experimental variety 1? To experimental 
variety 2? 

26. In the tomato, Solanum esculentum, tall (D-) is dominant
to dwarf (dd) plant height, smooth fruit (P-) is dominant
to peach fruit (pp), and round fruit shape (O-) is dominant
to oblate fruit shape (oo). These three genes are linked on
chromosome 1 of tomato in the order dwarf-peach-oblate.
There are 12 map units between dwarf and peach and 17
map units between peach and oblate. A trihybrid plant
(DPO/dpo) is test-crossed to a plant that is homozygous
recessive at the three loci (dpo/dpo). Progeny plants are
grown with the results shown below. Identify the mechanism
responsible for the resulting data that do not agree
with the established genetic map.

Progeny Phenotype           Number
Tall, smooth, round         473
Dwarf, peach, oblate        476
Tall, smooth, oblate        12
Dwarf, peach, round         [illegible]
Tall, peach, oblate         [illegible]
Dwarf, smooth, round        13
Tall, peach, round          [illegible]
Dwarf, smooth, oblate       1
Total                       1000


27. In Drosophila, the wild-type red eye color is produced by
the X-linked allele w+. Mutants for eye color often lack
the ability to deposit pigment in the eye and have white
eye color. For the purpose of this problem, assume that
in Southern blot analysis a molecular probe hybridizes to
a 5.0-kb fragment of DNA from the eye-color locus. The
probe binds to DNA fragments containing either wild-type
or mutant sequence.

a. If a male Drosophila has white eye color as a result
of inactivation of w+ by movement of a 3-kb P element
into the wild-type allele, diagram the expected
Southern blot pattern of DNA fragments from wild-type
males and white-eyed males and females that carry
the mutant allele. Explain your reasoning.

b. Several male progeny of a female carrier of the mutant 
allele have red sectors on their eyes. The number and 
size of the sectors vary among the males. Explain the 
origin of these red sectors, and account for the variation 
in number and size. 

c. If Southern blotting is used to compare DNA isolated 
from a white sector and a red sector of the same eye, 
is a difference in DNA fragment size expected? Explain. 


28. A Drosophila P element 2.5 kb in length is modified by
adding a 1.0-kb intron sequence to one of its exons.
A copia element of 6.0 kb is modified by adding the same
1.0-kb intron to its central region.

a. A Drosophila genome carrying both transposable
elements is induced to undergo transposition. What is
the length of the newly transposed P element?

b. What is the length of the newly transposed copia
element?

c. Explain the results for each case of transposition.


29. A biologist studying flight mechanisms in insects wants
to introduce a dominant mutant allele producing over- 
sized wings, called flapper, into the Drosophila genome. 
The biologist chooses a strain of fruit fly homozygous for 
a recessive mutant producing miniature wings. How will 
the biologist design the experiment using a P element to 
deliver the mutant allele to the genome? 


30. After reading Experimental Insight 13.2 and examining the
results of western, northern, and Southern blot analysis of
plants with the genotypes RR and rr, describe the results
you would expect to see for each of the three kinds of analysis
for plants with the genotype Rr. Specify the number
of bands or spots expected for each analysis, and give the
expected position of each band or spot.


31. Two NotI restriction enzymes cleave DNA on opposite
sides of the Dbm gene in a species of yeast. A molecular
probe for Dbm detects a DNA restriction fragment of
8.5 kb in organisms that are wild type at Dbm. In a strain
of yeast, a Ty1 transposable genetic element mutates dbm.
Ty1 is 5.6 kb in length.


a. In haploid yeast with this dbm mutation, what is the 
length of the restriction fragment detected by the probe 
following NotI digestion? 

b. What DNA-fragment sizes are detected in a diploid 
yeast strain that is heterozygous for wild-type and 
mutant alleles at dbm? 

c. Insertion of Ty1 into dbm causes a loss-of-function 
mutation. Explain why this is the case. 


32. For the following crosses, determine as accurately as pos-
sible the genotypes of each parent, the parent in whom 
nondisjunction occurs, and whether nondisjunction takes 
place in the first or second meiotic division. Both color 
blindness and hemophilia, a blood-clotting disorder, are 
X-linked recessive traits. In each case, assume the parents 
have normal karyotypes (see Table 13.2). 


a. A man and a woman who each have wild-type phenotypes
have a son with Klinefelter syndrome (XXY) who
has hemophilia. 

b. A man who is color blind and a woman who is wild 
type have a son with Jacob syndrome (XYY) who has 
hemophilia. 

c. A color-blind man and a woman who is wild type have a
daughter with Turner syndrome (XO) who has normal 
color vision and blood clotting. 

d. A man who is color blind and has hemophilia and a 
woman who is wild type have a daughter with triple X 
syndrome (XXX) who has hemophilia and normal 
color vision. 
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Operon 

Transcription from the Tryptophan Operon Is Repressible and Attenuated

Bacteria Regulate the Transcription of Stress Response Genes and
Translation and Archaea Regulate Transcription in a Bacteria-like Manner

Antiterminators and Repressors Control Lambda Phage Infection of E. coli


Regulation of Gene Expression in Bacteria and Bacteriophage


Jacques Monod (left), André Lwoff (middle), and François Jacob (right) on 
October 14, 1965, following the announcement of the awarding of the 
Nobel Prize in Physiology or Medicine for their work describing the lactose 
(lac) operon in E. coli. 


Take a moment to think about the ever-changing environment
endured by the billions of Escherichia coli that 
populate your intestinal tract. These bacteria are accustomed 
to a diverse and constantly shifting set of environmental 
factors and nutritional conditions, as well as to competi- 
tion from the many other bacterial species in your gut. In all 
these rapidly changing environmental conditions, bacterial 
survival depends on the ability to deal with whatever condi- 
tions prevail at the moment. Each individual bacterial cell is 
almost entirely self-reliant when it comes to producing the 
proteins necessary to carry out metabolism and to generate 
the compounds it needs to stay alive and to reproduce. 
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What is the best strategy for survival in a rap- 
idly changing environment? Should the organism 
transcribe and translate all its genes at all times, 
or should gene transcription and translation be 
regulated in a closely monitored manner that 
can respond in a matter of minutes to accommo- 
date changes in growth conditions as they arise? 
Answering these kinds of questions was critically 
important to understanding how evolution has shaped 
the processes of gene expression in organisms. On 
one hand, if bacteria transcribed and translated 
all their genes at all times, they could be instantly 
ready for almost any environmental shift that might 
occur. On the other hand, continuously expressing 
all genes would be terribly costly in metabolic terms 
and entail a great deal of unnecessary transcription 
and translation. Biologists in the 1950s and 1960s 
hypothesized that energetic and metabolic expen- 
ditures associated with regulated gene expression 
would be evolutionarily favored over the high cost 
of continuous gene expression. But to demonstrate 
the validity of that hypothesis, examples of regulated 
gene expression had to be identified and studied. 

The first research describing the gene actions 
and molecular mechanism for regulated gene 
expression was by François Jacob, Jacques Monod, 
André Lwoff, and others, who showed how the 
lactose (lac) operon system in E. coli was transcrip- 
tionally regulated in response to the presence or 
absence of the milk sugar lactose. This research was 
a milestone in biology that introduced a new way of 
thinking about the expression of genes. It opened 
the door to research on mechanisms that regulate 
gene expression—research that is just as active to- 
day as it has ever been. 

In this chapter, the regulatory systems we dis- 
cuss are principally found in E. coli, the most widely 
used model bacterium. We begin with a general 
introduction to regulated gene expression and in- 
troduce the concept that the interaction between 
DNA-binding regulatory proteins and regulatory 
DNA sequences regulates transcription. Next we 
explore the organization, function, and regulation of 
the E. coli lactose (lac) operon system, whose gene 
transcription is induced (turned on) by the presence 


of the sugar lactose in the growth medium. This 
topic is followed by a discussion of mutational analy- 
sis and molecular explanation of the transcriptional 
control of lac operon genes. We then turn our atten- 
tion to the genetic structure and molecular control 
of transcription of the tryptophan (trp) operon that 
contains the genes needed to synthesize the amino 
acid tryptophan. After moving on to a discussion of 
post-transcriptional regulation of bacterial genes 
and a discussion of regulated gene expression in 
archaeal species, we examine the regulatory process 
that controls infection of bacterial cells by bacteriophage
λ (lambda).


14.1 Transcriptional Control of Gene 
Expression Requires DNA-Protein 
Interaction 


Certain bacterial genes—specifically, those whose 
products are needed continuously to perform routine 
tasks—undergo constitutive transcription, a term 
identifying the genes as being transcribed continuously 
with no regulatory control. In contrast, the need for agile 
and calibrated responses to changing environmental con- 
ditions has resulted in the evolution of mechanisms for 
the regulated transcription of many bacterial genes. 

Regulation of the transcription of bacterial genes 
is the predominant mode by which bacteria regulate 
responses to the environment, and it takes place at two 
levels. At both levels, control results from interactions 
between DNA-binding proteins and specific regulatory 
sequences of DNA. The first level of control regulates the 
initiation of transcription, determining whether a particu- 
lar gene or group of genes is transcribed at all. The second 
transcriptional control level determines the amount of 
transcription, regulating either the duration of transcrip- 
tion or the amount of mRNA transcript produced from 
the gene. 

Additionally, post-transcriptional regulatory mechanisms
are important, controlling the level of translation of
mRNA or the activity of proteins and enzymes. Figure 14.1
provides an overview of bacterial regulatory mechanisms.

Figure 14.1 An overview of gene expression in bacteria.
Unregulated (constitutive) expression and three patterns of
regulated gene expression occur in bacteria.


Negative and Positive Control of Transcription


Mechanisms of transcription control are described as
negative or positive. Negative control of transcription
involves the binding of a repressor protein to a regulatory
DNA sequence, with the consequence of preventing
transcription of a gene or a cluster of genes. On the other
hand, positive control of transcription involves the binding
of an activator protein to regulatory DNA, with the
result of initiating gene transcription.

Repressor proteins are a broad category of regula- 
tory proteins that exert negative control of transcription. 
In their active form, repressor proteins bind to regula- 
tory DNA sequences, including those called operators, 
as we describe below for the lactose operon. Repressor 
protein binding blocks transcription initiation by RNA 
polymerase. The repressor protein acts by occupying the 
space on regulatory DNA where the polymerase would 
otherwise bind or by preventing formation of the open 
promoter complex necessary for transcription initiation. 
Repressor proteins can be activated or inactivated by in- 
teractions with other compounds. 

Repressor proteins commonly contain two active sites 
through which their functional role is performed. The DNA- 
binding domain is responsible for locating and binding op- 
erator DNA sequence or other target regulatory sequences. 
The allosteric domain binds a molecule or protein and, in 
so doing, causes a change in the conformation of the DNA- 
binding site. The property belonging to some enzymes of 
changing conformation at the active site as a result of bind- 
ing a substance at a different site is known as allostery. 

Allosteric domains operate in two modes. Certain 
repressor proteins undergo inactivation of their DNA- 
binding domain because of allosteric changes brought 
about by an inducer compound binding to the allosteric 
site (Figure 14.2a). If the inducer is removed from the al- 
losteric site, the repressor’s conformation is switched, the 
DNA-binding site is reactivated, and the protein can repress 
transcription. On the other hand, some repressor proteins 


Figure 14.2 Mechanisms of negative control of transcription. 
(a) Effect of inducer: binding of repressor protein blocks 
transcription by negative regulation; binding of inducer molecule 
to repressor protein allows transcription. (b) Effect of corepressor. 


require binding of a corepressor molecule at the allosteric 
site to activate the DNA-binding site (Figure 14.2b). In this 
case, transcriptional repression is reversed when the core- 
pressor is removed from the allosteric site. 

Positive control of transcription is accomplished 
by activator proteins that bind to regulatory DNA se- 
quences called activator binding sites. Activator protein 
binding facilitates RNA polymerase binding at promoters 
and helps initiate transcription. Activator proteins have a 
DNA-binding domain that binds the activator binding site 
of DNA. In one mode of action for activator proteins, the 
DNA-binding domain remains inactive until the allosteric 
domain is bound by an allosteric effector compound. 
The induced allosteric change leads to the formation of 
a functional DNA-binding domain, allowing the activa- 
tor protein to bind to DNA (Figure 14.3a). Alternatively, 
certain activator proteins have a functional DNA-binding 
domain that is converted to an inactive conformation by 
binding of an inhibitor compound in the allosteric bind- 
ing domain (Figure 14.3b). 
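
The four allosteric modes described above (inducer and corepressor for repressor proteins, allosteric effector and inhibitor for activator proteins) can be summarized as a small truth table. The following is a minimal sketch, not a biochemical model: the function names and the string labels are illustrative assumptions, and "transcription on" for an activator simply means that the activator is bound to DNA.

```python
# Does binding of the small molecule switch the DNA-binding domain ON?
# Keys are (regulatory protein, small molecule); the labels are illustrative.
LIGAND_ENABLES_DNA_BINDING = {
    ("repressor", "inducer"): False,      # inducer inactivates the repressor
    ("repressor", "corepressor"): True,   # corepressor activates the repressor
    ("activator", "effector"): True,      # effector activates the activator
    ("activator", "inhibitor"): False,    # inhibitor inactivates the activator
}


def binds_dna(protein: str, ligand: str, ligand_bound: bool) -> bool:
    """Return True if the regulatory protein can bind its DNA target."""
    enables = LIGAND_ENABLES_DNA_BINDING[(protein, ligand)]
    return ligand_bound == enables


def transcription_on(protein: str, ligand: str, ligand_bound: bool) -> bool:
    """A bound repressor blocks transcription; a bound activator promotes it."""
    bound = binds_dna(protein, ligand, ligand_bound)
    return bound if protein == "activator" else not bound


if __name__ == "__main__":
    for (protein, ligand) in LIGAND_ENABLES_DNA_BINDING:
        for ligand_bound in (False, True):
            state = "on" if transcription_on(protein, ligand, ligand_bound) else "off"
            print(f"{protein:9s} {ligand:11s} bound={ligand_bound!s:5s} -> {state}")
```

Reading the table row by row reproduces the text: an inducer turns transcription on by removing a repressor, whereas a corepressor turns it off by enabling one.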


Regulatory DNA-Binding Proteins 


Most DNA-binding proteins that exert regulatory control 
bind DNA at specific sequences to accomplish their regu- 
latory activity. These interactions occur by association of 
the amino acid side chains of the proteins with the specific 
nucleotide bases and the sugar-phosphate backbone of 
DNA. The proteins make their contact with specific base 
pairs located in the major groove and the minor groove of 
Figure 14.3 Mechanisms of positive control of transcription. 


the DNA helix using the unique patterns of hydrogen, ni- 
trogen, and oxygen atoms that characterize each base pair. 

To achieve protein-DNA specificity in these interac- 
tions, the protein must simultaneously contact multiple 



nucleotides. A common motif in the structures of DNA- 
binding regulatory proteins is the formation of protein 
secondary structures, most commonly α helices, that contain 
the amino acids that contact regulatory nucleotides. 
Frequently, two protein segments contact the DNA target 
sequence. The paired DNA-binding regions of a regula- 
tory protein form in two ways. In one type of interaction, 
a single polypeptide folds to form two domains that bind 
specific DNA sequences. In the other type, the regula- 
tory protein consists of two or more polypeptides joined 
to form a multimeric complex of two (dimeric), three 
(trimeric), or four (tetrameric) polypeptides. When identi- 
cal polypeptides join together, the prefix homo- is used. 
A “homodimer” contains two identical polypeptides in 
the functional protein. When different polypeptides join 
together, the complex is identified by the prefix hetero-, as 
in “heterodimer.” 

Extensive studies of transcription-regulating pro- 
teins in bacteria have identified the characteristic struc- 
tural features of DNA-binding regulatory proteins and 
the DNA sequence they bind. Bacterial regulatory DNA 
sequences frequently contain inverted repeats or direct 
repeats. Each polypeptide of a homodimeric regulatory 
protein, or each of the binding regions of a folded poly- 
peptide, interacts with one of the inverted repeat seg- 
ments. By far, the most common structural motif seen in 
these proteins in bacteria is the helix-turn-helix (HTH) 
motif (Figure 14.4). In the HTH motif, two α-helical 
Figure 14.4 The helix-turn-helix regulatory protein motif. 
(a) DNA-binding proteins forming an HTH motif are usually dimeric. 
Two subunits of an HTH dimer are shown as shaded cylinders. The 
recognition helices bind to inverted repeat sequence in the major 
groove, and the stabilizing helices bind to the sugar-phosphate 
backbone. (b) Inverted repeat sequences are often targets of DNA- 
binding regulatory proteins, such as HTH proteins. 
regions in each of two polypeptides in a homodimer inter- 
act with inverted repeat regulatory sequences in DNA. In 
each of the polypeptides, one of the two α-helical regions 
is the recognition helix that fits into the major groove 
of DNA and binds the inverted repeat sequences. The 
second helix is the stabilizing helix. It lies across the ma- 
jor groove and contacts the sugar-phosphate backbone, 
ensuring a strong DNA-protein interaction and properly 
orienting the recognition helix to sit in the major groove. 
Many different DNA-binding regulatory proteins with 
the HTH motif have been identified in bacteria. We will 
see some examples in later sections of this chapter and in 
discussions of regulatory protein motifs in eukaryotes in 
Chapter 15. 
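
Because each subunit of a homodimeric HTH protein reads one half of an inverted repeat, it can help to see what "inverted repeat" means at the sequence level: one arm is the reverse complement of the other. The sketch below checks this for a short DNA string. The function names and the example sequences are hypothetical illustrations, not sequences taken from a real operator.

```python
# Map each base to its Watson-Crick partner.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def arms_form_inverted_repeat(seq: str, arm: int) -> bool:
    """True if the first `arm` bases equal the reverse complement of the last `arm` bases.

    Any bases between the two arms are treated as a spacer and ignored.
    """
    s = seq.upper()
    if arm <= 0 or 2 * arm > len(s):
        raise ValueError("arm length must be positive and fit twice in the sequence")
    return s[:arm] == reverse_complement(s[-arm:])


if __name__ == "__main__":
    # Hypothetical site: two 5-bp arms separated by a 4-bp spacer.
    print(arms_form_inverted_repeat("TGTGAGCGGTCACA", 5))
    # A direct repeat of the same arm is not an inverted repeat.
    print(arms_form_inverted_repeat("TGTGAGCGGTGTGA", 5))
```

In an inverted repeat, the two arms present the same base sequence when each is read 5' to 3' on its own strand, which is why two identical recognition helices can each bind one arm.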


14.2 The lac Operon Is an Inducible 
Operon System under Negative and 
Positive Control 


In comparing the genomes of different forms of life, one 
conclusion is that evolution has operated to restrict the 
total size of bacterial genomes compared to most others 
and to limit the percentage of repetitive (noncoding) DNA 
to less than 15 percent on average. These limitations are 
imposed by various factors, including the dependence of 
bacteria on their abilities to reproduce rapidly and respond 
quickly to environmental changes. Possession of a rela- 
tively small genome and small percentage of noncoding 
DNA speeds the DNA replication process and shortens 
the reproduction time. The need for rapid responsiveness 
to environmental change and for restricted genome size 
dictates another evolutionary adaptation in bacteria: the 
clustering and coordinated transcriptional regulation of 
genes involved in the same metabolic processes. 

Clusters of genes undergoing coordinated transcrip- 
tional regulation by a shared regulatory region are called 
operons. Operons are common in bacterial genomes, 
and the genes that are part of a given operon almost al- 
ways participate in the same metabolic or biosynthesis 
pathway. Besides having a single promoter, shared by 
the operon genes, operons contain additional regulatory 
DNA sequences that interact with promoters to share 
transcriptional control. 

In this discussion, we focus on the lactose (lac) 
operon of E. coli. This operon is responsible for the 
production of three polypeptides that permit E. coli to 
utilize the sugar lactose as a carbon source for growth and 
metabolic energy. In this section, we explain how the lac 
operon works, describe the circumstances under which 
its genes are transcribed, and identify the regulatory 
mechanisms that control operon gene transcription. In 
the following section, we turn our attention to mutational 
and molecular analyses of the lac operon to understand 
the function of operon genes and to explore the molecular 
interactions that regulate operon gene transcription. 


Lactose Metabolism 


The monosaccharide sugar glucose is the preferred en- 
ergy source of E. coli, just as it is for your cells. Glucose is 
metabolized by the biochemical pathway called glycolysis, 
a sequence of biochemical reactions that oxidizes glucose, 
and closely related compounds, to produce pyruvate and 
ATP (adenosine triphosphate), the compound used uni- 
versally by cells to store and produce energy. This path- 
way occurs in virtually all cells as part of fermentation 
and cellular respiration. Glycolysis is the principal energy-producing 
reaction in your cells, and it is the principal energy-producing 
reaction in E. coli, which, like humans and 
other organisms, are capable of metabolizing sugars other 
than glucose as well. Sugars such as galactose, lactose, 
fructose, and arabinose are also metabolized for energy 
production, but glucose is the preferred sugar because it 
can be directly metabolized in glycolysis. The alternative 
sugars require separate metabolism to first produce glu- 
cose or a glucose derivative that can then be processed by 
glycolysis. Thus, E. coli will consume all available glucose 
before a genetic switch is flipped that changes the meta- 
bolic pathway to one that uses an alternative sugar. 

The genetic switch to lactose utilization requires that 
lactose be present in the cell, but the lactose is not used by 
the cell until after glucose has been depleted. Lactose uti- 
lization is controlled by genes and regulatory sequences 
that form the lac operon, which is an inducible operon 
system, meaning that under the specific circumstances 
that lactose is present in the growth medium and glucose 
is absent, transcription of the operon genes is activated, or 
induced. The inducible nature of the lac operon and other 
inducible operons also means that expression of operon 
genes is limited to the circumstance in which the inducer 
compound is available. Other nutritional requirements 
may have to be met as well for transcription induction to 
occur. 

Lactose is a disaccharide consisting of two monosaccharides, 
glucose and galactose, that are joined by a covalent 
β-galactoside linkage (Figure 14.5). Bacteria that 
have a lac⁺ phenotype (“lack plus”) are able to grow 
on a medium containing lactose as the only sugar. lac⁺ 
strains accomplish this growth by producing a gated 
channel at the cell membrane that allows lactose to enter 
the cell. The channel is formed by the enzyme permease. 
On entering the cell, lactose is acted on by the 
enzyme β-galactosidase in two ways. 
The principal activity of β-galactosidase is to break 
the β-galactoside linkage to release glucose and galactose. 
Glucose produced by lactose breakdown can immediately 
enter glycolysis. The molecule of galactose can be further 
processed to produce glucose. In addition to producing 
glucose and galactose, β-galactosidase also converts 
some lactose to an isomer called allolactose. Allolactose 
plays a critical role in regulating the transcription of 
lac operon genes by acting as the inducer compound. 
Allolactose that is not used for induction can be cleaved 
Figure 14.5 Lactose metabolism. (1) Lactose enters the 
E. coli cell from the growth medium with the aid of permease. 
(2) β-galactosidase converts some lactose to its isomeric form, 
allolactose. (3) Most of the lactose has its galactoside linkage 
cleaved by β-galactosidase to yield galactose and glucose. 
(4) Allolactose acts as the inducer. Excess allolactose is cleaved 
by β-galactosidase. 


by β-galactosidase. Bacteria that are unable to grow on a 
lactose-containing medium are identified as having a lac⁻ 
phenotype (“lack minus”). These strains are either unable 
to import lactose to the cell, unable to break it down once 
it is in the cell, or both. 


lac Operon Structure 


The lac operon consists of a multipart regulatory region 
and a structural gene region containing three genes 
(Figure 14.6a). The regulatory region contains three pro- 
tein-binding regulatory sequences. One is the promoter 
that binds RNA polymerase, another is the operator 


(lacO) sequence that binds the lac repressor protein, and 
the third is the CAP binding site. These three regions par- 
tially overlap and are immediately upstream of the start of 
transcription of lac operon genes. 

The three structural genes of the lac operon are identified 
as lacZ, a gene encoding the enzyme β-galactosidase; 
lacY, which encodes the enzyme permease; and lacA, 
which encodes transacetylase. These three genes are tran- 
scribed as a polycistronic mRNA, an mRNA molecule 
that is the transcript of all the genes in the operon. Each 
gene transcript that is part of a polycistronic mRNA con- 
tains a start and a stop codon sequence. The translation of 
a polycistronic mRNA generates a distinct polypeptide for 
each gene. 

The β-galactosidase produced by the lacZ gene is responsible 
for cleaving the β-galactoside linkage of lactose 
to release molecules of glucose and galactose. As mentioned 
above, the enzyme also converts a small amount 
of lactose into allolactose, which has a chemical structure 
very similar to that of lactose. The permease enzyme 
encoded by lacY functions at the cell membrane to facilitate 
the entry of lactose into the cell. Transacetylase, the 
product of lacA, is not essential for lactose utilization, 
although in bacteria it protects against potentially damaging 
by-products of lactose metabolism. For this reason, our 
discussion focuses only on transcription of lacZ and lacY, 
and on the action of β-galactosidase and permease. 

Adjacent to, but not part of, the lac operon is the 
regulatory gene, lacI (“lack eye”), that produces the lac 
repressor protein. The lacI gene has its own promoter 
that is not regulated and drives constitutive transcrip- 
tion. The lac repressor protein is a homotetramer that 
has two functional domains. The first is a DNA-binding 
domain that binds the operator regions, and the second 
is an allosteric domain that binds the inducer substance 
allolactose. 

Figure 14.6b shows the DNA sequence composition 
of the lac operon promoter (lacP) and the lac operator 
(lacO), which together only span about 80 base pairs. 
The promoter and the operator sequences are directly 
adjacent, with the position of the operator sequence overlapping 
the +1 nucleotide that starts transcription. LacP 
contains the −10 and −35 consensus sequence sites that 
are critical for RNA polymerase binding (see Section 8.2). 
LacO, which binds the repressor protein produced by lacI, 
overlaps lacP near the start of transcription. Notice also 
that the CAP binding site is near the −35 and −10 regions 
of the promoter. We discuss this relationship in the next 
section. 


lac Operon Function 


The lac operon is transcriptionally silent when no lactose 
is available and when glucose is available to the 
cell (Figure 14.7a). In the absence of production of 
β-galactosidase, there is no allolactose in the cell and the 
Figure 14.6 The lactose (lac) operon of E. coli. (a) The repressor protein (lacI) is encoded by a 
1040-bp segment under separate transcriptional regulation. The transcription regulatory region 
consists of a CAP binding site, a promoter consensus sequence region, and an operator sequence. The 
three structural genes of the lac operon encode the enzymes β-galactosidase (lacZ), permease (lacY), 
and transacetylase (lacA). (b) The DNA sequence of the regulatory region of the lac operon, including 
the −10 and −35 consensus sequences, the operator, and the CAP binding site. 


constitutively produced lac repressor protein binds to 
lacO, using its DNA-binding domain. By its presence at 
the operator, lac repressor blocks RNA polymerase from 
binding to lacP and prevents transcription initiation. This 
transcriptional regulatory interaction is an example of 
negative control of transcription that is achieved through 
the binding of repressor protein to the transcription- 
regulating operator sequence. 

In contrast, the availability of lactose in the growth 
medium and the unavailability of glucose lead to the in- 
duction of transcription of the lac operon structural genes 
(Figure 14.7b). On this basis, the lac operon is identified 
as an inducible operon. With synthesis of β-galactosidase, 
the production of allolactose occurs. By binding to the 
allosteric domain of the repressor protein, allolactose 
forms the inducer-repressor complex. The formation 
of this complex induces an allosteric change that alters 
the conformation of the DNA-binding domain of the 
repressor protein to a form that does not recognize or 
bind the operator. An essential part of the induction of 


transcription is the binding of the CAP-cAMP complex to 
the CAP binding site, which facilitates achievement of the 
highest level of transcription. The polycistronic mRNA 
is synthesized, and translation produces β-galactosidase, 
permease, and transacetylase. 

When both glucose and lactose are available, E. coli 
utilize glucose. The presence of lactose, however, gener- 
ates a small amount of allolactose that carries out its nor- 
mal inducer function by binding to repressor protein. The 
inducer-repressor interaction opens the promoter region, 
and RNA polymerase binds. 

By itself, however, RNA polymerase is very ineffective 
at accomplishing transcription of the lac operon genes. This 
is due to the absence of binding of the CAP-cAMP complex 
at the CAP binding site (more on this in a moment). RNA 
polymerase by itself is only able to manage basal transcription 
(Figure 14.7c), that is, transcription that produces only 
a small number of polycistronic mRNAs and leads to the 
translation of a few molecules of β-galactosidase, permease, 
and transacetylase per cell. 
Figure 14.7 lac operon transcription regulation. (a) When 
glucose is available and lactose is unavailable, lac operon genes 
are not transcribed. (b) Lactose availability in the absence of glu- 
cose induces activated transcription of operon genes by binding 
of CAP-cAMP at the CAP site. (c) The presence of both glucose 
and lactose leads to basal transcription of the operon. 


Basal transcription driven solely by RNA polymerase 
that gains access to the lac promoter through the inducer-repressor 
complex mechanism is insufficient to generate 
enough copies of the polycistronic mRNA to drive active 
lactose metabolism. A second regulatory process featur- 
ing positive control of transcription is required to fully 
activate lac operon gene transcription. Positive control of 
lac operon transcription lies in a DNA-protein interaction 
that occurs at the CAP-cAMP binding region of the 
lac operon promoter. This site is located at approximately 
−60 of lacP (see Figure 14.6b and Figure 14.7c). The CAP 
Figure 14.8 CAP-cAMP complex binding to the CAP binding 
region. DNA bends at an approximate 90° angle around the 
CAP-cAMP complex and facilitates strong RNA polymerase 
binding that generates activated transcription of the lac operon. 


binding site contains the sequence that attracts the CAP- 
cAMP complex, a small molecular complex composed 
of a protein known as the catabolite activator protein 
(CAP) and the nucleotide cyclic adenosine monophos- 
phate (cAMP). Binding of the CAP-cAMP complex to its 
binding site causes DNA to bend around the complex, and 
it increases the ability of RNA polymerase to transcribe 
lac operon genes (Figure 14.8). This positive regulatory 
effect leads to a high level of transcription—that is, to 
activated transcription—of lac operon genes that is many 
times greater than basal transcription. Activated tran- 
scription allows the cell to metabolize lactose and grow on 
a lactose-containing medium. 

The positive regulatory process is itself regulated 
indirectly by the level of glucose, which modulates the 
availability of cAMP. Cyclic AMP is synthesized from 
ATP (adenosine triphosphate) by the enzyme adenylate 
cyclase. During glycolysis, the availability of adenylate 
cyclase is limited and cAMP synthesis is reduced. Thus, 
when glucose is available, cAMP is very low in concentra- 
tion, almost no CAP-cAMP can form, and lac operon 
gene transcription is highly inefficient. This effect of glu- 
cose in blocking lac operon gene transcription, even when 
lactose is present, is known as catabolite repression, 
during which the presence of the preferred catabolite 
(glucose) represses the transcription of genes for an alter- 
native catabolite (lactose). 
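
The combined effect of negative control (lac repressor) and positive control (CAP-cAMP) on the lac operon can be written as a two-input decision rule. The sketch below is a deliberately simplified Boolean model of the text: it assumes that allolactose is present whenever lactose is available and that CAP-cAMP forms only when glucose is absent. The function name and the returned labels are illustrative, and the case with neither sugar is treated as silent because the repressor remains bound.

```python
def lac_operon_transcription(glucose: bool, lactose: bool) -> str:
    """Return 'none', 'basal', or 'activated' for the lac operon structural genes."""
    # Negative control: without lactose there is no allolactose,
    # so the repressor stays bound to lacO.
    repressor_on_operator = not lactose
    # Positive control: glucose keeps cAMP low, so CAP-cAMP cannot form.
    cap_camp_bound = not glucose

    if repressor_on_operator:
        return "none"
    return "activated" if cap_camp_bound else "basal"


if __name__ == "__main__":
    for glucose in (True, False):
        for lactose in (True, False):
            level = lac_operon_transcription(glucose, lactose)
            print(f"glucose={glucose!s:5s} lactose={lactose!s:5s} -> {level}")
```

The three outcomes correspond to panels (a), (c), and (b) of Figure 14.7: no transcription, basal transcription, and activated transcription. The model ignores the occasional spontaneous release of repressor from the operator.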

With your budding understanding of lac operon gene 
transcription, perhaps the following question—a kind 
of chicken-and-egg conundrum—has occurred to you. 
Lactose must enter the cell so that allolactose can be pro- 
duced to act as an inducer. Lactose cannot enter the cell 
without the aid of permease that helps bring lactose into 
the cell. But since the lacY gene that produces permease 
is part of the lac operon, and transcription is not induced 
until lactose is present inside the cell, how does lactose 
enter the cell in the first place? It does so in two ways. One 
stems from the reversibility of the interaction between 
the repressor protein and the lac operator. In the pres- 
ence of glucose and the absence of lactose, the repressor 
protein is almost always bound to the operator sequence. 
Occasionally and spontaneously, however, the repressor 
protein loses contact with the operator sequence. While 
short-lived, this spontaneous release is just enough to allow 
momentary transcription of the operon and production 
of a few molecules of β-galactosidase and permease. This 
small amount of permease and β-galactosidase, amounting 
to no more than a few molecules per cell, is sufficient 
to bring the first molecules of lactose across the cell 
membrane and to generate allolactose. This trickle of 
lactose quickly induces more transcription, launching a 
transcriptional cascade that soon causes the cell to switch 
its metabolism to lactose utilization. 

The second way also involves the production of a tiny 
amount of permease and β-galactosidase, in this case, 
through basal transcription that takes place when both 
glucose and lactose are available to a cell. Basal transcrip- 
tion becomes fully activated transcription when glucose is 
exhausted and only lactose is available to a cell. 


14.3 Mutational Analysis Deciphers 
Genetic Regulation of the lac Operon 


The identification and description of the lac operon began 
with a series of publications in the early 1960s by François 
Jacob, Jacques Monod, André Lwoff, and several other 
colleagues. Their genetic analysis of numerous lac operon 


mutants led to the identification of each gene and regula- 
tory region, and to the functional description of the op- 
eron we provided in the previous section. Jacob, Monod, 
and Lwoff were awarded the Nobel Prize in Physiology or 
Medicine in 1965 for this work (see the chapter opener 
photo). Their work also laid the foundation for a descrip- 
tion of lac operon transcription regulation at the DNA 
sequence level. We discuss several of the analyses of lac 
operon mutants and elements of the molecular analysis 
of lac operon transcriptional regulation in this section. As 
you read this discussion, refer to Table 14.1 and Table 14.2 
for a list of lac operon genes and regulatory sequences, as 
well as example genotypes and phenotypes associated with 
mutations we discuss. You can also refer to Experimental 
Insight 6.1, which discusses the determination of the gen- 
otype of a bacterial strain based on its pattern of growth 
and no growth in various media. 


Analysis of Structural Gene Mutations 


The genetic analysis of the lac operon by Jacob, Monod, 
and colleagues was made possible by the induction of 
operon mutations. Several dozen lac mutants were gen- 
erated by treatment of E. coli with mutagens. The mu- 
tants were first subjected to genetic complementation 
experiments to determine whether the lac phenotypes 
of different mutants resulted from mutation of the same 
gene or from mutations of different genes. Investigations 
showed that lac mutants formed two complementation 
groups, indicating that two genes are responsible for the 
lac phenotype. The two complementation groups are 
today known to correspond to lacZ (β-galactosidase) and 
lacY (permease). 


Table 14.1 lac Operon Genes and Regulatory Sequences 


Gene/Sequence Product/Sequence Type 
Protein-Producing Genes 


lacI Repressor protein 
Table 14.2 
The complementation analysis was carried out using 
partial diploid bacterial strains that were produced by conjugation 
between F’ (lac) and F⁻ bacteria (see Section 6.3). 
Recall that exconjugants produced by F’ × F⁻ conjugation 
have two copies of a portion of the genome and are thus 
partially diploid. In the case of lac operon partial diploids, 
one copy of the lac operon information resides on the 
recipient bacterial chromosome, and the second copy of 
the operon is acquired on the F’ plasmid. The genotype 
of partial diploids is written with the F’ segment on the 
left and the recipient chromosome on the right. The ho- 
mologous chromosomes are separated by a slash (/). For 
example, the genotype of a partial diploid demonstrating 
complementation of lac gene mutations can be written as 
follows: 


F’ I⁺ P⁺ O⁺ Z⁺ Y⁻ / I⁺ P⁺ O⁺ Z⁻ Y⁺ 


Analyzed as haploid genotypes, each portion of the partial 
diploid genotype above would produce the lac⁻ phenotype. 
The F’ haploid lacks the ability to produce permease 
(lacY⁻), and the bacterial haploid is unable to produce 
β-galactosidase (lacZ⁻). Genetic complementation occurs 
in this partial diploid, however, and the resulting phenotype 
is lac⁺ (see Table 14.2). The molecular basis of genetic 
complementation in this case is that the F’ portion 
of the partial diploid provides β-galactosidase by its lacZ⁺ 
gene, and the recipient portion of the partial diploid provides 
permease by its lacY⁺ gene. Based on the analysis of 
structural gene mutations, Jacob, Monod, and colleagues 
concluded that there are two protein-producing genes 
required for lac⁺ growth behavior and that lacZ and lacY 
wild-type alleles are usually dominant to mutant alleles. 
Recombination mapping analysis revealed close genetic 
linkage of the three structural genes of the lac operon, but 
the order of these structural genes (lacZ-lacY-lacA) was 
ultimately determined by mutational analysis. 
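
The logic of genetic complementation in a partial diploid can be expressed in a few lines. The sketch below assumes, as the text concludes, that wild-type lacZ and lacY alleles are dominant, and it ignores the regulatory alleles (I, P, O) entirely. The dictionary layout and the function name are illustrative choices, not notation from the original analysis.

```python
def lac_phenotype(*copies: dict) -> str:
    """Predict lac+ or lac- growth from one (haploid) or two (partial diploid) operon copies.

    Each copy maps the structural genes "Z" and "Y" to "+" or "-".
    A functional product from any copy is enough, because the proteins
    diffuse through the cell and act in trans.
    """
    has_beta_galactosidase = any(copy["Z"] == "+" for copy in copies)
    has_permease = any(copy["Y"] == "+" for copy in copies)
    return "lac+" if has_beta_galactosidase and has_permease else "lac-"


if __name__ == "__main__":
    f_prime = {"Z": "+", "Y": "-"}      # F' segment: makes beta-galactosidase only
    chromosome = {"Z": "-", "Y": "+"}   # recipient chromosome: makes permease only

    print("F' alone:        ", lac_phenotype(f_prime))
    print("chromosome alone:", lac_phenotype(chromosome))
    print("partial diploid: ", lac_phenotype(f_prime, chromosome))
```

Two mutations in the same gene would fail to complement under this rule, which is the basis for assigning mutants to complementation groups.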

Another type of structural gene mutation that proved 
useful for understanding the process of translation of the 
lac polycistronic mRNA was base substitution nonsense 
mutations that generate stop codons in inappropriate loca- 
tions. If one of these mutations, known as polar mutations, 


occurs early in the lacZ portion of the polycistronic 
mRNA, it has the curious effect of significantly reducing 
or preventing translation of the other gene sequences in 
the transcript. How could this be? The answer is that there 
is just one Shine-Dalgarno sequence in the lac operon 
mRNA. It occurs upstream of the start codon for the lacZ 
gene (see Figure 14.6). Normally, individual ribosomes 
identify the Shine-Dalgarno sequence and translate the 
entire length of the lac operon polycistronic mRNA, pro- 
ducing three polypeptides. The presence of the polar (non- 
sense) mutation in the lacZ gene stops translation by the 
ribosome. As there is no other Shine-Dalgarno sequence 
in the transcript, the ribosome is unable to translate the 
lacY or lacA sequences. Thus, when a polar mutation occurs 
in the lacZ gene, no permease is produced, even if the 
strain is lacY⁺. 


lac Operon Regulatory Mutations 


Mutations of regulatory components of the lac operon 
alter the inducible response of the operon to the presence 
of lactose and allolactose in the cell. Certain mutations of 
the lac operon lead to constitutive mutants, which are 
unresponsive to the presence or absence of lactose in the 
growth medium. These mutants continuously transcribe 
the operon genes, rather than transcribing the genes in 
an inducible manner. Other regulatory mutations block 
all response to lactose and render the cell lac⁻. Genetic 
mapping of constitutive mutations would eventually identify 
two distinct sites of constitutive mutations of the lac 
operon: lacO and lacI. Constitutive mutations of lacO 
render the operator DNA sequence unrecognizable to the 
wild-type DNA-binding portion of the repressor protein. 
On the other hand, constitutive mutations of lacI result 
from production of a repressor protein with a mutated 
DNA-binding region that is unable to recognize and bind 
wild-type operator sequence. Both mutations prevent 
negative regulation of lac operon transcription. 

It was the initial discovery of two sites of lac operon 
constitutive mutations that suggested to 
Jacob and Monod that a negative regulatory system with 
two components exercises transcriptional control of the 
structural genes. They postulated that one constitutive 
mutation site is the gene producing a regulatory protein 
and the second is the target DNA site to which the 
regulatory protein binds. 


Operator Mutations The genetic evidence indicating 
that the operator is the DNA sequence binding the 
repressor protein comes from the finding that lac operator 
(lacO) mutations are exclusively cis-acting; that is, they 
influence the transcription of genes only on the same 
chromosome. In the wild-type organism, lacI⁺ produces 
repressor protein that has an allosteric (allolactose) 
binding domain and a functional operator binding domain. 
Repressor protein uses its operator binding domain to 
bind the regulatory sequence and block transcription 
(Figure 14.9a). Bacteria with operator mutations are 
constitutive for transcription of lac operon genes and have 
the genotype I⁺ P⁺ Oᶜ Z⁺ Y⁺ (Figure 14.9b). The Oᶜ allele 
designation signifies an “operator-constitutive mutation.” 
In Oᶜ mutants, the nucleotide sequence of the operator 
region is altered and is no longer recognized by wild-type 
repressor protein. In the absence of repressor protein 
bound to the operator sequence, constitutive transcription 
of the operon genes takes place and β-galactosidase and 
permease are produced continuously. 

The crucial experiments revealing the cis-acting nature 
of lacO were performed with partial diploids. First 
it was shown that creation of partial diploids by conjugation 
of a constitutive lac⁺ strain (I⁺ P⁺ Oᶜ Z⁺ Y⁺) with a 
lac⁻ strain producing defective β-galactosidase (I⁺ P⁺ O⁺ 
Z⁻ Y⁺) does not alter the constitutive transcription of 
β-galactosidase. Note that lacOᶜ in the partial diploid appears 
dominant to lacO⁺. Dominance on the part of lacOᶜ 
arises because transcription of the wild-type lacZ⁺ allele is 
exclusively controlled by the lacOᶜ mutation, since these 
two alleles are on the same chromosome. The wild-type 
operator has no effect on the lacZ⁺ allele because operator 
DNA is a cis-acting element, not a trans-acting element. 

In a second experiment, the lacZ alleles were on different
chromosomes, and the partial diploid genotype F′
I⁺ P⁺ Oᶜ Z⁻ Y⁺ / I⁺ P⁺ O⁺ Z⁺ Y⁻ was produced using
two lac⁻ strains. In this case, the F′ strain is constitutive
for permease production but does not produce functional
β-galactosidase due to a lacZ mutation. The bacterial
recipient strain produces β-galactosidase by the wild-type
inducible mechanism, but it does not produce functional
permease, due to mutation of lacY. The partial diploid
produces permease constitutively, but β-galactosidase is
produced only when transcription is induced by lactose.
This result could occur only if the operator is a cis-acting
element. In this case, the operator allele in cis to Z⁺ is
wild type, so β-galactosidase production falls under the
inducible control of the wild-type operator sequence.
Notice that in this partial diploid, the wild-type operator
appears to be dominant to the Oᶜ mutant.
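The logic of the two partial-diploid experiments can be sketched in a few lines of Python. This is only a toy model under stated assumptions: both chromosomes carry wild-type lacI and lacP, glucose is not considered, and the dictionary keys and function names are illustrative choices rather than anything defined in the text.

```python
# Toy model: an operator controls only the genes on its own DNA molecule.
# Assumes wild-type repressor (I+) and promoter (P+) on both chromosomes.

def transcribed(operator: str, lactose: bool) -> bool:
    """'c' = operator-constitutive (repressor cannot bind); '+' = wild type."""
    return True if operator == "c" else lactose

def enzymes(chromosomes, lactose):
    """Return (beta_galactosidase, permease) produced by the whole cell."""
    beta_gal = any(transcribed(c["O"], lactose) and c["Z"] == "+" for c in chromosomes)
    permease = any(transcribed(c["O"], lactose) and c["Y"] == "+" for c in chromosomes)
    return beta_gal, permease

# First experiment: Oc Z+ Y+ / O+ Z- Y+
first = [{"O": "c", "Z": "+", "Y": "+"}, {"O": "+", "Z": "-", "Y": "+"}]
# Second experiment: Oc Z- Y+ / O+ Z+ Y-
second = [{"O": "c", "Z": "-", "Y": "+"}, {"O": "+", "Z": "+", "Y": "-"}]

for name, diploid in (("first", first), ("second", second)):
    for lactose in (False, True):
        print(name, "lactose" if lactose else "no lactose", enzymes(diploid, lactose))
```

In the first diploid both enzymes appear without lactose, so the mutant operator looks dominant; in the second, β-galactosidase appears only with lactose, so the wild-type operator looks dominant for lacZ.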


Figure 14.9 Regulatory mutations of lacI and lacO.
(a) Wild-type lacI and lacO. (b) Operator-constitutive (lacOᶜ)
mutation. (c) lacI⁻ (operator-binding domain) mutation. (d) lacIˢ
(super-repressor) mutation of the allosteric binding domain.


The apparent difference in the dominance relationship
of O⁺ and Oᶜ alleles is understandable if the lac
operator is a cis-acting element that only controls the
transcription of genes on the same DNA molecule. Taken
together, the two experiments reveal the lac operator to
be cis-dominant, meaning that the only genes the operator
is able to influence are genes located downstream
on the same chromosome. For the lac operon, the “dominant”
operator allele can differ, depending on the alleles carried
by structural genes on each chromosome. If both
wild-type structural genes are in cis to lacOᶜ, the mutant
operator is dominant because it constitutively transcribes
both genes. This is the case in the first experiment. On the
other hand, if wild-type structural genes are on different
chromosomes, as in the second experiment, then the
lacO⁺ allele is dominant because it exerts inducible
transcriptional control on one of the two genes required for
lactose metabolism (Table 14.3).


Table 14.3 Synthesis of β-Galactosidase and Permease by Haploids and Partial Diploids with Regulatory Mutations

Genotype            β-Galactosidase            Permease
                    Lactose    No Lactose      Lactose    No Lactose
I⁻ P⁺ O⁺ Z⁺ Y⁺         +           +              +           +
I⁺ P⁺ Oᶜ Z⁺ Y⁺         +           +              +           +
Iˢ P⁺ O⁺ Z⁺ Y⁺         −           −              −           −
I⁺ P⁻ O⁺ Z⁺ Y⁺         −           −              −           −


Constitutive Repressor Protein Mutations Experi- 
mental evidence supporting the hypothesis that the 
repressor gene produces a regulatory protein comes from 
the analysis of mutants that constitutively transcribe lac 
operon genes where the mutant allele is recessive to the
wild-type allele.

To see the dominance relationship of these alleles,
let’s first consider a haploid cell with the lac operon
genotype I⁻ P⁺ O⁺ Z⁺ Y⁺. This cell constitutively
transcribes and produces both β-galactosidase and permease
(Figure 14.9c). Similarly, a haploid strain with the genotype
I⁻ P⁺ O⁺ Z⁺ Y⁻ produces β-galactosidase constitutively,
but no permease is produced, and bacteria with
the genotype I⁻ P⁺ O⁺ Z⁻ Y⁺ constitutively produce
permease but do not produce β-galactosidase.

In contrast, a partial diploid with the genotype F′ I⁺
P⁺ O⁺ Z⁻ Y⁺ / I⁻ P⁺ O⁺ Z⁺ Y⁻ expresses both enzymes in
their normal inducible manner. The I⁺ allele can be on
either the F′ plasmid or the recipient chromosome and
have the same effect, inevitably resulting in the dominance
of I⁺ over I⁻. This outcome indicates that lacI produces
a regulatory protein that is trans-acting, that is, capable
of influencing the expression of genes on other chromosomes.
In this context, trans refers to a protein capable
of diffusing through the cell and binding to a cis-acting
target sequence.

The molecular explanation of the trans-acting ability
of the lac repressor protein is that a lacI⁻ mutation alters
the DNA-binding domain of the protein, rendering it
incapable of binding the operator sequence. In the absence
of negative control, transcription is constitutive. In partial
diploids that are I⁺/I⁻, however, repressor protein with a
functional DNA-binding domain is present in the cell and
responds normally to the addition or removal of lactose
from the cell.
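A small sketch may help make the trans-acting behavior concrete. It assumes wild-type promoter and operator on every copy of the operon and ignores glucose; the function names are hypothetical and chosen only for this illustration.

```python
# Toy model: repressor is a diffusible protein, so one functional lacI copy
# anywhere in the cell regulates every wild-type operator (trans-acting).

def operon_transcribed(lacI_alleles, lactose: bool) -> bool:
    functional_repressor = "+" in lacI_alleles
    if not functional_repressor:
        return True       # nothing binds the operator: always on
    return lactose        # repressor releases the operator only with inducer

def phenotype(lacI_alleles) -> str:
    without = operon_transcribed(lacI_alleles, False)
    with_lactose = operon_transcribed(lacI_alleles, True)
    if without and with_lactose:
        return "constitutive"
    return "inducible" if with_lactose else "noninducible"

print(phenotype(["-"]))        # haploid I-
print(phenotype(["+", "-"]))   # partial diploid, I+ on the F' plasmid
print(phenotype(["-", "+"]))   # partial diploid, I+ on the chromosome
```

The position of the I⁺ allele in the list does not matter, which mirrors the observation that I⁺ is dominant whether it sits on the plasmid or on the chromosome.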


Table 14.3 (continued) Description of each genotype, in row order
1. Constitutive transcription due to lacI⁻ mutation.
2. Constitutive transcription due to lacOᶜ mutation.
3. Transcription is not inducible, due to lacIˢ mutation.
4. No effective transcription, due to lacP⁻ mutation.


Super-Repressor Protein Mutations A second set 
of repressor protein mutations produces a different 
consequence for lac operon transcription. These mutants 
produce mutant repressor protein with an altered allosteric 
domain. The mutant proteins are unable to bind allolactose 
and are unresponsive to lactose addition or removal from 
cells. The DNA-binding domain is unaffected by the allosteric 
domain mutation, but as a result of the nonfunctional 
allosteric domain, mutant repressor proteins cannot release 
the operator even in the presence of allolactose. 

Haploids and partial diploids with mutations of the
allosteric domain of the repressor protein are identified
as Iˢ mutants and are designated super-repressors. These
mutants are noninducible, meaning that operon gene
transcription cannot be induced (Figure 14.9d and Table 14.3).
Haploids with the genotype Iˢ P⁺ O⁺ Z⁺ Y⁺ produce a
repressor protein that binds normally to operator sequence,
but lacking a functional allosteric domain, the protein is
not removed from the operator by lactose in the cell. Such
mutants are lac⁻ and cannot be induced to metabolize
lactose. Cultures of partial diploid bacteria with the genotype
F′ Iˢ P⁺ O⁺ Z⁺ Y⁺ / I⁺ P⁺ O⁺ Z⁺ Y⁺ may initially have some
inducible responsiveness to lactose, but this ability is lost as
mutant repressor protein binds to operator sequences. This
partial diploid reveals the dominance of Iˢ over I⁺.
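Taken with the preceding section, the lacI alleles form a dominance series: Iˢ over I⁺, and I⁺ over I⁻. A minimal sketch of that ordering follows, assuming wild-type promoter and operator throughout; the list and dictionary names are illustrative only.

```python
# Dominance series of lacI alleles, most dominant first.
DOMINANCE = ["s", "+", "-"]
PHENOTYPE = {"s": "noninducible", "+": "inducible", "-": "constitutive"}

def lacI_phenotype(alleles) -> str:
    """The most dominant allele present in the cell sets the phenotype."""
    governing = min(alleles, key=DOMINANCE.index)
    return PHENOTYPE[governing]

print(lacI_phenotype(["s", "+"]))  # super-repressor masks wild type
print(lacI_phenotype(["+", "-"]))  # wild type masks the I- allele
print(lacI_phenotype(["-"]))       # no functional repressor
```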


Promoter Mutations Mutations of promoter consensus 
sequences significantly reduce transcription or may 
eliminate it entirely (see Figure 8.11). To know the specific 
effect of a promoter mutation usually requires direct testing 
of transcription in the mutant organism. Promoters, like 
operators, are cis-acting regulatory sequences, and most 
mutations of lacP significantly reduce, and may entirely
eliminate, transcription of lacZ and lacY genes, which are
located in cis. This reduces β-galactosidase and permease
production to such a low point that haploid bacteria with
the genotype I⁺ P⁻ O⁺ Z⁺ Y⁺ are lac⁻.

Table 14.4 Transcription Conditions for the lac Operon

Glucose   Lactose   cAMP      Allolactose   lac Operon      Explanation
                                            Transcription

Present   Absent    Absent    Absent        None            Glucose is present to provide energy. There is no
                                                            allolactose to bind repressor. There is no
                                                            CAP-cAMP complex to bind CAP site.

Present   Present   Absent    Present       Basal           Glucose is present to provide energy; absence of
                                                            cAMP prevents positive transcription regulation,
                                                            but allolactose is present and acts as an inducer
                                                            to allow a small amount of transcription.

Absent    Absent    Present   Absent        None            CAP-cAMP forms, but no allolactose is present
                                                            to block repressor binding at operator.

Absent    Present   Present   Present       High            Inducer and CAP-cAMP are available to induce
                                                            and positively regulate transcription.


Table 14.4 summarizes the conditions for lac operon
gene transcription given the presence or absence of glucose
and lactose. Active transcription of operon genes
takes place only when glucose is depleted from the cell
and lactose is present. Under these conditions, the following
events occur:


1. Cyclic AMP level rises as a result of the availability of
adenylate cyclase.


2. CAP-cAMP complex forms and binds to the CAP 
site of the lac promoter, thus activating transcription. 


3. Allolactose is produced by a side reaction of the me- 
tabolism of lactose by B-galactosidase. 


4. Repressor protein conformation is modified by interaction
with allolactose, causing the protein to
release from the operator, thus allowing operon gene
transcription.


Basal transcription occurs when both glucose and
lactose are present due to the presence of allolactose to
bind repressor protein. When lactose is absent, no
inducer-repressor complex can form, and no transcription
takes place. To test your understanding of the lac operon,
see Genetic Analysis 14.1, which guides you through
analysis of some lac operon mutants.
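The four conditions in Table 14.4 amount to a two-input decision rule, which can be written as a short function. This is a sketch of the table as given, for a wild-type operon; the function and variable names are illustrative.

```python
def lac_transcription(glucose: bool, lactose: bool) -> str:
    """Transcription level of a wild-type lac operon, following Table 14.4."""
    cap_camp = not glucose   # cAMP, and so CAP-cAMP, is present only without glucose
    inducer = lactose        # allolactose is available only when lactose is present
    if not inducer:
        return "none"        # repressor stays bound to the operator
    return "high" if cap_camp else "basal"

for glucose in (True, False):
    for lactose in (False, True):
        print(f"glucose={glucose!s:5} lactose={lactose!s:5} -> "
              f"{lac_transcription(glucose, lactose)}")
```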


Molecular Analysis of the lac Operon


In the 50 years since Jacob, Monod, and colleagues de- 
scribed their genetic analysis of the lac operon, molecular 
analysis and genome sequence analysis have identified the 
DNA sequences of its components (see Figure 14.6b). This 
and other accumulated molecular information weaves a 
virtually complete picture of lac operon transcription
regulation, revealing it to be somewhat more complex than,
but wholly consistent with, the description presented above.

Experimental Insight 14.1 discusses two important
pieces of experimental molecular evidence derived from
DNA footprint protection analyses that pertain to
transcriptional regulation of the lac operon. The first
observation is that the repressor protein binding location at the
lac operator overlaps with the promoter binding location
of RNA polymerase. This observation supports the
hypothesis that repressor protein binding blocks RNA
polymerase binding and transcription initiation and, conversely,
that when the repressor protein is not bound to the operator,
RNA polymerase can access and initiate transcription
at the promoter. The second observation identifies three
distinct segments of operator DNA sequence. These operator
segments, designated O₁, O₂, and O₃, interact differently
with the repressor protein, and the result of the interactions
provides a mechanism by which repressor protein binding
can block RNA polymerase access to the promoter.

Additional molecular analysis reveals that the
repressor protein is a homotetrameric protein formed by
the union of four identical 360-amino acid polypeptides
(Figure 14.10). The four polypeptides are joined together


Figure 14.10 The homotetrameric structure of the lac
repressor protein. Operator DNA-binding domains and allosteric
domains are formed on opposite sides of the protein.


GENETIC ANALYSIS


PROBLEM Evaluate the following lac operon partial diploids. Indicate whether the production
of functional β-galactosidase from lacZ and of permease from lacY is “inducible,” “constitutive,”
or “noninducible” for each partial diploid.


a. I⁺ P⁺ O⁺ Z⁺ Y⁺ / I⁻ P⁺ O⁺ Z⁻ Y⁻
b. I⁺ P⁺ Oᶜ Z⁺ Y⁻ / I⁺ P⁺ O⁺ Z⁻ Y⁺
c. Iˢ P⁺ Oᶜ Z⁻ Y⁺ / I⁺ P⁺ O⁺ Z⁺ Y⁺


BREAK IT DOWN: Partial diploids have two copies of each lac operon gene and
regulatory sequence. Success evaluating the lac operon depends on knowing the function
of each operon component. Study Table 14.1 thoroughly (p. 476).


BREAK IT DOWN: The transcription of lac operon genes is inducible if it is responsive
to lactose presence and absence, constitutive if it is always on regardless of lactose
availability, or noninducible if it cannot be activated (pp. 477-480).


Solution Strategies and Solution Steps


Evaluate


1. Strategy: Identify the topic this problem addresses and the nature of the
   required answer.
   Step: This problem concerns an analysis of patterns of transcriptional regulation
   and the production of functional β-galactosidase and permease by operon
   genotypes. The answer requires a determination of whether the enzymes are
   produced inducibly, constitutively, or not at all.


2. Strategy: Identify the critical information given in the problem.
   Step: The lac operon genotypes of three partial diploids are given.


Deduce


3. Strategy: Describe the consequences of any mutations in genotype a.
   Step: The I⁻ mutation produces a repressor protein that is unable to bind operator
   sequence. The Z⁻ mutation will not produce functional β-galactosidase, and
   the Y⁻ mutation will not produce functional permease.


TIP: Assess regulatory mutations first; then consider the consequences for structural gene
transcription in each partial diploid by evaluating the effect of each allele on transcription.


PITFALL: You must understand the wild-type function of each operon component before
evaluating genotypes. Do not attempt to memorize patterns of “+” and “−” for operon
components in hopes of determining lac⁺ or lac⁻ phenotypes.


4. Strategy: Describe the consequences of any mutations in genotype b.
   Step: The Oᶜ mutation alters the operator sequence and prevents binding and
   transcriptional repression by repressor protein. The Z⁻ and Y⁻ mutations
   block production of functional β-galactosidase and permease.


5. Strategy: Describe the consequences of any mutations in genotype c.
   Step: The Iˢ mutation produces a super-repressor protein that has an altered
   allosteric domain and will not interact with allolactose. The Oᶜ and Z⁻ mutations
   alter function as described above.


Solve


6. Strategy: Determine the expression pattern of functional enzymes for partial diploid a.
   Step (Answer a): Wild-type repressor protein is trans-active and binds the wild-type
   operator. This cis-acting operator blocks transcription of Z⁺ and Y⁺ when lactose is not
   in the cell, but permits transcription when lactose is present. Therefore, both
   enzymes are produced inducibly.


7. Strategy: Determine the expression pattern of functional enzymes for partial diploid b.
   Step (Answer b): Oᶜ is cis-active on Z⁺, resulting in constitutive transcription. Y⁺ is
   under the cis-active transcriptional control of O⁺. Therefore, β-galactosidase is
   produced constitutively, and permease is produced inducibly.


8. Strategy: Determine the expression pattern of functional enzymes for partial diploid c.
   Step (Answer c): The Oᶜ sequence is not recognized by either the wild-type repressor
   or the super-repressor. Both repressors have wild-type DNA-binding
   domains. Cis-active Oᶜ constitutively transcribes Y⁺. The super-repressor
   binds O⁺, and its cis activity renders Z⁺ and Y⁺ noninducible. Therefore,
   β-galactosidase is noninducible, and permease production is
   constitutive.


For more practice, see Problems 5, 16, 17, and 18.
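The three answers can be checked with a small rule-based model. This is a sketch under several assumptions: the genotypes are read as a. I⁺ P⁺ O⁺ Z⁺ Y⁺ / I⁻ P⁺ O⁺ Z⁻ Y⁻, b. I⁺ P⁺ Oᶜ Z⁺ Y⁻ / I⁺ P⁺ O⁺ Z⁻ Y⁺, and c. Iˢ P⁺ Oᶜ Z⁻ Y⁺ / I⁺ P⁺ O⁺ Z⁺ Y⁺ (as implied by the solution steps), glucose is ignored, and the ASCII genotype notation and function names are invented for this illustration.

```python
def parse(genotype: str) -> dict:
    """'I+ P+ Oc Z- Y+' -> {'I': '+', 'P': '+', 'O': 'c', 'Z': '-', 'Y': '+'}"""
    return {token[0]: token[1] for token in genotype.split()}

def transcribes(chrom: dict, i_alleles, lactose: bool) -> bool:
    """Is the operon on this chromosome transcribed? (cis rules + trans repressor)"""
    if chrom["P"] == "-":
        return False          # no functional promoter in cis
    if chrom["O"] == "c":
        return True           # operator not recognized by any repressor
    if "s" in i_alleles:
        return False          # super-repressor never releases a wild-type operator
    if "+" in i_alleles:
        return lactose        # wild-type repressor responds to the inducer
    return True               # no functional repressor in the cell

def classify(diploid: str, gene: str) -> str:
    chroms = [parse(half) for half in diploid.split("/")]
    i_alleles = [c["I"] for c in chroms]
    made = tuple(
        any(transcribes(c, i_alleles, lactose) and c[gene] == "+" for c in chroms)
        for lactose in (False, True)
    )
    return {(True, True): "constitutive",
            (False, True): "inducible",
            (False, False): "noninducible"}[made]

problems = {
    "a": "I+ P+ O+ Z+ Y+ / I- P+ O+ Z- Y-",
    "b": "I+ P+ Oc Z+ Y- / I+ P+ O+ Z- Y+",
    "c": "Is P+ Oc Z- Y+ / I+ P+ O+ Z+ Y+",
}
for label, genotype in problems.items():
    print(label, "beta-gal:", classify(genotype, "Z"), "| permease:", classify(genotype, "Y"))
```

The model applies regulatory alleles first (repressor pooled across the cell, operator and promoter per chromosome) and only then asks which structural alleles are functional, which is the order recommended in the TIP.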
Experimental Insight 14.1


Regulatory Proteins Binding to lac Operon Regulatory Sequences


DNase I footprint protection analysis of the kind described in
Research Technique 8.1 has been used to precisely identify
the binding locations of lac repressor protein relative to the
location of RNA polymerase binding in the regulatory region
of the lac operon. Recall from the earlier description of
this technique that identical control and experimental DNA
fragments containing regulatory sequences are end-labeled
with ³²P. The experimental fragments are then exposed to
DNA-binding proteins, but the control fragments are not. All
fragments are then exposed to DNase I, which randomly digests
those segments not protected by bound proteins. The resulting
DNase I-digested DNA fragments are separated by gel
electrophoresis to reveal the “footprint” of protein protection.
The figure here shows the results of footprint analysis of a
123-bp segment of the lac operon regulatory region from
position +39 to −84. Control DNA in the first lane is not protein
protected. The gel shows that the promoter regions protected
by RNA polymerase and by lac repressor protein partially
overlap one another. The relative positions of these
protein-protected regions are consistent with the model that repressor
protein binding can interfere with RNA polymerase binding.

Figure: DNase I footprint protection analysis of the lacP and lacO
regions and model.

Separate DNase I footprint analysis of the lac operator
region detects three segments of DNA sequence that are
protected by lac repressor protein: O₁, O₂, and O₃. Lane a of the
gel shown is control DNA not bound by protein, and is therefore
unprotected DNA. The experimental analysis identifies
one protected segment, designated O₁, as the principal
operator sequence. The two other regions of protein-protected
operator DNA sequence are designated O₂ and O₃. Lanes d
through g of the DNA footprint-protection gel are protected
by repressor protein, and show the footprint gaps
corresponding to these operator elements.

Lanes of the gel also identify two regions, designated
C₁ and C₂, that are protected from DNase I digestion by the
CAP-cAMP complex. This segment contains the consensus
sequences for the CAP binding site that partially overlaps
operator regions O₁ and O₃. The relative positions of these
protein-binding sites indicate two kinds of interactions
between proteins binding the lac promoter and operator. First,
when CAP-cAMP is bound to the CAP binding site, RNA
polymerase gains enhanced access to the promoter,
establishing conditions for efficient transcription of lac operon
genes. Second, the overlap of the CAP binding region with O₁
suggests that when repressor protein is bound to DNA, the
CAP-cAMP complex is unable to bind, thus preventing
positive regulation of transcription.

Figure: lac repressor protein footprint protection and DNA
binding.


Figure 14.11 The lacO region O₁ contains an inverted
repeat sequence. The central G-C base pair is the pivot point
of this region of twofold nucleotide symmetry of an inverted
repeat sequence.


at their C-terminal ends and are arranged as two identi- 
cal bundles. One end of each bundle forms an operator 
DNA-binding domain, and the other end forms the 
allosteric domain. The three operator DNA segments 
that are the targets of repressor protein binding share 
a conserved, 21-bp inverted repeat sequence. In each 
sequence, a central G-C base pair is at the midpoint of a 
twofold axis of symmetry (Figure 14.11). On either side of 
the central G-C base pair are inverted repeat sequences of 
10 bp each that are the specific binding location for poly- 
peptides in each half of the repressor protein. Mitchell 
Lewis and his colleagues examined the crystal structure 
of DNA-bound repressor protein in a 1996 study and
determined that the tetrameric repressor protein binds to
O₁ and O₃ and induces DNA loop formation that draws
the O₁ and O₃ regions closer together (Figure 14.12). This
DNA loop structure contains part of the lac promoter 
and prevents transcription by blocking access of RNA 
polymerase. 

Parallel experiments examining mutated operator
DNA sequences reveal how constitutive operator mutations
are caused by alterations of the DNA sequence in
region O₁. Figure 14.13 shows several base-pair substitutions
that cause constitutive operator (Oᶜ) mutations.
Each of these changes disrupts the twofold symmetry of
O₁, masking the sequence from recognition by repressor
protein. Since O₁ is the primary binding target of the
repressor protein and O₁ must be bound before binding
to O₃ can occur, O₁ mutation also disrupts binding to O₃.
The inability of repressor protein to bind to mutant op- 
erator sequence means that the transcription-repressing 
DNA loop cannot form. This in turn leaves the promoter 
available for binding by RNA polymerase and opens the 
door to continuous transcription and constitutive expres- 
sion of the lac operon genes. 
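The idea that a single base substitution can destroy twofold symmetry can be illustrated with a short check for inverted repeats. The sequence used here is an idealized, perfectly symmetric 21-bp toy site built for the example; it is not the actual lacO sequence, and the function names are assumptions of this sketch.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]

def has_twofold_symmetry(site: str) -> bool:
    """For an odd-length site, the arms flanking the central base pair
    must be reverse complements of each other."""
    half = len(site) // 2
    left, right = site[:half], site[half + 1:]
    return left == reverse_complement(right)

# Hypothetical operator: 10-bp arm + central G + inverted repeat of the arm.
arm = "AATTGTGAGC"
operator = arm + "G" + reverse_complement(arm)

# A single substitution in the left arm (G -> A at the fifth position).
mutant = operator[:4] + "A" + operator[5:]

print(operator, has_twofold_symmetry(operator))
print(mutant, has_twofold_symmetry(mutant))
```

One changed base pair is enough to make the two arms mismatch, which is the sense in which the Oᶜ substitutions disrupt the site that each half of the repressor must recognize.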


Figure 14.12 lac repressor protein binding. The crystal
structural model of lac repressor binding at lacO.


14.4 Transcription from the 
Tryptophan Operon Is Repressible 
and Attenuated 


The lac operon is an example of an inducible operon
that produces proteins responsible for the breakdown of
a sugar that is an alternative energy source to glucose.
Operons like lac that are involved in catabolism of
alternative energy sources are typically inducible, since
they are called upon only when glucose is depleted and
the alternative sugar is available. In contrast, operons
involved in anabolic pathways (pathways that synthesize
compounds needed by the cell) can be regulated by negative
feedback mechanisms that operate through activity
of the end product of the pathway to block operon
gene transcription. Operons of this kind are repressible
operons.

Figure 14.13 Constitutive operator (Oᶜ) mutations. Eight
base-substitution mutations in lacO region O₁ producing
operator-constitutive mutations. Each mutation disrupts the
twofold symmetry of the operator inverted repeat sequences
and prevents lac repressor protein binding.

In addition to the negative feedback mechanism, cer- 
tain repressible operons have a second regulatory capability 
known as attenuation that has the ability to fine-tune tran- 
scription to match the momentary requirements of the cell, 
achieving a more-or-less steady state of compound avail- 
ability. The difference between attenuation and inducibility 
can be clarified by an analogy. Inducible operons, such as 
lac, are akin to light switches that provide illumination in 
one setting (“on”) and no illumination in the alternative 
setting (“off”). Inducible operons are turned on and off by 
molecular switches controlled by DNA-binding proteins. 
Attenuation, on the other hand, works more like a dimmer 
switch that allows illumination to be incrementally adjusted 
up or down. For several amino acid operons, the regulation 
of gene expression has evolved to maintain steady amino 
acid levels in cells. In such systems, feedback inhibition 
turns off operon gene transcription when the amino acid is 
readily available, and attenuation fine-tunes the amino acid 
level to maintain a steady-state concentration. 


Feedback Inhibition of Tryptophan Synthesis 


The tryptophan (trp) operon (“trip operon”) in the E.
coli genome contains five structural genes that share a
regulatory region containing a promoter (trpP), an operator
(trpO), and a leader region (trpL) that contains the
attenuator region (Figure 14.14). The regulatory region
spans 312 base pairs, and the five structural genes span
approximately 6800 base pairs. The five structural genes
transcribed in the operon are, in order, trpE, trpD, trpC,
trpB, and trpA. Together, the protein products of these
genes are responsible for synthesis of the amino acid
tryptophan. Outside the operon, a sixth gene, trpR, encodes
the repressor protein that is not activated until it pairs with
tryptophan.

Figure 14.14 The tryptophan

Transcription of trp operon genes is regulated by a 
feedback inhibition system that responds to free trypto- 
phan in the cell. In this system, tryptophan acts as a co- 
repressor by binding to and activating the trp repressor 
protein that is not active without its bound corepressor. 
Feedback inhibition is the principal mechanism turn- 
ing on and turning off trp operon gene transcription 
(Figure 14.15). In the absence of tryptophan, the inac- 
tive repressor is unable to bind trpO, and operon gene 
transcription takes place. When tryptophan is present, 
however, it binds the repressor to activate it, and the 
repressor—corepressor complex binds the operator to 
block transcription. This is an efficient mechanism that 
shuts down transcription of genes whose expression is 
not needed at the moment. Such systems have evolved 
because they save metabolic energy that would other- 
wise be wasted transcribing unneeded mRNA and later 
recycling the unused transcript. 
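The contrast between the repressible trp system and the inducible lac system comes down to which form of the repressor is active. A minimal sketch follows, covering repressor control only (attenuation and CAP-cAMP are left out); the function names are illustrative.

```python
def trp_operon_transcribed(tryptophan_present: bool) -> bool:
    # The trp repressor is active only with its corepressor (tryptophan) bound.
    repressor_active = tryptophan_present
    return not repressor_active

def lac_operon_derepressed(allolactose_present: bool) -> bool:
    # The lac repressor is active on its own and is released by its inducer.
    repressor_active = not allolactose_present
    return not repressor_active

print("trp, tryptophan absent :", trp_operon_transcribed(False))
print("trp, tryptophan present:", trp_operon_transcribed(True))
print("lac, allolactose absent :", lac_operon_derepressed(False))
print("lac, allolactose present:", lac_operon_derepressed(True))
```

In both cases a small molecule changes repressor activity, but the sign of the effect is reversed: the end product switches the trp operon off, while the substrate-derived inducer switches the lac operon on.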
Figure 14.15 trp operon transcription regulation by the
repressor, with tryptophan absent (a) and with tryptophan
present (b).


Based on this description, and knowing about the
feedback inhibition of gene transcription, one might
expect that trpR⁻ bacteria that are mutant for the
repressor protein would show constitutive transcription of
operon genes regardless of whether tryptophan is present.
Surprisingly, however, this is not the case. In wild-type
bacteria (trpR⁺), tryptophan synthesis is very low when
tryptophan is present in the cell, but while tryptophan
synthesis by trpR⁻ strains is higher under the same
conditions, it is not at 100% capacity (Table 14.5). Both trpR⁺
and trpR⁻ strains synthesize tryptophan at 100% of capacity
when tryptophan is absent. This suggests that a second
regulatory mechanism is also affecting transcription of trp
operon genes.
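A quick calculation with the Table 14.5 percentages (8% and 33% with tryptophan present, 100% with tryptophan absent) shows how much of the reduction remains when the repressor is missing. Splitting the total into two multiplicative factors is an interpretive simplification made for this sketch, not a claim from the table itself.

```python
# Percent of full trp operon expression, from Table 14.5.
percent = {
    "trpR+": {"trp_present": 8, "trp_absent": 100},
    "trpR-": {"trp_present": 33, "trp_absent": 100},
}

full = percent["trpR-"]["trp_absent"]

# Reduction that persists without any repressor (the second mechanism).
repressor_independent_fold = full / percent["trpR-"]["trp_present"]

# Additional reduction seen only when the repressor is functional.
repressor_fold = percent["trpR-"]["trp_present"] / percent["trpR+"]["trp_present"]

# Overall reduction in wild-type cells when tryptophan is present.
total_fold = full / percent["trpR+"]["trp_present"]

print(f"repressor-independent: {repressor_independent_fold:.2f}-fold")
print(f"repressor-dependent:   {repressor_fold:.2f}-fold")
print(f"total:                 {total_fold:.2f}-fold")
```

Roughly a threefold reduction occurs even in trpR⁻ cells, which is the quantitative hint that something other than the repressor is limiting transcription.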


Attenuation of the trp Operon 


Table 14.5 Percentage of Full Tryptophan Expression for trpR⁺ and trpR⁻ Strains

          Tryptophan Present    Tryptophan Absent
trpR⁺     8%                    100%
trpR⁻     33%                   100%


The second mechanism regulating trp operon gene
transcription is attenuation that is controlled by alternative
folding undertaken by mRNA synthesized from the
162-bp trpL region. RNA polymerase binds to trpP and
initiates transcription of trpL. The trpL region contains
four repeat DNA sequences (1 to 4), and the mRNA
transcript of this region contains complementary repeats
that lead to the folding of mRNA into double-stranded
regions. The trp leader region also encodes a start codon,
a short polypeptide of 14 amino acids, and a stop
codon. Translation of this 14-amino acid polypeptide
plays a pivotal role in attenuation (Figure 14.16a). Two
features of the trpL region are critical to its attenuation
function. First, the four repeat sequences, designated
1, 2, 3, and 4, can form different stem-loop structures
(Figure 14.16b-d). (Stem-loop structures are discussed in
Section 8.2 in connection with intrinsic transcription
termination in bacteria; see Figure 8.7.) Second, among the
codons for the 14 amino acids encoded by trpL mRNA,
there are two back-to-back tryptophan codons (UGG) that
function to sense the availability of tryptophan and are
essential for attenuation.

The formation of stem loops of trpL mRNA is directly 
tied to the continuation or termination of transcription 
of the five trp operon genes. In the trpL region mRNA, 
region 1 is complementary to region 2, region 2 is com- 
plementary to region 3, and region 3 is complementary 
to region 4. Two of these stem-loop structures, the 3-4
stem loop and the 2-3 stem loop, are central to attenua- 
tion. The third type of stem loop, the 1-2 stem loop, plays 
a minor role in attenuation. 

The 3-4 stem loop of mRNA, which is the termination
stem loop, signals transcription termination.
This is identified as the transcription termination site
in Figure 14.14d. Formation of the 3-4 stem loop halts
RNA polymerase progress along the DNA, terminating
transcription in the leader region before it reaches the
structural genes of the operon (Figure 14.17a). Notice
that region 4 is followed immediately by a poly-uracil
sequence (a poly-U tail). This configuration, an mRNA
stem loop followed by a uracil string, is the same as one
described in connection with intrinsic termination of
transcription in bacteria (see Figure 8.7). Formation of
a 3-4 stem loop may be accompanied by formation of a
1-2 stem loop, which can induce a pause in the attenuation
process. Formation of the 1-2 stem loop occurs
when a ribosome does not affiliate with the nascent
trp operon leader mRNA. In the absence of an RNA-bound
ribosome, regions 1 and 2 form a double-stranded
stem. This leads, in turn, to subsequent formation of a
3-4 stem loop that terminates transcription.

The alternative to the 3-4 stem loop is the 2-3 stem
loop, which is the antitermination stem loop. This stem
loop forms when region 1 is unavailable for immediate
pairing with region 2. This situation leads region 2 to
pair with region 3. As a consequence, formation of the
2-3 stem loop precludes the formation of a 3-4 stem
loop (Figure 14.17b). The antitermination stem loop allows
RNA polymerase to continue transcription through
the leader region and into the structural genes of the trp
operon, beginning with the transcription of trpE. If
transcription progresses past region 4, a polycistronic mRNA
spanning the five trp genes is produced. Translation of the
five enzymes required for tryptophan synthesis follows.

Figure 14.16 The trpL attenuator region and its mRNA transcript. (a) The trpL attenuator
contains 162 nucleotides that include a 14-amino acid coding sequence and four inverted repeat
sequences that encode regions 1 through 4 in trpL mRNA. (b)-(d) Three alternative stem loops can
form in mRNA.

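Because each leader region can take part in only one stem at a time, the possible stem loops constrain one another. The following sketch enumerates which combinations are mutually compatible; it models only the "one region, one partner" rule described above and ignores RNA sequence and folding energetics, and its names are illustrative.

```python
from itertools import combinations

# Complementary region pairs in trpL mRNA.
PAIRS = [(1, 2), (2, 3), (3, 4)]

def compatible(stems) -> bool:
    """A set of stem loops is allowed only if no region is used twice."""
    used = [region for stem in stems for region in stem]
    return len(used) == len(set(used))

def allowed_combinations():
    return [combo
            for size in range(1, len(PAIRS) + 1)
            for combo in combinations(PAIRS, size)
            if compatible(combo)]

for combo in allowed_combinations():
    print(" and ".join(f"{a}-{b}" for a, b in combo))
```

The enumeration shows that the 1-2 and 3-4 stem loops can coexist, while the 2-3 stem loop excludes both of the others, which is why forming it prevents termination.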

Each mRNA transcribed from the trp operon eventually
forms either a 2-3 stem loop or a 3-4 stem loop,
but what determines the type of stem loop an mRNA will
form? The coupling of transcription and translation that
is a prominent feature of bacterial gene expression plays a
critical role in deciding this outcome. Transcription of the
trpL region begins at the +1 nucleotide after RNA polymerase
initiates transcription. Transcription across repeat
regions 1 and 2 can lead to formation of a 1-2 stem loop
that temporarily pauses the progress of RNA polymerase.
The pause is only momentary, however; it lasts just long
enough for a ribosome to bind at the start codon in trpL
and begin translation of the 14-amino acid polypeptide
starting with the AUG codon identified in Figure 14.16.
Translation initiation breaks the 1-2 stem loop, RNA
polymerase resumes transcription, and the ribosome and
RNA polymerase begin their coupled progression.

Figure 14.17 TrpL mRNA stem loop formation. (a) In tryptophan
abundance, the 3-4 (termination) stem loop terminates
transcription after the poly-U string. (b) In tryptophan
starvation, the 2-3 (antitermination) stem loop leads to
polycistronic mRNA synthesis.
Panel (a) label: Ribosome completes translation of trpL coding
sequence and occupies regions 1 and 2.
Panel (b) label: Ribosome stalls at region 1, and regions 2 and 3
pair. Transcription continues into operon genes.

Notice three features of the leader mRNA depicted in
Figure 14.17: (1) The polypeptide-coding sequence overlaps
the entirety of leader region 1, and the stop codon is
immediately adjacent to region 2; (2) codons 10 and 11
of the mRNA specify tryptophan, making completion of
translation dependent on tryptophan availability; and (3)
region 4 is followed immediately by a poly-U string, a feature
associated with intrinsic termination of transcription.
As coupled transcription and translation proceed, the rel- 
ative positions of RNA polymerase and the ribosome are 
determined by how efficiently the ribosome can progress 
along the mRNA. This process, in turn, is tied directly to 
the availability of tryptophan and the rapidity with which 
tryptophan is inserted into the nascent polypeptide chain. 
When the cell has an adequate supply of tryptophan, the
ribosome makes steady progress along trpL mRNA, arriving
at the stop codon where it partially overlays region 1
and region 2. Simultaneously, RNA polymerase is tran- 
scribing region 3, followed by region 4. With a portion 
of region 2 occupied by the ribosome and unavailable for 
pairing in a stem loop, region 3 forms a stem loop with re- 
gion 4, the only available complementary segment of the 
mRNA. The 3-4 stem loop, being immediately followed 
by a poly-U string, causes transcription to spontaneously 
terminate at the end of region 4 by the intrinsic process. 
Formation of the 3-4 stem loop (the termination stem 
loop) stops transcription of the trp operon in the leader 
sequence before RNA polymerase reaches the beginning 
of the trpE gene. Transcription thus ceases only when the 
system senses that no additional tryptophan is needed to 
supply translation. 

When the cell is starved for tryptophan, the supply 
of charged tRNA^Trp is low. The ribosome is forced to 
pause momentarily at codons 10 and 11 to await the ar- 
rival of a charged tryptophan tRNA that will incorporate 
tryptophan into the nascent polypeptide. As the ribo- 
some pauses, its mass covers region 1. Meanwhile, RNA 
polymerase continues to transcribe trpL. As RNA poly- 
merase transcribes region 3, the region finds a comple- 
mentary partner in region 2, leading to 2-3 stem-loop 
formation. Region 3 is not followed by a poly-U string, 
making intrinsic termination impossible. Transcription 
continues through region 4 and on into the structural 
gene region of the operon to produce the polycistronic 
mRNA transcript of the operon. Formation of a 2-3 
stem loop (the antitermination stem loop) thus permits 
transcription and translation of the enzymes necessary 
to synthesize tryptophan when the system senses that the 
available supply of tryptophan is insufficient to support 
translation. 
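
The two outcomes described above can be summarized as a small decision function. This is only a toy sketch of the logic in the text: the function name, the dictionary keys, and the reduction of tryptophan supply to a single true/false flag are assumptions made for illustration, not part of the attenuation model itself.

```python
def attenuation_outcome(charged_trna_trp_available):
    """Toy model of the fate of one trpL leader transcript.

    The boolean input is a deliberate simplification: real cells have a
    graded supply of charged tRNA^Trp.
    """
    if charged_trna_trp_available:
        # The ribosome reaches the stop codon and overlays regions 1 and 2,
        # so region 3 can pair only with region 4.
        return {
            "ribosome": "at stop codon, covering regions 1 and 2",
            "stem_loop": "3-4",
            "transcription": "terminates in the leader",
        }
    # The ribosome stalls at Trp codons 10 and 11 and covers only region 1,
    # leaving region 2 free to pair with region 3.
    return {
        "ribosome": "stalled at codons 10 and 11, covering region 1",
        "stem_loop": "2-3",
        "transcription": "continues into trpE",
    }


for available in (True, False):
    print(available, attenuation_outcome(available)["stem_loop"])
```

The function returns the ribosome position along with the stem loop because, in the model described above, the ribosome position is what decides which regions are free to pair.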

Each trpL mRNA makes a molecularly based "decision"
about whether to form a 3-4 or a 2-3 stem loop,
depending on the availability of charged tRNA^Trp at the
moment tRNA^Trp is needed by ribosomes. It is likely that
at any given moment in time, a single bacterial cell contains
a mixture of trpL mRNAs with 2-3 stem loops and
trpL mRNAs with 3-4 stem loops. The balance shifts in
the direction of more 3-4 stem loops and fewer 2-3 stem
loops at higher levels of tryptophan concentration and
shifts in the opposite direction (more 2-3 stem loops
and fewer 3-4 stem loops) as tryptophan concentration
falls. The resulting fine-tuning allows each cell to
maintain a relatively steady concentration of tryptophan
by turning tryptophan synthesis up or down to meet the
needs of the cell.
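
The shifting balance between the two stem loops can be illustrated with a simple dose-response curve. The hyperbolic form and the `half_point` parameter below are assumptions chosen only to show the direction of the shift; the text gives no quantitative relationship.

```python
def fraction_antiterminated(trp_level, half_point=1.0):
    """Hypothetical fraction of trpL transcripts forming the 2-3 stem loop.

    Assumes a hyperbolic response: the fraction is 1.0 with no tryptophan,
    0.5 when trp_level equals half_point, and approaches 0 as tryptophan
    becomes abundant. Units are arbitrary.
    """
    if trp_level < 0 or half_point <= 0:
        raise ValueError("trp_level must be >= 0 and half_point must be > 0")
    return half_point / (half_point + trp_level)


for level in (0.0, 0.5, 1.0, 2.0, 10.0):
    anti = fraction_antiterminated(level)
    print(f"Trp level {level:>4}: 2-3 loops {anti:.2f}, 3-4 loops {1 - anti:.2f}")
```

Any monotonically decreasing function would make the same qualitative point: the 2-3 fraction falls and the 3-4 fraction rises as tryptophan accumulates.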


Attenuation Mutations 


The attenuation model is supported by mutagenesis ex- 
periments. For example, experiments in which one of 
the two adjacent tryptophan codons (in positions 10 
and 11 of the trpL mRNA) has been altered by missense 
mutation to specify another amino acid have provided 
evidence of the importance of the back-to-back tryp- 
tophan codons in the trpL transcript. Mutation of one 
tryptophan UGG codon affects the attenuator respon- 
siveness to tryptophan. If both tryptophan codons are 
altered by missense mutation, the attenuator no longer 
senses tryptophan concentration and instead senses the 
availability of the amino acid encoded by the mutated 
codons. Mutagenesis experiments have also targeted regions
3 and 4 of the leader sequence (Figure 14.18). Base
substitutions that reduce the percentage of complementary
base pairs binding these two regions destabilize
the termination stem loop and reduce the efficiency of
the mutated operon system in repressing structural gene
transcription. Genetic Analysis 14.2 examines mutations
of the trp operon.

Figure 14.18 Mutations of trpL. Mutational analyses
identify 10 base-pair substitutions in regions 3 and 4 of trpL
that each decrease the efficiency of transcriptional regulation
in the attenuator region by disrupting formation of the
3-4 stem loop.


Attenuation in Other Amino Acid Operon 
Systems 


Attenuation represses transcription of structural genes 
in several amino acid operon systems in bacteria such as 
E. coli and Salmonella typhimurium. Like the trp operon, 
these other amino acid operons also contain multiple co- 
dons for the target amino acid in their leader transcripts 
(Figure 14.19). For example, the leader polypeptide of the 
E. coli histidine operon contains a run of seven consecutive 
histidine residues in the attenuator. Similarly, the phenyl- 
alanine leader polypeptide contains seven phenylalanine 
residues in a span of nine amino acids in the attenuator 
region. Like the trp operon, these operons use attenuation 
to form antitermination stem loops to regulate operon 
gene transcription. 
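
The residue counts quoted above can be checked with a few helper functions. In this sketch the trp leader peptide is the commonly reported 14-residue E. coli sequence, which this text does not list, so treat it as an assumption; the his and phe fragments are hypothetical sequences constructed only to match the counts in the paragraph and are not the actual leader peptides.

```python
def longest_run(peptide, residue):
    """Length of the longest unbroken run of `residue` in a list of residues."""
    best = current = 0
    for aa in peptide:
        current = current + 1 if aa == residue else 0
        best = max(best, current)
    return best


def max_in_window(peptide, residue, window):
    """Largest count of `residue` within any stretch of `window` residues."""
    last_start = max(0, len(peptide) - window)
    return max(peptide[s:s + window].count(residue) for s in range(last_start + 1))


def positions(peptide, residue):
    """1-based positions of `residue`, matching how the text numbers codons."""
    return [i for i, aa in enumerate(peptide, start=1) if aa == residue]


# Assumed trp leader peptide (not given in this text).
TRP_LEADER = "Met Lys Ala Ile Phe Val Leu Lys Gly Trp Trp Arg Thr Ser".split()
# Hypothetical fragments built to match the counts in the paragraph above.
HIS_FRAGMENT = ["Gln"] + ["His"] * 7 + ["Pro"]
PHE_FRAGMENT = ["Met", "Lys", "Phe", "Phe", "Phe", "Ala",
                "Phe", "Phe", "Phe", "Pro", "Phe", "Thr"]

print(positions(TRP_LEADER, "Trp"))        # [10, 11]
print(longest_run(HIS_FRAGMENT, "His"))    # 7
print(max_in_window(PHE_FRAGMENT, "Phe", 9))  # 7
```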


Figure 14.19 Four bacterial amino acid operons with
attenuator control of transcription. The regulatory amino acid
for each operon is shown in bold. Panels include the leader
polypeptides of the his operon, the pheA operon, and the thr
operon.


GENETIC ANALYSIS 14.2


PROBLEM Describe the effects on attenuation and on tryptophan synthesis of the following mutations
of the tryptophan codons (UGG) in the attenuator region of the operon.

a. The tryptophan codons are mutated to UAGUGG.
b. The tryptophan codons are mutated to UUGUUG.

BREAK IT DOWN: You should be able to define attenuation and to describe how the
presence of two tryptophan codons in the trp operon leader transcript participates in
determining whether the termination (3-4) stem loop or the antitermination (2-3) stem
loop forms in the transcript. See Figure 14.17 (p. 487).


Solution Strategies and Solution Steps


Evaluate

1. Strategy: Identify the topic this problem addresses and the nature of the required answer.
   Step: This problem concerns the consequences of mutations to the UGG (tryptophan)
   codons in the attenuator region of the trp operon. The answer requires a description
   of mutational consequences for tryptophan regulation and synthesis.

2. Strategy: Identify the critical information given in the problem.
   Step: The mutant codon sequences are given.

Deduce

3. Strategy: Examine the nature of the mutation in part (a).
   Step: The base substitution in mutant (a) creates a stop codon in place of the first
   tryptophan codon.

4. Strategy: Examine the nature of the mutation in part (b).
   Step: Two base substitutions are seen in mutant (b). Each creates a leucine codon
   in place of a tryptophan codon.

Solve

5. Strategy: Describe the consequence of the mutation in part (a).
   TIP: Compare the transcription of the wild-type operon to that of this mutant
   operon (see Figures 14.16 to 14.18).
   Step (Answer a): UAG is a stop codon that halts translation of the polypeptide. The
   location of this stop codon will prevent the ribosome from covering repeat region 2.
   The 2-3 stem loop is the only regulatory configuration that can form, and it will
   lead to constitutive tryptophan synthesis.

6. Strategy: Describe the consequence of the mutation in part (b).
   Step (Answer b): Both mutant codons in this case encode leucine. These mutational
   changes will prevent attenuation of the trp operon in response to tryptophan level.
   Instead, tryptophan synthesis will attenuate in response to the level of leucine,
   since the availability of leucine to add to the polypeptide will determine which
   stem loop will form.


For more practice, see Problems 7, 15, and 25.
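
The Deduce steps of this Genetic Analysis can be reproduced by reading each mutant sequence three nucleotides at a time. The sketch below uses only the three codons that appear in the problem (UGG, UAG, UUG); the function names and the wording of the returned labels are illustrative assumptions.

```python
# Only the codons needed for this problem; a full table has 64 entries.
CODONS = {"UGG": "Trp", "UUG": "Leu", "UAG": "Stop"}


def read_codons(rna):
    """Split an RNA string into codons and look up each one."""
    return [CODONS[rna[i:i + 3]] for i in range(0, len(rna), 3)]


def classify_mutant(codon_pair):
    """Describe what replaces the two tryptophan codons (positions 10 and 11)."""
    residues = read_codons(codon_pair)
    if "Stop" in residues:
        return "premature stop"
    if residues == ["Trp", "Trp"]:
        return "wild type: senses Trp"
    if residues[0] == residues[1]:
        return "senses " + residues[0]
    return "mixed codons: altered responsiveness"


for label, seq in (("wild type", "UGGUGG"), ("a", "UAGUGG"), ("b", "UUGUUG")):
    print(label, read_codons(seq), classify_mutant(seq))
```

Mutant (a) is flagged as a premature stop and mutant (b) as sensing leucine, matching Solution Steps 3 and 4.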


14.5 Bacteria Regulate the 
Transcription of Stress Response Genes 
and Translation and Archaea Regulate 
Transcription in a Bacteria-like Manner 


The need on the part of bacteria to respond rapidly to 
changing environmental conditions suggests that tran- 
scriptional regulation must accommodate both common 
and rare circumstances, and also that the regulation of 
translation must be available under certain circumstances. 
This section presents examples of transcriptional regu- 
lation in bacteria under rarely encountered conditions, 
describes how bacteria regulate translation, and concludes 
with a discussion of transcription regulation mechanisms 
in Archaea. 


Alternative Sigma Factors and Stress 
Response 


The operon mechanisms described to this point are 
examples of the regulatory strategies employed by bacte- 
rial cells under conditions they encounter routinely. In 
response to rare or unusual environmental circumstances, 
however, bacteria switch gene transcription patterns to 
use genes that are not normally expressed. The response 
of E. coli to heat stress illustrates how expression of 
an alternative sigma (σ) factor alters gene transcription 
by activating the transcription of specialized heat stress 
response genes. 

Escherichia coli grow vigorously at 37°C and can toler- 
ate only narrow temperature variation. At low temperatures, 
their growth slows—an important reason refrigeration 
is used to preserve foods. At the other extreme, high temperatures
kill the bacteria. This is the reason cooking is 
so efficient at reducing bacterial contamination of food. 
At the less dramatically elevated temperatures of 45°C, 
E. coli change their pattern of transcription by activating the 
expression of genes that are part of the heat shock response 
by the cell. The heat shock response protects E. coli cells 
from certain kinds of heat-induced damage. Similar mecha- 
nisms are common in other microorganisms as well as in 
fruit flies, plants, and animals, including humans. 

Heat shock response in bacteria involves expression
of an alternative sigma (σ) factor that changes the
promoter-recognition capacity of the RNA polymerase
core enzyme. Recall that the RNA polymerase core enzyme
is bound by a sigma factor to form the holoenzyme
(see Section 8.2). Under normal growth conditions, the
RNA polymerase holoenzyme recognizes bacterial promoters
containing an AT-rich Pribnow box at the -10
site. The common sigma factor, identified as σ^70, forms
this holoenzyme that transcribes a wide array of bacterial
genes under normal physiological conditions.

Bacteria grown at 45°C undergo several changes,
including initiation of the expression of heat shock proteins,
which are expressed only at high temperature, and
of chaperone proteins, a class of proteins that either refold
or degrade other proteins damaged by high heat. At these
higher temperatures, σ^70 is unstable, and RNA polymerase
containing it functions very poorly. To explain
the transcription of heat shock proteins in the presence
of poorly functioning σ^70-containing RNA polymerase,
researchers proposed and quickly found genetic evidence
pointing to an alternative, high-temperature σ factor.

The evidence came from studies of mutant,
temperature-sensitive E. coli that grow normally at 37°C but fail to
grow at 45°C. This temperature sensitivity results from a
conditional lethal mutation affecting a gene called rpoH, which
encodes an alternative sigma factor known as σ^32. When σ^32
binds an RNA polymerase core enzyme, the holoenzyme
recognizes different promoter sequences than are recognized
by holoenzymes containing σ^70 (Figure 14.20). In
contrast to the AT richness that characterizes the Pribnow
box sequence of bacterial promoters, the -10 region of
promoters recognized by σ^32-containing RNA polymerase
is rich in G-C base pairs.
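
The contrast in base composition between the two kinds of -10 region can be made concrete by computing the A-T fraction of a short sequence. This is a deliberately crude heuristic and not a promoter-prediction method: the 0.5 threshold, the function names, and the G-C-rich example sequence are assumptions for illustration. TATAAT is the familiar Pribnow box consensus.

```python
def at_fraction(seq):
    """Fraction of bases in `seq` that are A or T."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return sum(base in "AT" for base in seq) / len(seq)


def likely_sigma(minus10_region, threshold=0.5):
    """Guess the sigma factor class from -10 region composition alone.

    The threshold is an arbitrary illustrative cutoff.
    """
    if at_fraction(minus10_region) >= threshold:
        return "sigma-70-like (AT rich)"
    return "sigma-32-like (GC rich)"


PRIBNOW_CONSENSUS = "TATAAT"
HYPOTHETICAL_HEAT_SHOCK_MINUS10 = "CCCCAT"  # made-up G-C-rich example

for name, seq in (("Pribnow box", PRIBNOW_CONSENSUS),
                  ("heat shock example", HYPOTHETICAL_HEAT_SHOCK_MINUS10)):
    print(name, seq, round(at_fraction(seq), 2), likely_sigma(seq))
```

Real sigma factors recognize specific sequence motifs at both the -10 and -35 positions, so composition alone would misclassify many promoters; the sketch only illustrates the trend stated in the text.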

The promoter for rpoH is recognized by σ^70-containing
RNA polymerase when the temperature is
elevated. The polypeptide translated from rpoH mRNA
is very active in stimulating transcription of heat shock
genes. In addition, transcription of a third sigma factor
known as σ^24, which is normally present in E. coli cells at
a very low level, is greatly elevated. The RNA polymerase
holoenzyme containing σ^24 also recognizes the rpoH promoter
and transcribes the gene at elevated temperatures
that inactivate σ^70.

A second transcriptional change that occurs as a
consequence of high heat is a change in the chaperone
proteins. At normal growth temperatures, several chaperone
proteins bind the small amount of σ^32 present in
the cell to inhibit its ability to form holoenzyme. At high
temperatures, chaperone proteins release σ^32, leaving it
free to join an RNA polymerase core enzyme and form
a holoenzyme. Free chaperone proteins are redirected to
bind heat-damaged cellular proteins instead. In this role,
chaperone proteins either degrade the proteins they bind
or assist in refolding the proteins.

Figure 14.20 Alternative sigma factors for heat shock
genes. (a) Promoter sequences recognized by σ^70- and σ^32-
containing RNA polymerase. (b) At elevated temperature, σ^70
and σ^24 transcribe rpoH, which encodes σ^32 that in turn joins the
RNA polymerase core enzyme to transcribe heat shock genes.

Several additional examples of the use of alternative
sigma factors in bacteria have been described. For
example, Bacillus subtilis is a bacterium that normally
propagates by vegetative growth, but poor growth conditions
switch the growth mode to sporulation by activating
the expression of alternative sigma factors. The gene
transcription evidence shows that as growth conditions
deteriorate, transcription of the common sigma factor
is replaced by the transcription of two alternative sigma
factors. The new sigma factors recognize unique
promoters and transcribe genes used in sporulation.
A broad array of evidence shows that switching transcription
from the normal sigma factor to alternative sigma
factors induces a genome-wide change in the pattern
of gene expression that silences previously active genes
and initiates transcription of specialized genes that are
used only under restrictive or extreme growth conditions.
Table 14.6 compares and contrasts the mechanisms of
gene regulation in bacterial systems.


Table 14.6 Mechanisms of Transcription Regulation in Bacteria

Mechanism: Actions and Outcomes

1. Operon-specific control: Inducer substances, such as lactose, and negative
   feedback mechanisms, such as tryptophan availability, regulate gene
   transcription in coordinately controlled operons.
2. CAP-cAMP control: CAP-cAMP is utilized as a positive regulator of
   transcription for genes in several different operons, including the lac operon.
3. Alternative sigma factors: Extreme growth conditions, such as heat stress and
   starvation, induce transcription of alternative sigma factors.


Translational Regulation in Bacteria


Transcriptional regulation is by far the predominant
mode of controlling gene expression in bacteria,
but bacteria are also capable of translational regulation.
Translational regulation takes place by two mechanisms,
one that binds protein to an mRNA to prevent its translation
and another that pairs complementary antisense
RNA with the mRNA to block its translation.

Translation repressor proteins regulate translation
by binding mRNA in the vicinity of the Shine-Dalgarno
sequence. Protein binding in this location interferes with
recognition of the Shine-Dalgarno sequence by the 16S
rRNA in the small ribosomal subunit and so blocks translation
initiation. One of the clearest examples of this kind
of regulatory protein-mRNA interaction is seen in the
translational regulation of ribosomal proteins in E. coli.
The ribosomal proteins are encoded in a series of operons
that produce polycistronic mRNAs. These operons
are under a certain degree of transcriptional regulation,
but the most prominent control of production of ribosomal
proteins is at the translational level. One of the
protein products from each ribosomal protein operon can
bind that operon's polycistronic mRNA near the 5'-most
Shine-Dalgarno sequence, thus preventing binding of the
small ribosomal subunit to the polycistronic mRNA and
inhibiting synthesis of the proteins encoded by the operon.

Bacterial translation can also be inhibited by the
activity of antisense RNA, an RNA molecule that is
complementary to a portion of a specific mRNA. The
binding of an mRNA by an antisense RNA prevents ribosome
attachment to the mRNA and blocks translation.
Several examples of bacterial translational regulation by
antisense RNA have been described. One of the
first-discovered mechanisms of antisense control of translation
comes from the regulation of transposase production by
the bacterial insertion sequence IS10. Transposase is the
enzyme that drives the movement of transposable genetic
elements in genomes (see Section 13.6). Transposase cuts
DNA for transposable element removal and insertion. A
low level of transposition can be tolerated by bacterial
genomes and may even be advantageous. Excessive transposase
expression, however, leads to excessive transposition,
which may cause lethal mutations due to transposon
insertion into critical genes.

The IS10 insertion sequence contains two promoters.
One, called P_IN, is relatively weak and controls transcription
of the DNA strand coding for active transposase. The
second promoter, P_OUT, is much stronger. This promoter
is embedded in the transposase gene and directs transcription
of the noncoding strand of the gene, producing
an antisense RNA that is complementary to the 5' end of
transposase mRNA and covers up the Shine-Dalgarno
sequence of the mRNA, preventing its recognition by the
small ribosomal subunit (Figure 14.21). As a consequence
of the stronger P_OUT promoter, IS10 antisense RNA is
more abundant than transposase mRNA. This results in
most of the transposase mRNA being bound by antisense
RNA and effectively prevents translation of nearly all
transposase mRNA. Nevertheless, an occasional transposase
mRNA escapes antisense binding and undergoes
translation. This generates a low level of transposase that
initiates the rare event of IS10 transposition within the
bacterial genome.

Figure 14.21 Antisense RNA control of the expression of
IS10 transposase. Two promoters each drive the synthesis of
a transcript from the IS10 transposon. The transposase gene
mRNA transcript (from P_IN) can hybridize with the antisense RNA
transcript (from P_OUT) to block production of the transposase
enzyme by preventing translation.
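
The way an antisense RNA hides the Shine-Dalgarno sequence can be sketched with reverse complements. The mRNA below is a made-up sequence, not the IS10 transposase transcript, and the function names are assumptions; AGGAGG is used as the Shine-Dalgarno consensus. The sketch also assumes perfect, gap-free pairing, which real antisense RNAs do not require.

```python
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Return the antiparallel complementary RNA strand, written 5' to 3'."""
    return "".join(RNA_COMPLEMENT[base] for base in reversed(rna))


def paired_interval(mrna, antisense):
    """Half-open (start, end) interval of mRNA paired by the antisense RNA."""
    target = reverse_complement(antisense)
    start = mrna.find(target)
    return None if start == -1 else (start, start + len(target))


def sd_is_masked(mrna, antisense, sd="AGGAGG"):
    """True if the antisense RNA pairs across the whole Shine-Dalgarno site."""
    interval = paired_interval(mrna, antisense)
    sd_start = mrna.find(sd)
    if interval is None or sd_start == -1:
        return False
    return interval[0] <= sd_start and sd_start + len(sd) <= interval[1]


# Hypothetical 5' end of an mRNA: leader, Shine-Dalgarno site, then AUG.
mrna = "GCUAAGGAGGUUCACAUGGCU"
antisense_5prime = reverse_complement(mrna[:15])   # pairs with the 5' end
antisense_coding = reverse_complement(mrna[12:])   # pairs downstream only

print(sd_is_masked(mrna, antisense_5prime))  # True
print(sd_is_masked(mrna, antisense_coding))  # False
```

Only the antisense RNA complementary to the 5' end covers the ribosome-binding site, which is the arrangement the text describes for the P_OUT transcript.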


Transcriptional Regulation in Archaea 


In previous chapters, we have seen numerous examples 
of how Archaea, Bacteria, and Eukarya have diverged 
from their common ancestor. We have also looked with 
interest at patterns in the features they continue to share. 
Section 8.3, for example, described the basic transcrip- 
tion machinery of archaea, including RNA polymerase 
and some general transcription factors, as being clearly 
eukaryote-like. We will now see, however, that many 
of the transcription regulatory proteins in archaea are 
similar to bacterial transcription regulators. This suggests 
that archaea are likely to use bacteria-like mechanisms 
to regulate transcription. Indeed, research on archaeal 
transcription regulation has identified several instances in 
which a repressor protein exerts negative control of tran- 
scription. Evidence of positive control of transcription of 
archaeal genes has also been found. 

Archaeal genomes contain many operons producing 
polycistronic mRNA. The preceding pages have demon- 
strated this pattern of gene organization to be common 
in bacteria, but it has not been documented in eukaryotes. 
In keeping with the organization of many of their genes 
into operons, archaea frequently use repressor proteins to 
bind operator sites near, or overlapping, the promoters. 
As in similar bacterial systems, repressor-protein binding 
in archaea interferes with RNA polymerase binding and 
transcription initiation, thus exerting negative control of 
transcription. 

One example of this negative transcriptional control
has been identified in the archaeon Methanococcus maripaludis,
where the protein NrpR operates as a repressor
of the transcription of two operon genes, nif and glnA,
that are required for nitrogen metabolism. Transcription
of these genes is normally induced when nitrogen is present
and is repressed when nitrogen is absent. Genetic
analysis of M. maripaludis strains with mutations that
block production of NrpR detects constitutive transcription
of nif and glnA. This finding is analogous to the
observation of constitutive transcription of lac operon
genes in lacI^- bacteria. The genetic evidence suggests
that binding of NrpR blocks recruitment of RNA polymerase
to the operon promoter. Another example of negative
control of transcription by a repressor protein has
been documented in Archaeoglobus fulgidus, where the
repressor protein Mdr1 binds to an operator site and in
so doing blocks binding of RNA polymerase at an operon
promoter. Table 14.7 lists these and additional examples
of archaeal transcription-regulating proteins.

Positive control of transcription of archaeal operons
has also been observed. The protein Ptr2 in
Methanococcus jannaschii has been shown to act as a
transcription activator. When Ptr2 binds upstream of
the RNA polymerase binding site in the promoter region,
the binding of the archaeal general transcription factor
protein TBP (a protein homologous to eukaryotic
TATA-binding protein) is enhanced. TBP helps recruit RNA
polymerase to the promoter. This action is similar to the
positive regulatory effect of the CAP-cAMP complex
binding to the CAP binding site upstream of the bacterial
lac operon RNA polymerase binding site in the promoter.

The archaeal domain is diverse, and research on
archaeal transcription and transcription regulation is in
its infancy in comparison with similar research on bacteria
and eukaryotes. Yet it already seems clear that further
research will reveal transcriptional systems both novel
and familiar.


Table 14.7 Selected Transcriptional Regulatory Proteins in Archaea

Species           Protein   Repressor or activator   Mode of action
A. fulgidus       Mdr1      Repressor                Blocks RNA polymerase binding
M. maripaludis    NrpR      Repressor                Blocks RNA polymerase binding
S. solfataricus   Lrs14     Repressor                Blocks TBP binding
P. furiosus       PhrA      Repressor                Blocks RNA polymerase binding
M. jannaschii     Ptr2      Activator                Facilitates TBP binding

Information adapted from S. D. Bell. 2005. Archaeal transcription regulation: variation on a bacterial theme. Trends Microbiol., 13: 262-65.


14.6 Antiterminators and
Repressors Control Lambda Phage
Infection of E. coli


Bacteriophage (or phage, for short) are viruses that infect
bacterial cells. Like all viruses, they must infect host cells
to reproduce (see Section 6.5). Their tiny genomes do
not contain all the genes necessary for replication,
transcription, and translation, so phage are obligate parasites
that use an ingenious array of tricks to accomplish these
molecular processes. The secret to their reproductive success
lies in their ability to commandeer bacterial proteins
and enzymes to preferentially express phage genes over
bacterial genes.

Given the limited content of phage genomes, some of 
the most important genes for phage reproduction are those 
that redirect the activity of bacterial host genes to serve 
phage requirements. Successful phage infection requires 
(1) that genetic regulatory switches be controlled through 
phage gene expression to redirect the action of host genes 
and (2) that phage gene expression initiate a sequence of 
events leading the bacterium to participate in the expres- 
sion of phage genetic information. In no bacteriophage is 
there a clearer picture of the processes that control regulatory
genetic switching than in lambda (λ) phage.

Recall that all bacteriophage are capable of infecting 
and reproducing within the host bacterial cell. The infec- 
tion ends with the lysis of the host cell, in a process called 
the lytic cycle (see Figure 6.15). But certain bacteriophage 
known as temperate phage, of which λ phage is an example,
are also capable of a lysogenic cycle, or lysogeny.
The lysogenic cycle is characterized by integration of the 
phage into the host chromosome, converting the host into 
a lysogen. Lysogenic integration is site specific, meaning it 
occurs at a sequence shared by the phage and the bacte- 
rial host (see Figure 6.19). The phage enzyme integrase 
is responsible for lysogenic integration. In this section, 
we discuss the two life cycles of phage, examining the 
regulatory proteins that control which life cycle a particu- 
lar infection will undertake, as well as the actions of the 
proteins that control each life cycle. 


The Lambda Phage Genome 


The λ phage genome is composed of approximately 48 kb
of linear, double-stranded DNA that encodes nearly
60 genes (Figure 14.22a). Its injection into a host bacterial
cell leads to an immediate circularization inside the
host cell that is accomplished by the joining of two
single-stranded cohesive (cos) ends that are each 12 nucleotides
in length (Figure 14.22b). A host DNA ligase seals the two
gaps that are left when the cohesive ends join and produces
a circularized λ phage that is ready to begin gene
expression.
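
Circularization depends on the two 12-nucleotide single-stranded ends being complementary and antiparallel. The sketch below checks that property for a pair of overhangs. The specific 12-nucleotide sequence is an assumption used for illustration, since the text gives only the length of the cos ends, and the function names are likewise hypothetical.

```python
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(dna):
    """Return the antiparallel complementary DNA strand, written 5' to 3'."""
    return "".join(DNA_COMPLEMENT[base] for base in reversed(dna))


def can_anneal(overhang_a, overhang_b):
    """True if two single-stranded ends can base-pair along their full length."""
    return (len(overhang_a) == len(overhang_b)
            and overhang_a == reverse_complement(overhang_b))


# Assumed 12-nucleotide overhang at one end of the linear genome.
left_end = "GGGCGGCGACCT"
right_end = reverse_complement(left_end)

print(len(left_end), right_end)         # 12 AGGTCGCCGCCC
print(can_anneal(left_end, right_end))  # True
print(can_anneal(left_end, left_end))   # False
```

Annealing of the overhangs leaves one nick in each strand, which corresponds to the two gaps that the host DNA ligase seals.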

The λ phage genome is organized as a series of
operons. The genes in each operon are expressed in a
well-defined sequence. Expression of genes in certain
operons begins immediately after circularization. The
specific order of gene expression is critical to the ability
of λ phage to carry out successful infection of its bacterial
host. Consequently, immediate early genes are expressed
shortly after circularization, delayed early genes
are expressed next, and late genes are expressed later
in the infection cycle. The transcription of immediate
early, delayed early, and late gene regions is determined
by binding of two regulatory proteins, one known as an
antiterminator, whose binding permits gene transcription
by preventing transcription termination, and the
other protein acting as a repressor that blocks additional
transcription.

Immediately following circularization of the λ phage 
chromosome, early promoters and early operators con- 
trol transcription of genes whose protein products inter- 
act to determine whether the phage undergoes the lytic 
cycle or the lysogenic cycle (see Chapter 6). The lytic 
cycle results in a rapidly progressing infection leading 
to lysis (rupture) of the host cell and release of scores of 
progeny phage. In the lysogenic life cycle, on the other 
hand, the phage chromosome integrates into the host 
chromosome, as noted above. Expression of genes in the 
integrated phage chromosome (the prophage) is mini- 
mal; only the genes necessary to maintain lysogeny are 
expressed. Replication of the bacterial chromosome pro- 
duces daughter cells that carry a copy of the prophage. 
Lysogeny continues until the prophage excises itself from 
its integration site, reactivating phage gene expression 
and the lytic cycle. 


Early Gene Transcription 


Upon circularization of the phage chromosome, the two
immediate early λ phage genes N and cro are transcribed,
and the N and cro proteins are translated. Transcription
and translation of these genes, as well as all of the other
genes we mention, is accomplished by bacterial host proteins
and ribosomes because the λ phage genome does not
encode these functions. The N protein is an antiterminator
protein, and the cro protein is a repressor. These two
proteins engage in a molecular tug-of-war for control of a
genetic switch that determines whether the infection will
result in the lytic cycle or the lysogenic cycle. The early
promoter P_R controls rightward transcription of immediate
early genes, beginning with the cro gene (for control of
repressor and others) (Foundation Figure 14.23). The
immediate early promoter P_L controls leftward transcription
beginning with the N gene, whose protein product
blocks transcription termination and allows delayed early
and late genes to be transcribed.

The antitermination protein N binds to three
transcription-terminating DNA sequences: t_L, t_R1, and t_R2
(see Foundation Figure 14.23). When not bound by N
protein, termination sequence t_L acts to block leftward
transcription beyond N. In the other direction, t_R1 and
t_R2 prevent rightward transcription beyond cro or beyond
three other early genes: cII, O, and P. When N protein
binds t_L, t_R1, and t_R2, however, delayed early genes
leftward of t_L and rightward of t_R1 and t_R2 are transcribed.
One of the proteins produced by leftward transcription is
integrase (the product of the int gene), which is required
for prophage integration into the bacterial chromosome.
In the other direction, rightward transcription produces
protein cII, which forms a complex with protein cIII, one
of the products of leftward transcription. Together, the
cII/cIII complex binds to the promoter P_RE (for repressor
establishment). This promoter initiates leftward transcription
of the cI gene, producing the cI protein, which
is also known as the λ repressor protein (Foundation
Figure 14.23).

Figure 14.22 The genome map of λ (lambda) phage.
(a) The λ phage genome is organized into operons that
function at defined times during infection of a host cell.
(b) The cohesive (cos) site is the region that enables the
linear phage chromosome to circularize when it enters the
host bacterial cell. Immediate early, delayed early, and late
genes are expressed in order.
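
The effect of N protein on early transcription can be sketched by walking along each transcription unit and stopping at the first terminator unless N is bound. The element orders below follow the map residue of Foundation Figure 14.23 and are truncated (for example, int is omitted from the leftward unit); the list and function names are illustrative assumptions, and treating every terminator as an absolute stop is a simplification.

```python
TERMINATORS = {"tL", "tR1", "tR2"}

# Elements in the order RNA polymerase meets them (abbreviated).
RIGHTWARD_FROM_PR = ["cro", "tR1", "cII", "O", "P", "tR2", "Q"]
LEFTWARD_FROM_PL = ["N", "tL", "cIII"]


def transcribed_genes(elements, n_protein_bound):
    """Genes transcribed before RNA polymerase terminates.

    Without N protein, transcription stops at the first terminator.
    With N protein bound, terminators are read through.
    """
    genes = []
    for element in elements:
        if element in TERMINATORS:
            if not n_protein_bound:
                break
            continue  # antitermination: keep transcribing
        genes.append(element)
    return genes


for n_bound in (False, True):
    print("N bound:", n_bound,
          "| rightward:", transcribed_genes(RIGHTWARD_FROM_PR, n_bound),
          "| leftward:", transcribed_genes(LEFTWARD_FROM_PL, n_bound))
```

With N absent only cro and N are transcribed; with N bound the walk reaches cII and cIII, whose products form the complex that acts at P_RE.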
Before the lytic cycle or the lysogenic cycle of infection
can begin, two critical molecular "decisions" have to
be made. The first of these decisions involves determining
whether bacteria are actively growing. With active bacterial
growth, lysis is favored because new progeny phage will
readily find new host cells. If bacteria are growing poorly,
however, lysogeny is favored. In this state, the prophage
can remain quiescent until growth conditions improve.

Foundation Figure 14.23 (panel text)
Gene map, left to right: P_I, cIII, t_L, N, P_L, O_L, cI, P_RM, O_R3, O_R2, O_R1, P_R, cro, t_R1, P_RE, cII, O, P, t_R2, Q.
N is produced by transcription from P_L.
Transcription from P_R produces cro.
N protein acts as an antiterminator to extend transcription beyond termination sequences t_L, t_R1, and t_R2.
Accumulation of cII/cIII complex leads to lysogenic cycle.
Lysogenic cycle development: cII/cIII binding to P_I leads to expression of integrase that can stimulate prophage integration. cII/cIII binding to P_RE leads to expression of cI, the λ repressor protein.
Lysogenic cycle if λ repressor binds to O_R1 and O_R2: Transcription occurs from P_RM to transcribe cI, and transcription from P_R is blocked. The lysogenic cycle is established.
Accumulation of cro protein: cro and λ repressor undertake competitive binding for operators O_R1, O_R2, and O_R3.
Lytic cycle if cro binds to O_R3: Transcription continues from P_L and P_R, and delayed early and late gene transcription leads to the lytic cycle.

The protein cII is critical to this first molecular 
decision. Protein cII is sensitive to bacterial proteases, 
enzymes that degrade proteins. Proteases are abundant 
when bacterial growth conditions are favorable, but 
they are sparse under starvation conditions. If bacteria 
are actively growing in good conditions, cII is degraded, 
it never forms a complex with cIII, and little λ repressor 
protein is produced. If, on the other hand, bacterial 
growth conditions are poor, cII remains, it forms a complex 
with cIII, and λ repressor protein is produced. 
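
The chain of reasoning in this first decision can be written out as a small qualitative sketch. The function name, the dictionary keys, and the yes/no treatment of protease abundance are illustrative assumptions; the real system is graded rather than all-or-none, and no kinetics are modeled here.

```python
def first_decision(bacteria_growing_well: bool) -> dict:
    """Qualitative sketch of the cII-dependent decision (no real kinetics)."""
    # Proteases are abundant in good growth conditions, sparse under starvation.
    proteases_abundant = bacteria_growing_well
    # cII is protease-sensitive, so it persists only when proteases are sparse.
    cII_persists = not proteases_abundant
    # Surviving cII forms the cII/cIII complex, which drives cI expression.
    complex_forms = cII_persists
    return {
        "cII_persists": cII_persists,
        "cII_cIII_complex": complex_forms,
        "repressor_production": "substantial" if complex_forms else "little",
        "favored_cycle": "lysogeny" if complex_forms else "lysis",
    }


if __name__ == "__main__":
    for growing in (True, False):
        print(growing, first_decision(growing))
```

Calling the function with `True` (good growth) and `False` (poor growth) reproduces the two branches described in the paragraph above.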

The second molecular decision to be made involves 
direct competition between the cro protein and the λ 
repressor protein. They compete for binding to operator 
sites, with the winning molecule determining whether 
the lytic cycle or the lysogenic cycle is established. In the 
following discussion, we focus on the competitive binding 
between λ repressor protein and cro protein. 


Cro Protein and the Lytic Cycle 


Entry into the lytic cycle requires the transcription of 
late genes that are regulated by late promoters and late 
operators. These genes are rightward of PR′ and are 
involved in the synthesis of head and tail proteins, as well 
as products that lyse the host cell. The genetic switch 
governing whether λ phage enters the lytic or the lysogenic 
cycle hinges on the binding of cro protein and λ repressor 
protein, respectively. Both cro protein and λ repressor 
protein have affinity for operator sequences OR1, OR2, and 
OR3, located between PR and PRM. The two proteins have 
opposite binding affinities. The cro protein binds OR3 with 
highest affinity but has lower affinity for OR2 and OR1. 
The λ repressor, on the other hand, has highest affinity 
for OR1. Its affinity for OR2 is not as high, and its affinity 
for OR3 is much lower. The three operator sequences each 
have a 17-bp target for binding of either cro protein or 
λ repressor protein. The OR1 sequence lies fully within PR, 
and OR3 lies fully within PRM; OR2 is split between the two 
promoters (Figure 14.24a). 

The cro protein product is a 66-amino acid monomer 
that forms a globular structure. Functional cro protein is a 
homodimer that precisely spans the 17 bp of DNA that are 
its target binding sequence on the operators. Dimerized 
cro protein has strong binding affinity for OR3 and OR2, 
but lower affinity for OR1. As cro protein concentration 
increases, however, it binds, in order, to OR3, OR2, and OR1. 
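
The opposite binding preferences of the two proteins can be summarized in a short sketch. The affinity orderings come from the text; the names `AFFINITY_ORDER` and `sites_bound`, and the use of a simple site count as a stand-in for rising protein concentration, are assumptions made for illustration only.

```python
from typing import List

# Operators listed from highest to lowest affinity, as stated in the text.
AFFINITY_ORDER = {
    "cro": ["OR3", "OR2", "OR1"],
    "repressor": ["OR1", "OR2", "OR3"],
}


def sites_bound(protein: str, n_sites: int) -> List[str]:
    """Operators occupied once `n_sites` of the three sites are filled.

    `n_sites` is a hypothetical proxy for increasing protein concentration.
    """
    if protein not in AFFINITY_ORDER:
        raise ValueError(f"unknown protein: {protein}")
    if not 0 <= n_sites <= 3:
        raise ValueError("n_sites must be between 0 and 3")
    # Sites fill in order of decreasing affinity.
    return AFFINITY_ORDER[protein][:n_sites]


if __name__ == "__main__":
    for protein in AFFINITY_ORDER:
        for n in range(4):
            print(protein, n, sites_bound(protein, n))
```

The sketch makes the asymmetry explicit: the first site filled by cro is the last site filled by λ repressor, and vice versa.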

The presence of cro protein at the operator sequences 
blocks the access of RNA polymerase to PRM, exerting 
negative control of cI gene transcription and preventing 
production of λ repressor protein (Figure 14.24b). This 
action is analogous to the effect of the lac repressor protein 
binding to the operator sequence in the lac operon. At 
the same time, cro protein binding exerts positive control 
on PR, leading to enhanced transcription of cro and other 
genes that are rightward of PR. Among these rightward 
genes is Q, a gene producing Q protein, which is a positive 
regulator of transcription of late genes that are rightward 
of the late promoter PR′. These late genes include genes 
encoding proteins of the phage head and tail as well as 
genes required for lysis of the host cell. 

Figure 14.24 Transcription of λ phage genes cro and cI. 
(a) Promoters PR and PRM overlap three operator sites (OR1, OR2, 
and OR3) that are competitively bound by regulatory proteins. 
(b) The cro gene is transcribed from PR. Cro protein binds OR3 
and OR2, leading to transcription of genes that generate the 
lytic cycle. (c) The cI gene is transcribed from PRM to produce λ 
repressor that binds to OR1 and drives additional cI transcription. 
Other gene transcription is blocked, and lysogeny is established. 


The λ Repressor Protein and Lysogeny 


Successful binding by λ repressor protein at operator 
sites OR1 and OR2 is cooperative. This binding is a positive 
regulator of transcription from the promoter PRM. The 
effect is much like binding of the CAP-cAMP complex in 
the lac operon (Figure 14.24c). 

Under the influence of λ repressor protein binding 
to the operator region, transcription from PRM produces 
more repressor protein. Repressor binding also prevents 
transcription from PR, effectively blocking cro 
transcription, and lysogeny results. 


Resumption of the Lytic Cycle Following 
Lysogeny Induction 


The λ repressor protein is the product of the cI gene. 
This protein is a 236-amino acid polypeptide containing 
92 amino acids in the N-terminal domain (amino 
acids 1-92), 105 amino acids in the C-terminal domain 
(amino acids 132-236), and the remaining 39 amino acids 
(93-131) linking the two domains. Functional λ repressor 
protein is dimeric, and monomers are linked at their 
C-terminal ends. The resulting dimers have a dimension 
that spans 17 bp of DNA, precisely the size of each 
operator sequence (Figure 14.25a). 
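
As a quick consistency check on the residue ranges quoted above, the three segments should tile the 236-residue polypeptide without gaps or overlaps. The sketch below labels segments only by their residue numbers; the names `SEGMENTS`, `segment_length`, `LENGTHS`, and `TOTAL` are illustrative.

```python
# Residue ranges (inclusive) quoted in the text for the 236-residue protein.
SEGMENTS = {
    "residues 1-92": (1, 92),
    "residues 93-131 (linker)": (93, 131),
    "residues 132-236": (132, 236),
}


def segment_length(first: int, last: int) -> int:
    """Number of residues in an inclusive range first..last."""
    return last - first + 1


LENGTHS = {name: segment_length(*span) for name, span in SEGMENTS.items()}
TOTAL = sum(LENGTHS.values())

if __name__ == "__main__":
    for name, n in LENGTHS.items():
        print(f"{name}: {n} amino acids")
    print("total:", TOTAL)
```

The lengths come out as 92, 39, and 105 residues, which sum to the stated 236.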

Lysogeny is a semipermanent state that can be maintained 
for an extended period of time by the ongoing 
binding of λ repressor protein to OR1, OR2, and OR3. The 
persistence over long periods of the lysogenic state raises 
two questions. First, what makes lysogeny come to an end, 
and second, how does the phage resume the lytic cycle 
and produce progeny phage? 

Figure 14.25 Lysogeny maintenance and termination. 
(a) A homodimeric λ repressor protein binds to 17-bp operator 
sequences to regulate its own transcription and maintain 
lysogeny. (b) UV light and other DNA-damaging agents activate RecA, 
which cleaves λ repressor monomers to inactivate repressor 
protein. (c) Lysogeny ends with the removal of λ repressor protein 
from operator sequences and the initiation of transcription of cro. 

Induction is the process that brings lysogeny to an 
end and reinitiates the lytic cycle by excising the prophage 
from its integrated location in the bacterial chromosome. 
You might think of induction as another molecular deci- 
sion, this one triggered by DNA damage done by extracel- 
lular forces. The principal force causing injury to DNA is 
ultraviolet light, whose effects on DNA we described in 
Section 12.4. UV-induced DNA damage activates many 
proteins involved in DNA repair. Among the numerous 
proteins activated in the DNA repair cascade is the pro- 
tein RecA, whose role in mutation repair is to activate 
recombination. 

When bacterial DNA is damaged by UV light, however, 
the protease (protein-destroying) activity of RecA 
protein is also activated. Among other targets of this 
protease activity is the amino acid segment of λ repressor 
monomers that joins the N- and C-terminal regions of 
each protein (Figure 14.25b). The C terminus is clipped 
off each monomer, effectively breaking apart repressor 
dimers. This causes the N-terminal ends to fall off DNA. 
With λ repressor no longer bound to DNA, the OR1, OR2, 
and OR3 sequences are exposed, and positive regulation 
of cI transcription ends, as does the negative regulation 
of cro transcription. A consequence of the removal of λ 
repressor from the operator region is the renewed production 
of cro protein (Figure 14.25c). The cro protein binds 
to the operators no longer occupied by repressor protein. 
This leads to the expression of xis, producing the enzyme 
excisionase that removes the prophage from its integrated 
location. This event triggers the resumption of the lytic 
cycle and ultimately results in host cell lysis and the release 
of progeny phage. 

In summary, λ phage is an elegant regulatory system 
that facilitates two molecular decisions controlling whether 
a genetic switch is flipped in favor of the lytic cycle or the 
lysogenic cycle. The crucial interaction is between the 
protein products of the early genes cro and cI that compete 
for binding to operator sequences OR1, OR2, and OR3. If cro 
protein prevails by successfully binding to OR3 and OR2, 
expression of cI is repressed, and the expression of late genes 
leading to completion of the lytic cycle is assured. On the 
other hand, if λ repressor protein prevails, its early occupation 
of OR1 and OR2 prevents transcription of late genes, 
ensuring that the lysogenic cycle will proceed. 
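
The outcome rule in this summary can be expressed as a small decision function. This is a minimal sketch under stated assumptions: the function name `cycle_outcome`, the string labels, and the "undetermined" fallback for occupancy patterns the summary does not discuss are all hypothetical.

```python
from typing import Optional


def cycle_outcome(or1: Optional[str], or2: Optional[str], or3: Optional[str]) -> str:
    """Map operator occupancy to the cycle named in the summary.

    Each argument is "cro", "repressor", or None (site unoccupied).
    """
    for bound in (or1, or2, or3):
        if bound not in (None, "cro", "repressor"):
            raise ValueError(f"unexpected occupant: {bound}")
    # Repressor on OR1 and OR2 blocks late gene transcription.
    if or1 == "repressor" and or2 == "repressor":
        return "lysogenic"
    # cro on OR3 and OR2 represses cI and allows the lytic program.
    if or3 == "cro" and or2 == "cro":
        return "lytic"
    # Patterns not covered by the summary are left unresolved.
    return "undetermined"


if __name__ == "__main__":
    print(cycle_outcome("repressor", "repressor", None))
    print(cycle_outcome(None, "cro", "cro"))
```

Because OR2 can hold only one of the two proteins at a time, the two winning patterns are mutually exclusive, which is what makes the region behave as a switch.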


CASE STUDY 

Vibrio cholerae: Stress Response Leads to Serious Infection 


THE INFECTIOUS DISEASE CHOLERA Cholera is a severely 
debilitating and potentially fatal disease caused by infection 
with the intestinal bacterium Vibrio cholerae. It is a major 
public health problem in developing countries where sanitation 
and supplies of clean water are inadequate or following 
disasters that disrupt normal sanitation and supplies of clean 
water. The bacterium is transmitted from person to person 
through contact with infected fecal material. The ingestion 
of fecal-contaminated water is the most common way of 
contracting cholera. Many ingested bacteria are killed by the 
highly acidic environment of the stomach, but V. cholerae in 
particular can survive in greater numbers than most bacteria 
by undertaking a rapid switch in gene regulation that shuts 
down the expression of some genes and activates the ex- 
pression of stress response genes. Unfortunately for infected 
humans, the V. cholerae stress response produces toxins that 
can rapidly lead to degradation of the mucosal cells lining 
the intestines and to excessive leakage of water from the 
damaged cells. The leakage disturbs the osmotic balance of 
the cells; to compensate, they secrete water, initiating a re- 
peating cycle of ion leakage and water release that produces 
watery diarrhea and severe dehydration. Unless immediate 
antibiotic treatment and rehydration therapy are started, 
death can occur within hours. 


VIBRIO CHOLERAE TOXINS In V. cholerae, three genes 
(ToxS, ToxR, and ToxT) exert positive control over the 
transcription of genes producing virulence (active bacterial growth that 
causes disease). The expression of ToxS and ToxR genes is 
stimulated by the environmental cues encountered by V. cholerae 
in the hostile environment of the stomach. A protein complex 
formed by the products of these genes activates transcription 
of ToxT. The polypeptide product of ToxT is a 
transcription-activating protein that binds to the promoter Pctx that 
controls transcription of two genes, CtxA and CtxB (abbreviations 
for “cholera toxin A” and “cholera toxin B”) that are part of an 
operon. The polypeptide products of CtxA and CtxB are the 
cholera toxins that initiate the series of actions that lead to 
cholera symptoms. 


PREVENTING AND STUDYING THE DISEASE PROCESS 
Preventing cholera is an obvious public health priority. 
According to the World Health Organization, between 3 million 
and 5 million people contract cholera each year, and more 
than 100,000 deaths are attributed to cholera annually. 
Vaccines can help prevent some cholera cases, and oral 
antibiotics can help treat the disease once it has been acquired. 
Important as well is gaining understanding of how the 
ToxS-ToxR complex and ToxT operate in promoter recognition, and 
identifying the other genes they regulate. Similarly, gathering 
information about the stress response and virulence genes in 
V. cholerae will help medical practitioners and microbiologists 
understand how the bacterium produces its lethal effects. 
Such knowledge may suggest new strategies that can disable 
the bacterium before it causes disease or new treatments that 
can prevent the most serious consequences of infection. 


SUMMARY 

MasteringGenetics™: For activities, animations, and review quizzes, go to the Study Area. 


14.1 Transcriptional Control of Gene Expression 
Requires DNA-Protein Interaction 


Regulated genes are under transcriptional control, whereas 
constitutive genes are not regulated. 

In negative control of transcription, regulatory proteins 
bound to DNA reduce or eliminate transcription. 

Regulatory proteins, also called repressors, have a 
DNA-binding domain to bind regulatory DNA 
sequences and an allosteric domain to bind a regulatory 
molecule. 

An inducer molecule binds to the repressor molecule at an 
allosteric site to inhibit its action. 

In positive regulatory control, activator proteins bind DNA 
at promoters and other regulatory sequences and initiate or 
increase transcriptional efficiency. 


14.2 The lac Operon Is an Inducible Operon 
System under Negative and Positive Control 


Bacterial operons transcribe two or more genes under 
the coordinated regulatory control of shared promoters, 
operators, and other regulatory elements. 

The lactose (lac) operon is an inducible operon system that 
produces three proteins, β-galactosidase (lacZ), permease 
(lacY), and transacetylase (lacA), that are required to 
metabolize lactose and its by-products. Its regulatory 
control center contains a promoter and an operator 
sequence (lacO). 

Negative control of lac operon gene transcription is exerted 
by a repressor protein (lacI) that binds to the lacO region 
to block transcription. Allolactose inactivates the repressor 
protein by changing its conformation and preventing it from 
binding to the operator. 

Positive control of transcription of lac operon genes is 
exerted by the CAP-cAMP complex that forms in the 
absence of glucose and binds to the CAP site of the lac 
promoter. 
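
The combined negative and positive control summarized above can be sketched as a two-input function. The function name and the three output labels are illustrative assumptions; real expression levels are continuous, and the "low" label for the glucose-plus-lactose case reflects the absence of CAP-cAMP stimulation rather than a measured value.

```python
def lac_operon_transcription(lactose_present: bool, glucose_present: bool) -> str:
    """Qualitative lac operon output under negative and positive control."""
    # Allolactose (formed when lactose is present) inactivates the repressor,
    # so the repressor sits on lacO only when lactose is absent.
    repressor_on_operator = not lactose_present
    # The CAP-cAMP complex forms only in the absence of glucose.
    cap_camp_bound = not glucose_present
    if repressor_on_operator:
        return "repressed"
    return "high" if cap_camp_bound else "low"


if __name__ == "__main__":
    for lactose in (True, False):
        for glucose in (True, False):
            print(f"lactose={lactose}, glucose={glucose}:",
                  lac_operon_transcription(lactose, glucose))
```

Only the combination of lactose present and glucose absent satisfies both conditions for full transcription.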


14.3 Mutational Analysis Deciphers Genetic 
Regulation of the lac Operon 


Mutation studies determined the order of lac operon genes 
as lacZ-lacY-lacA. 


The analysis of mutant haploid and partial diploid bacteria 
identified the trans-acting repressor protein that binds the 
operator sequence. 

lac operator mutation analysis indicates that the operator is 
a cis-acting element that controls transcription of immedi- 
ately adjacent genes on the chromosome. 

The lac repressor binding site overlaps the RNA polymerase 
binding location in the lac promoter. 


lac repressor protein binding induces DNA loop formation 
that prevents RNA polymerase binding at the promoter. 

The CAP-cAMP complex binds to the CAP binding site 
of the lac promoter and facilitates RNA polymerase 
binding. 


14.4 Transcription from the Tryptophan Operon 
Is Repressible and Attenuated 


The tryptophan (trp) operon is a repressible operon that 
produces five polypeptides that participate in tryptophan synthesis. 

trp operon transcription is inhibited by a feedback 
mechanism involving tryptophan as a corepressor. 

trp operon gene expression is attenuated to maintain the 
cellular concentration of tryptophan at a steady state. Many of the 
amino acid operons are regulated by an attenuation mechanism. 

The trpL (leader) region contains an attenuator sequence of 
four DNA repeats that form one of two alternative mRNA 
stem loops. 

The 2-3 (antitermination) stem loop formed by mRNA 
permits transcription of five trp operon structural genes in a 
polycistronic mRNA. 

The 3-4 (termination) stem loop of mRNA terminates 
transcription before RNA polymerase binds to the structural 
genes of the operon. 
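
The two alternative stem loops can be tied to tryptophan availability in a short sketch. The mapping of tryptophan level to stem loop follows the standard attenuation model rather than anything stated in these summary points, and the function name and dictionary keys are illustrative assumptions.

```python
def trp_attenuation(tryptophan_abundant: bool) -> dict:
    """Which trpL stem loop forms, and whether structural genes are transcribed."""
    if tryptophan_abundant:
        # Abundant tryptophan: the 3-4 termination stem loop forms.
        stem_loop = "3-4 (termination)"
    else:
        # Scarce tryptophan: the 2-3 antitermination stem loop forms.
        stem_loop = "2-3 (antitermination)"
    return {
        "stem_loop": stem_loop,
        "structural_genes_transcribed": stem_loop.startswith("2-3"),
    }


if __name__ == "__main__":
    print(trp_attenuation(True))
    print(trp_attenuation(False))
```

Because region 3 can pair with either region 2 or region 4 but not both, the two outcomes are mutually exclusive.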


14.5 Bacteria Regulate the Transcription of Stress 
Response Genes and Translation and Archaea 
Regulate Transcription in a Bacteria-like Manner 


Alternative sigma factors are used to generate RNA 
polymerases that recognize promoters of genes not transcribed 
by the common bacterial RNA polymerase. 

Genes transcribed using alternative sigma factors are 
required only under specialized circumstances, such as in 
response to heat shock. 

The translation of bacterial mRNA can be blocked by 
RNA-binding translation repressor proteins or by antisense 
RNA that binds to mRNA from specific genes. 

Many archaeal genes are organized into operons, 
and several transcription repressor and transcription 
activator proteins controlling these operons have been 
identified. 


14.6 Antiterminators and Repressors Control 
Lambda Phage Infection of E. coli 


Early genes of the bacteriophage λ genome produce 
proteins that compete to bind at the same regulatory 
region. The protein that prevails determines whether the 
phage infection will follow the lytic cycle or the lysogenic 
cycle. 

Completion of the lytic cycle requires the expression of late 
λ phage genes. 

Lysogen integration and maintenance requires ongoing 
expression of the λ repressor protein, which regulates its 
own transcription. 

Lysogen integration is reversed by environmental 
changes that lead to induction and to resumption of the 
lytic cycle. 

repressible operon (p. 484) 
repressor protein (p. 470) 
stem loop [3-4 (termination stem loop), 
2-3 (antitermination stem loop)] (p. 485) 
trans-acting (p. 479) 
translation repressor protein (p. 491) 


1. Bacterial genomes frequently contain groups of genes 
organized into operons. What is the biological advantage 
of operons to bacteria? Identify the regulatory 
components you would expect to find in an operon. 
How are the expressed genes of an operon usually 
arranged? 

Transcriptional regulation of operon gene expression 
involves the interaction of molecules with one another and of 
regulatory molecules with segments of DNA. In this context, 
define and give an example of each of the following: 
a. operator 
b. repressor 
c. inducer 
d. corepressor 
e. promoter 
f. positive regulation 
g. allostery 
h. negative regulation 
i. attenuation 



Why is it essential that bacterial cells be able to regulate 
the expression of their genes? What are the energetic and 
evolutionary advantages of regulated gene expression? 

Is the expression of all bacterial genes subject to regu- 
lated expression? Compare and contrast the difference 
between regulated gene expression and constitutive gene 
expression. 


Identify similarities and differences between an inducible 
operon and a repressible operon in terms of 
a. the transcription-regulating DNA sequences. 
b. the presence and action of allosteric regulatory 
molecules. 
c. the organization of structural genes of the operon. 


The transcription of β-galactosidase and permease is 
inducible in lac+ bacteria with a wild-type lac operon. 
Explain the mechanism by which lactose gains access to 
the cell to induce transcription of the genes. 


Is attenuation the product of an allosteric effect? Is attenua- 
tion the result of a transcriptional or a translational activity? 
Explain your answers. 


Application and Integration 


15. Attenuation of trp operon transcription is controlled by 
the formation of stem-loop structures in mRNA. The 
attenuation function can be disrupted by mutations that alter 
the sequence of repeat DNA regions 1 to 4 and prevent the 
formation of mRNA stem loops. Describe the likely effects 
on attenuation of each of the following mutations under 
the conditions specified. 


   Mutated Region   Tryptophan Level 
a. Region 1         Low 
b. Region 1         High 
c. Region 2         Low 
d. Region 2         High 
e. Region 3         Low 
f. Region 3         High 
g. Region 4         Low 
h. Region 4         High 

10. 


11. 


12. 


13. 


14. 


The trpL region contains four repeated DNA sequences that 
lead to the formation of stem-loop structures in mRNA. 
What are these stem-loop structures, and how do they affect 
transcription of the structural genes of the trp operon? 


The CAP binding site in the lac promoter is the location 
of positive regulation of gene expression for the operon. 
Identify what binds at this site to produce positive regula- 
tion, under what circumstances binding occurs, and how 
binding exerts a positive effect. 


What role does cAMP play in transcription of lac operon 
genes? What role does CAP play in transcription of lac 
operon genes? 


How would a cap mutation that produces an inactive CAP 
protein affect transcriptional control of the lac operon? 


Explain the circumstances under which attenuation of 
operon gene expression is advantageous to a bacterial 
organism. Would you expect attenuation to be found in a 
single-celled eukaryote? In a multicelled eukaryote? 


Consider the transcription of genes of the lac operon 
under two conditions: (1) when both glucose and lactose 
are present and (2) when glucose is absent and lactose is 
present. Describe the comparative levels of transcription of 
lac operon genes under these conditions, and explain the 
molecular basis for the difference. 


Describe the lytic and lysogenic life cycles of λ 
bacteriophage. What roles do λ repressor and cro protein play in 
controlling transcription from PR and PRM, and how are 
these roles linked to lysis and lysogeny? 


Define antisense RNA, and describe how it affects the 
translation of a complementary mRNA. Why is it more 
advantageous to the organism to stop translation initiation than to 
inactivate or destroy the gene product after it is produced? 


For answers to selected even-numbered problems, see Appendix: Answers. 


16. In the lac operon, what are the likely effects on operon 
gene transcription of the mutations identified below? 
a. Mutation of consensus sequence in the lac promoter 
b. Mutation of the repressor binding site on the operator 
sequence 
c. Mutation of the lacI gene affecting the allosteric site of 
the protein 
d. Mutation of the lacI gene affecting the DNA-binding 
site of the protein 
e. Mutation of the CAP binding site of the lac promoter 


17. Identify which of the following lac operon haploid 
genotypes transcribe operon genes inducibly and which 
transcribe genes constitutively. Indicate whether the strain is 
lac+ (able to grow on lactose-only medium) or lac− (cannot 
grow on lactose medium). 

s POZY 

b. I'PtOC ZY+ 

c. I P OtZ*Y+* 

d. I P ony 


fS] 


18. Complete the following table, indicating whether function- 


ala 


19. 


20. 


21. 


epo zy 
E PP OCZY 
g. I Pt Oo “ty 


ally active B-galactosidase and permease are produced in 


Genotype B-Galactosidase 
Lactose 
Example: /* P Or Z* yt + 


a rP Olea Vari Pa OZY 


b. TP+ Ot ZT tit Pt OCZ* Y7 
SP RE OR VAPE OZ YS 
. M ptoCz+Y*/I+ P otzt yt 
eaP O ZAV PHO ZY 
£ tptotz-ytSptotzty- 
g RRT Y O TA 


List possible genotypes for lac operon haploids that have 

the following phenotypic characteristics: 

a. The operon genes are constitutively transcribed, 
but the strain is unable to grow on a lactose 
medium. List two possible genotypes for this 
phenotype. 

b. The operon genes are never transcribed above a 
basal level, and the strain is unable to grow on 
a lactose medium. List two possible genotypes for 
this phenotype. 

c. The operon genes are inducibly transcribed, but the 
strain is unable to grow on a lactose medium. List one 
possible genotype for this phenotype. 

d. The operon genes are constitutively transcribed, and 
the strain grows on lactose medium. List two possible 
genotypes for this phenotype. 


Suppose each of the genotypes you listed in parts (a) and 

(b) in Problem 19 are placed in a partial diploid genotype 

along with a chromosome that has a fully wild-type lac 

operon. 

a. Will the transcription of operon genes in each partial 
diploid be inducible or constitutive? 

b. Which partial diploids will be able to grow on a lactose 
medium? 


Four independent lac mutants (mutants A to D) are 
isolated in haploid strains of E. coli. The strains have the 
following phenotypic characteristics: 


Mutant A is lac−, but transcription of operon genes is 
induced by lactose. 

Mutant B is lac− and has uninducible transcription of 
operon genes. 

Mutant C is lac+ and has constitutive transcription of 
operon genes. 

Mutant D is lac+ and has constitutive transcription 
of operon genes. 


No Lactose 


22. 


23. 
the presence and absence of lactose. Use “+” to indicate 
the presence of a functional enzyme and “−” to indicate 
its absence. Indicate whether the partial diploid strain is 
lac+ (able to grow on lactose-only medium) or lac− (cannot 
grow on lactose medium). 


Permease Phenotype 
Lactose No Lactose 
+ = lag 


A microbiologist develops donor and recipient varieties 
of each mutant strain and crosses them with the results 
shown below. The table indicates whether inducible, 
constitutive, or noninducible transcription occurs, along 
with lac+ and lac− growth habit for each partial diploid. 
Assume each strain has a single mutation. 


Mating   Transcription and Growth 
A × B    lac− 
A × C    lac+, inducible 
A × D    lac+, constitutive 
B × C    lac+, inducible 
B × D    lac+, constitutive 
C × D    lac+, constitutive 


Use this information to identify which lac operon gene is 
mutated in each strain. 


Suppose the lac operon partial diploid cap" I* P* O* Z~ Y*/ 

cap’ I” P* O* Z* Y` is grown. 

a. Will this partial diploid strain grow on a lactose medium? 

b. Is transcription of B-galactosidase and permease 
inducible, constitutive, or noninducible? 

c. Explain how genetic complementation contributes to 
the growth habit of this strain. 


A bacterial inducible operon, similar to the lac operon, 
contains three genes (R, T, and S) that are involved in 
coordinated regulation of transcription. One of these genes 
is an operator region, one is a regulatory protein, and the 
third produces a structural enzyme. In the table below, 
“+” indicates that the structural enzyme is synthesized and 
“−” indicates that it is not produced. Use the information 
provided to determine which gene is the operator, which 
produces the regulatory protein, and which produces 
the enzyme. 

Genotype   Enzyme Synthesis 
           Inducer Present   Inducer Absent 
Re Stes a. _ 
FET = = 
RS i $ 
RES aE d 


ESTUS T + + 
ES U S U + + 
R'S*T Y/R S T + — 


24. A repressible operon system, like the trp operon, contains 
three genes, G, Z, and W. Operon genes are synthesized 
when the end product of the operon synthesis pathway is 
absent, but there is no synthesis when the end product is 
present. One of these genes is an operator, one is a regula- 
tory protein, and the other is a structural enzyme involved 
in synthesis of the end product. In the table below, “+” 
indicates that the enzyme is synthesized by the operon, 
and “−” means that no enzyme synthesis occurs. Use this 
information to determine which gene corresponds to each 
operon function. 


Genotype Enzyme Synthesis 
End Product End Product 
Present Absent 
Gtztw* = + 
G ztwt ce + 
Gz wt — — 
GZW + + 


G Z} W/G} Z Ww + + 
GZ WG Zt Ww + + 
GZ W/G} z+ wt — + 
G} Z W/G Z Wt — + 


25. What is the likely effect of each of the following mutations 
of the trpL region on attenuation control of trp operon 
gene transcription? Explain your reasoning. 


a. Region 3 is deleted. 

b. Region 4 is deleted. 

c. The entire trpL region is deleted. 

d. The start (AUG) codon of the trpL polypeptide is 
deleted. 

e. Two nucleotides are inserted into the trpL region 
immediately after the polypeptide stop codon. 

f. Twenty nucleotides are inserted into the trpL region 
immediately after the polypeptide stop codon. 

g. Ten nucleotides are inserted between regions 2 and 
3 of trpL. 

h. Two nucleotides are inserted immediately following the 
polypeptide start codon. 

i. The entire polypeptide coding sequence of trpL is 
deleted. 

j. The eight uracil nucleotides immediately following 
region 4 are deleted. 


26. 


27. 


28. 


29. 


30. 
Suppose that base substitution mutations sufficient to 
eliminate the function of the operator regions listed below 
were to occur. For each case, describe how transcription or 
life cycle would be affected. 


a. lacO mutation in E. coli 
b. OR1 mutation in λ phage 
c. OR3 mutation in λ phage 


Two different mutations affect PRE. Mutant 1 decreases 
transcription from the promoter to 10% of normal. Mutant 
2 increases transcription from the promoter to tenfold 
greater than the wild type. How will each mutation affect 
the determination of the lytic or lysogenic life cycle in 
mutant λ phage strains? Explain your answers. 


How would mutations that inactivate each of the following 
genes affect the determination of the lytic or lysogenic life 
cycle in mutated λ phage strains? Explain your answers. 
a. cI 
b. cII 
c. cro 
d. int 
e. cII and cro 
f. N 


The bacterial insertion sequence IS10 uses antisense RNA 
to regulate translation of the mRNA that produces the 
enzyme transposase, which is required for insertion sequence 
transposition. Transcription of the antisense RNA gene is 
controlled by POUT, which is over 10 times more efficient 
at transcription than the PIN promoter that controls 
transposase gene transcription. 


a. If a mutation reduced the transcriptional efficiency of 
POUT so as to be equal to that of PIN, what is the likely 
effect on the transposition of IS10? 

b. If a mutation of PIN eliminates its ability to function in 
transcription, what is the likely effect on the 
transposition of IS10? 


Northern blot analysis is performed on cellular mRNA 
isolated from E. coli. The probe used in the northern blot 
analysis hybridizes to a portion of the lacY sequence. Below 
is an example of the autoradiograph from northern blot 
analysis for a wild-type lac+ bacterial strain. In this gel, 
lane 1 is from bacteria grown in a medium containing only 
glucose (minimal medium). Lane 2 is from bacteria in a 
medium containing only lactose. Following the style of this 
diagram, draw the autoradiograph appearance for northern 
blots of the bacteria listed below. In each case, lane 1 is for 
mRNA isolated after growth in a glucose-containing 
(minimal) medium, and lane 2 is for mRNA isolated after growth 
in a lactose-only medium. 


Lane 


Autoradiograph 
of northern blot 


a. lac+ bacteria with the genotype I+ P+ O^C Z+ Y+

b. lac− bacteria with the genotype I+ P+ O+ Z− Y−

c. lac− bacteria with the genotype I+ P− O^C Z+ Y+

d. lac+ bacteria with the genotype I− P+ O+ Z+ Y+

e. lac− bacteria with the genotype I+ P+ O+ Z− Y+ that has
a polar mutation affecting the lacZ gene

f. lac− bacteria with the genotype I+ P+ O^C Z− Y−

g. lac− bacteria with the genotype I− P+ O+ Z+ Y+ and
a mutation that prevents CAP-cAMP binding to the
CAP site


31. The electrophoresis gel shown below in part (a) is from 
a DNase I footprint analysis of an operon transcription 
control region. DNA sequence analysis of a 35-bp region 
is shown in part (b). The control region, labeled with ³²P
at one end, is shown in a map in part (c). Separate samples 
of control-region DNA are exposed to DNase I, and the 
resulting DNase I-digested DNA is run in separate lanes 
of the electrophoresis gel. Unprotected DNA is in lane 1, 
DNA protected by repressor protein is in lane 2, and RNA 
polymerase-protected DNA is in lane 3. The numbers 
along the electrophoresis gel correspond to the 35-bp 
sequence labeled on the map in part (c). Use the informa- 
tion provided to solve the following problems. 
a. Determine the DNA sequence of the 35-bp region 
examined. 
b. Locate the regions of the sequence protected by repres- 
sor protein and by RNA polymerase. 


[Figure: (a) DNase I treatment gel, lanes 1 to 3; (b) DNA sequencing gel, lanes G, A, T, C; (c) map of the control region]


32. For the following lac operon partial diploids, determine
whether the synthesis of lacZ mRNA is “constitutive,”
“inducible,” or “uninducible,” and indicate whether the
merodiploid is lac+ or lac− (able or not able to utilize
lactose).


Genotype    lacZ mRNA Synthesis    lac Phenotype


Anni ROTZ Vali PORZ Va 
b I PROS ZAY I PROZ Y 
PR OZY P On ZV 
d. ptotz-ytytpotzty* 
eR ORZ TY II Pt OtZt Yy 


33. The following hypothetical genotypes have genes A, B,
and C corresponding to lacI, lacO, and lacZ, but not necessarily
in that order. Data in the table indicate whether
β-galactosidase is produced in the presence and absence
of the inducer for each genotype. Use these data to identify
the correspondence between A, B, and C and the lacI,
lacO, and lacZ genes. Carefully explain your reasoning for
identifying each gene.

Genotype    β-Galactosidase Production
            Inducer Present    Inducer Absent
ARB CG + + 
2 AB GC + + 


a Bin Gas F + 
AA Bil Gua Aw Bil Gas + = 


34. For an E. coli strain with the lac operon genotype
I+ P+ O+ Z+ Y+, identify the level of transcription of the
operon genes in each growth medium listed. Specify 
transcription as “none,” “basal,” or “activated” for each 
medium, and provide an explanation to justify your 
answer. 

a. Growth medium contains lactose and glucose. 
b. Growth medium contains glucose but no lactose. 
c. Growth medium contains lactose but no glucose. 


15

Regulation of Gene
Expression in Eukaryotes


CHAPTER OUTLINE

15.1 Cis-Acting Regulatory Sequences Bind Trans-Acting
Regulatory Proteins to Control Eukaryotic Transcription

15.2 Chromatin Remodeling and Modification Regulates
Eukaryotic Transcription

15.3 RNA-Mediated Mechanisms Control Gene Expression


Wild-type petunia flowers have solid color due to expression of a
chromosomal pigment gene. Transgenic petunias with an extra copy of
the pigment gene have colorless (white) regions due to cosuppression,
a process in which regulatory RNAs inactivate both the chromosomal
copy and the transgenic copy of the pigment gene.


If the 46 chromosomes in a single nucleus from any cell
in your body were stripped of their associated proteins 
and laid end to end, they would span almost 2 meters. Yet 
in their normal compacted state, these chromosomes can 
fit inside a nucleus that is about 5 microns (5 millionths of a 
meter) in diameter and still leave room for DNA replication, 
transcription, pre-mRNA processing, and numerous other 
activities to take place. This efficient packaging and access 
to DNA are made possible by the chromatin structure of the 
genome and the dynamic changes of which chromatin is 
capable throughout the cell cycle. 
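
To put the packaging problem in numbers, the two figures quoted above can be compared directly. The short calculation below is only an illustrative sketch: it uses the approximate values from the paragraph (about 2 meters of DNA, a nucleus about 5 microns across), and the variable names are chosen here for illustration.

```python
# Approximate figures quoted in the paragraph above.
dna_length_m = 2.0          # DNA of 46 chromosomes laid end to end, in meters
nucleus_diameter_m = 5e-6   # nuclear diameter, about 5 microns, in meters

# How many nucleus diameters long is the unpackaged DNA?
length_to_diameter = dna_length_m / nucleus_diameter_m

print(f"The DNA is roughly {length_to_diameter:,.0f} times longer "
      "than the nucleus is wide.")
```

On these rough numbers the DNA is several hundred thousand times longer than the compartment that holds it, which is why chromatin structure matters for both packaging and access.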


The genomes of eukaryotic organisms—yours 
included—are considerably larger on average than 
those of bacterial and archaeal species, and they are 
packaged much differently as well. One major pack- 
aging difference is the localization of chromosomes 
in a nucleus in eukaryotic cells. Nuclear localization 
sequesters the chromosomes and encapsulates 
DNA replication, transcription, and the various 
RNA-processing activities. A second difference is 
the incorporation of DNA into chromatin. 

The process of chromatin condensation initiates 
at the beginning of prophase and culminates in fully 
condensed chromosomes in metaphase. This is an 
essential precondition for efficient chromosome separation
in anaphase. Chromatin condensation also
plays a pivotal role in permitting or blocking transcription.
No cell in your body expresses all 22,000
or so genes of the human genome. Instead, most
human cell types express only a few thousand genes,
while the other genes are transcriptionally silent. In
recent decades, cell biologists studying the close
connection between structural changes in chromatin
and the transcription of eukaryotic genes have
succeeded in uncovering many crucial details.

The processes that regulate gene expression in 
eukaryotes (see Chapters 8 and 9) are more varied 
and multifaceted than those governing gene expres- 
sion in bacterial genomes (Figure 15.1). In the present 
chapter, we focus on elements that do not occur in 
prokaryotes and yet are central to the regulation of 
transcription and gene expression in eukaryotes: 

(1) the organization of regulatory sequences other
than promoters that contribute to the regulation
of transcription; (2) mechanisms that remodel
chromatin or reconfigure the association between
nucleosomes and DNA to regulate transcription;
(3) epigenetic mechanisms that exert transcriptional
regulatory control in cell lineages over the course
of an organism's development; (4) the transmission
of epigenetic states from one generation of cells to
another to exercise long-term control of differential
gene expression; and (5) RNA-based mechanisms
operating post-transcriptionally to regulate the
availability of mature mRNA for translation and therefore
the ability to produce polypeptides.


Figure 15.1 An overview of gene regulation mechanisms in eukaryotes.

Transcriptional regulation
a. Regulatory proteins and transcription factors bind to consensus DNA sequences (promoter regions) to facilitate transcription.
b. Additional regulatory DNA sequences (enhancers and silencers) bind regulatory proteins to facilitate transcription of specific genes in each cell type.
c. Open chromatin structure, formed by protein action, is favorable for transcription.
d. Alternative promoters are utilized in different cell types to produce different pre-mRNA molecules.
e. Methylation of DNA inhibits transcription.

mRNA processing
a. Capping of the 5′ end, polyadenylation of the 3′ end, and intron splicing modify pre-mRNA.
b. Alternative capping and polyadenylation sites can be used in different cell types.
c. Alternative splicing produces different mature mRNA molecules from some cell types.
d. RNA editing modifies the base sequences of mRNA.

Regulation of mature mRNA
a. Translational regulatory proteins bind mature mRNA to delay translation initiation.
b. Small RNAs regulate the stability or translation of mRNA.
c. Transport of mature mRNA to cytoplasm is regulated.
d. RNA stability is regulated.

Translation
Masking of mRNA delays or prevents translation.

Post-translation
a. Polypeptides are processed and modified in the Golgi body before transportation out of cell.
b. Regulatory molecules bind to a polypeptide to alter its function.
c. Protein stability is regulated.


15.1 Cis-Acting Regulatory Sequences 
Bind Trans-Acting Regulatory Proteins 
to Control Eukaryotic Transcription 


Despite the considerable differences between eukaryotes 
and bacteria, the basic mechanisms controlling transcrip- 
tion are broadly similar in both groups of organisms. The 
DNA-protein interactions in eukaryotes follow a scheme 
familiar from bacterial processes. Activator proteins bind 
regulatory sequences to stimulate transcription (positive 
regulation of transcription), and repressor proteins bind 
other regulatory sequences to hinder transcription (nega- 
tive regulation of transcription). Unlike their counterparts 
in bacteria, however, eukaryotic transcription activators 
and repressors, collectively known as transcription fac- 
tors, are often found in large complexes composed of a 
large number of distinct regulatory proteins that bind a 
wide and diverse array of regulatory sequences. These 
proteins aggregate in diverse combinations that activate 
or repress transcription of different patterns of genes in 
different tissues and at different times in the life cycle. 

The complexity of gene regulation is reflected both 
in the numbers of different transcription factors and the 
diversity of the target genes they regulate. For example, 
the bacterium E. coli has about 270 transcription factors, 
about the same number as the single-celled eukaryote 
S. cerevisiae. In contrast, multicellular eukaryotes such 
as Drosophila, humans, and Arabidopsis have approxi- 
mately 600, 1400, and 1900 different transcription factors, 
respectively. Similarly, consider the transcription factors 
regulating the lac operon in E. coli: the cAMP—CAP com- 
plex regulates about a dozen loci in the E. coli genome, 
and the lac repressor has only a single target locus, the 
lac operon. In contrast, individual transcription factors in 
multicellular eukaryotes may regulate tens to hundreds of 
target genes. 
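
The approximate counts quoted above can be put on a common scale by expressing each as a multiple of the E. coli figure. This is only a small arithmetic sketch using the rounded numbers from the paragraph; the dictionary and variable names are chosen here for illustration.

```python
# Approximate numbers of transcription factors, as quoted in the text.
tf_counts = {
    "E. coli": 270,
    "Drosophila": 600,
    "humans": 1400,
    "Arabidopsis": 1900,
}

baseline = tf_counts["E. coli"]

# Express each count relative to the E. coli count.
for organism, count in tf_counts.items():
    ratio = count / baseline
    print(f"{organism}: about {count} factors, {ratio:.1f} times the E. coli number")
```

Because the input values are themselves rounded estimates, the ratios should be read as rough indications of scale rather than precise measurements.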

In multicellular eukaryotes, many genes are regulated 
in a developmental or cell-type specific manner, with 
some genes utilized multiple times in precise develop- 
mental patterns of expression. Because humans have only 
about five times as many genes as E. coli but many more 
times the number of distinct cell types, the increased 
complexity in gene regulation is considered to be respon- 
sible for the evolution and development of multicellular 
eukaryotes. Changes in gene regulation are held to be a 
significant driver in the evolution of morphological complexity.
To cite a finer-scale example, since the coding
sequences of chimp and human genes are nearly identical,
it is likely that most differences between the two species
are due to differences in gene regulation rather than
functional differences in protein products.

Another major difference between bacteria and mul- 
ticellular eukaryotes is the precision of gene regulatory 
control. E. coli, being a single-celled organism, needs to be
able to rapidly change gene expression patterns in order
to respond quickly to changing environmental conditions.
Thus, even for genes that are “off,” a few transcripts are
always present in the cell, a situation that, as we saw in
the case of the lac operon, enabled the sensing of the presence
of lactose. In contrast, in multicellular eukaryotes
with hundreds to thousands of different cell types, genes 
encoding proteins that are required only in specific cell 
types need to be tightly regulated. This precise regulation, 
where genes that are “off” are absolutely transcriptionally 
silent, is mediated by the packaging of chromatin into an 
inactive state, a subject we will explore later in this chap- 
ter, after we first discuss the role of transcription factors 
in eukaryotic gene regulation. 


Transcriptional Regulatory Interactions 


Three sets of regulatory DNA sequences are commonly 
involved in eukaryotic regulation of transcription of spe- 
cific genes. The first set of regulatory sequences is the 
core promoter region containing the TATA box and other 
sequences; it is immediately adjacent to the start of tran- 
scription and is the sequence to which RNA polymerase II 
and its associated transcription factors bind (Figure 15.2). 
Upstream of the core promoter are various proximal 
elements that are a second set of regulatory sequences 
found in some genes and which are often involved in 
quantitative gene regulation. At greater distances from 
the core promoter are enhancer and silencer sequences 
(or enhancers and silencers), the third set of regulatory 
sequences, which bind regulatory proteins and interact 


Regulatory 
proteins 
Nucleosome 
| iin TBP 


Ve EBI Jee 


LH 


Enhancer l 7 
: Proximal TATA start site 
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element box 
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promoter 


Figure 15.2 Regulatory interactions in eukaryotic 
transcription. TATA-binding protein (TBP), other general 
transcription factors (GTFs), and RNA polymerase II (Pol Il) bind 
the core promoter. Other regulatory proteins bind proximal 
promoter and enhancer regions and interact with nucleosomes 
to activate transcription. 
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with proteins bound to other promoter segments, pro- 
viding both quantitative and qualitative control of gene 
expression. Unlike core promoter and proximal promoter 
elements, which are invariably located upstream of and 
close to the genes they regulate, enhancers and silencers 
can be upstream or downstream of genes they regulate as 
well as residing in introns and occasionally even within 
coding regions. Although some enhancer and silencer 
sequences are close to the genes they regulate, others 
are great distances, thousands to tens of thousands of 
nucleotides, away from the genes they regulate. All three 
of these regulatory regions contain cis-acting regulatory 
sequences, which means they regulate transcription of 
genes located on the same chromosome as the sequences. 
RNA polymerase II (pol II) and various general tran- 
scription factors (GTFs) are recruited to and bind the 
core promoter (see Section 8.3). Transcriptional activa- 
tor proteins or transcriptional repressor proteins bind to 
proximal promoter elements and to enhancers. All these 
proteins are trans-acting regulatory proteins: They are 
able to identify and bind target regulatory sequences on 
any chromosome. RNA polymerase II, for example, is 
able to bind any core promoter region if the right general 
transcription factors are also present. Similarly, transcrip- 
tion activator and repressor proteins can bind any target 
regulatory sequence and can influence transcription with 
equal efficiency no matter where the sequence occurs. 
Besides the regulatory proteins that bind regulatory 
DNA in a sequence-specific manner, many additional 
proteins also associate with regulatory regions of DNA by 
protein-protein interactions that form larger complexes. 
At enhancers, for example, aggregation of multiple pro- 
teins, a few binding enhancer sequences and the others 
binding other proteins, forms a large protein complex 
known as an enhanceosome. Enhanceosomes direct DNA 
bending into loops that bring the enhanceosome into con- 
tact with RNA polymerase and transcription factors bound 
at the core promoter and to proximal promoter elements 
(see Figure 8.12). The DNA loops can be small or large, in 
keeping with the observation that enhancers may be close 
to or quite distant from the genes they regulate. Repressor 
proteins act in a similar manner, with some proteins bind- 
ing DNA in a sequence-specific manner and recruiting 
additional proteins into a larger repressor complex. 

Enhancer and silencer sequences can be identified
using the same approaches used for gene identification.
Mutant analysis can reveal sequences important for gene
regulation. For example, the O^C mutants of the lac operon
that Jacob and Monod characterized identified the lac 
operator as an important regulatory sequence. Examples 
of mutations in eukaryotic enhancers have similarly been 
identified by mutant analysis, as described in a later sec- 
tion. Conservation of noncoding sequences across species 
can also indicate functional regulatory sequences, a con- 
cept to which we will return in Chapter 18. In addition, 
direct testing of sequences for regulatory functions can be 
used to delineate regulatory sequences, an approach we 
will explore further in Chapter 16. 


Integration and Modularity of Regulatory 
Sequences 


Despite the diversity of the combinations through which 
regulatory sequences and proteins control transcrip- 
tion in eukaryotes, there are some commonalities in 
the molecular machinery that coordinates this regulatory 
activity. Enhancers and silencers are typically composed 
of binding sites for a number of transcription factors, and 
this allows them to integrate the activities of different 
sets of transcription factors in order to produce different 
outputs. Such a group of transcription factor binding sites 
is often referred to as an enhancer or silencer module. 
For example, studies of enhancer-sequence composition 
in the eukaryotic virus SV40 (simian virus 40) revealed 
modular sequences that have since been found to be simi- 
lar to those of enhancers of other eukaryotes. The SV40 
enhancer module consists of adjacent regions of con- 
served sequences located about 200 bp upstream of the 
transcription start point of regulated genes. Each of seven 
segments of conserved sequence binds specific regulatory 
proteins (Figure 15.3). 

While we have characterized regulatory sequences 
as enhancers or silencers, some regulatory modules bind 
both activators and repressors and thus act to integrate 
both positive and negative signals into a single output. 
In such cases, repressor activity often prevails over the 
activity of activators. (An example of such a regulatory 
module is present in Figure 20.9.) As we will see in the 
next section, the modularity of transcriptional regulation 
in eukaryotes can provide the flexibility that multicel- 
lular organisms need for regulation of differential gene 
expression.

Figure 15.3 Enhancer sequences and the regulatory proteins that bind them. The SV40 enhancer
sequence contains seven short sequence segments targeted by specific regulatory proteins.


Transcription Regulation by Enhancers
and Silencers


In a broad sense, enhancer and silencer activity controls 
the timing and location of eukaryotic gene transcription 
to help ensure the proper function and development of 
organisms (for example, by making a polypeptide avail- 
able at crucial times or in specific cells or tissues). The 
enhancers and silencers controlling transcription of a 
gene can be nearby or far from the gene they regulate, 
though DNA loop formation can bring even very distant 
sequences together. In yeast, enhancers and silencers 
are usually situated relatively close to the genes they 
regulate. The major enhancer controlling expression of 
the β-globin complex in humans is also very close to the
genes it regulates. Often, however, the distance between 
an enhancer or silencer sequence and the gene it targets 
for regulation is vast. 

An example of a distant enhancer is provided by the 
SHH (Sonic hedgehog) gene, which in humans and other 
mammals directs the development of limbs and in its 
wild-type form produces five digits (fingers and toes) on 
each appendage. SHH is expressed in a tissue-specific 
manner in limbs under the direction of an enhancer that 
is 1 million base pairs (1 megabase) away from the gene. 
Genomic sequencing analysis reveals that the SHH en- 
hancer is actually located in an intron of a neighboring 
gene (see Figure 18.15). 

A general model for eukaryotic transcription regula- 
tion must incorporate the action of enhancers and silenc- 
ers while taking the variability of their locations and their 
tissue-specific patterns of regulation into account. The 
model depicted in Figure 15.4, for SHH, shows two distant 
enhancers controlling transcription of the same gene in 
a tissue-specific manner. In this example, the SHH gene is
shown expressed in the brain and in limbs. Transcription 
in these tissues is controlled by different regulatory pro- 
teins and transcription factors produced in each cell type. 
One combination of regulatory proteins binds one en- 
hancer in brain cells, but a different combination of regu- 
latory proteins binds an alternative enhancer in limb cells. 
The different regulatory proteins present in different 
types of cells lead to tissue-specific patterns of expression 
of the target gene, producing a different set of polypep- 
tides in each case. Similar models depicting the binding of 
repressor proteins to silencer sequences describe how dis- 
tant silencers can inhibit transcription of targeted genes. 
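
The logic of this model can be sketched in a few lines of code. The example below is a deliberately simplified illustration, not a description of the real SHH regulators: the transcription-factor names, the cell types' factor sets, and the function names are all hypothetical, and each enhancer is assumed to be active only when every factor it requires is present in the cell.

```python
# Hypothetical factor names: the text does not name the proteins that
# bind the limb and brain enhancers of SHH.
ENHANCERS = {
    "limb enhancer": {"LimbTF-A", "LimbTF-B"},
    "brain enhancer": {"BrainTF-A", "BrainTF-B"},
}

# Hypothetical sets of regulatory proteins produced in each cell type.
CELL_TYPES = {
    "limb cell": {"LimbTF-A", "LimbTF-B", "GeneralTF"},
    "brain cell": {"BrainTF-A", "BrainTF-B", "GeneralTF"},
    "liver cell": {"GeneralTF", "LimbTF-A"},  # incomplete limb set
}


def active_enhancers(factors_present):
    """Return the enhancers whose complete set of required factors is present."""
    return [name for name, required in ENHANCERS.items()
            if required <= factors_present]  # subset test


def gene_transcribed(factors_present):
    """The gene is transcribed if at least one enhancer is active."""
    return bool(active_enhancers(factors_present))


for cell, factors in CELL_TYPES.items():
    print(cell, "->", active_enhancers(cell_factors := factors) or "no active enhancer")
```

In this sketch the same gene is switched on through different enhancers in limb and brain cells, while a cell holding only part of a required combination (the hypothetical liver cell) does not activate either enhancer.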

This model illustrates an important aspect of eukary- 
otic transcription regulation. Only when all of the necessary 
transcription factors and regulatory proteins are present in 
a cell can the assembly of protein complexes required for 
the tissue-specific or development-stage-specific pattern of
transcription take place. The protein complexes assembled
at regulatory sequences direct patterns of gene expression
by activating transcription of certain genes while blocking
transcription of other genes. The polypeptides that are
ultimately produced in each cell or at each stage of development
drive the processes that make cells distinctive and
lead to the observed developmental changes.

Figure 15.4 Tissue-specific enhancer action. (a) The
limb-specific enhancer binds different, limb-specific transcription
factors to express SHH differently in limb cells. (b) A different
brain-specific enhancer is bound by brain-specific transcription
factors and activates SHH transcription in brain cells.


Locus Control Regions 


The human β-globin gene was the focus of our attention
in an earlier chapter (see Chapter 10). Recall that this
gene produces the β-globin polypeptide, two copies of
which join with two α-globin polypeptides produced by the
α-globin gene to form the heterotetrameric hemoglobin
molecule. The β-globin gene is, however, only one of six
very closely related globin genes forming the β-globin complex
on human chromosome 11 (Figure 15.5a). Located
close to the β-globin complex is a regulatory region known
as a locus control region (LCR). LCRs are highly specialized
enhancer elements that regulate the transcription of
multiple genes packaged in complexes of related genes.
The LCR regulating transcription of genes in the β-globin
complex contains four distinct cis-acting regulatory sequences,
designated HS1 to HS4. Together these elements
orchestrate the sequential developmental expression of the
β-globin-complex genes as a fetus develops during gestation.
The LCR and the six genes it regulates occupy just
over 70 kb.

Each gene of the β-globin complex produces a distinct
globin polypeptide that imparts a different oxygen-carrying
capacity to hemoglobin. During gestation, the
oxygen requirements of the developing fetus change as
its size increases and its organs develop. As gestation proceeds,
transcription of the genes of the β-globin complex
is switched from one to the next to produce hemoglobin
molecules that have the oxygen-carrying capacity
required by the developing fetus. The order of expression
of β-globin-complex genes during development matches
the order in which they occur on the chromosome.
Figure 15.5b shows the expression profile of these genes
during development. The HS1 to HS4 components of
the β-globin-complex LCR bind regulatory proteins that
direct the formation of small DNA loops, and these serve
as a bridge to the promoters of the β-globin-complex
genes (Figure 15.6). The composition of enhanceosomes
bound to the LCR varies during development to vary the
resulting loops and thus produce the developmentally
regulated pattern of gene expression from the β-globin
complex. A similar LCR drives transcription of a smaller
number of genes in the α-globin complex.

Figure 15.5 Locus control and developmental expression
of human β-globin-complex genes. (a) The locus control
region (LCR) of the human β-globin complex contains four
regulatory segments (HS1 to HS4). (b) The LCR regulates the
expression of five genes (ψβ is an unexpressed pseudogene) in
a developmental pattern matched to gestational age.
[Plot axes: % of total β-globin synthesis versus weeks of gestation (before birth) and weeks of age (after birth)]


Mutations in Regulatory Sequences 


Our previous discussions of mutations have described 
numerous ways in which changes in DNA can result in 
abnormal polypeptides or abnormal levels of polypeptide 
production. Recent genome-wide mapping studies in hu- 
mans suggest that many disease-susceptibility alleles reside 
in noncoding sequences that may be regulatory. Here, we 
take a moment to consider examples of enhancer muta- 
tions that are the cause of hereditary disorders in humans. 

The term thalassemia is used to describe certain hereditary
anemias in which mutation leads to an imbalance
of production of α-globin and β-globin polypeptides. This
imbalance reduces the amount of functional hemoglobin,
since each molecule needs an equal number of both polypeptides.
Many distinct types of thalassemia result from
different mutations of the α-globin or β-globin genes.
In some thalassemia patients, however, no mutations of
either globin gene were detected. Furthermore, the promoters
of both genes were wild type, so the search for the
source of the mutations in this group of patients had to
be expanded. In several cases, the thalassemia mutations
are due to deletion or chromosome-rearrangement mutations
that alter the LCR of one of the globin gene complexes.
These deletions result in enhancer mutations that
alter the level of transcription of affected genes and lead
to an imbalance of polypeptide production.

Figure 15.6 Human β-globin-complex locus control
region. In combination with regulatory proteins that vary with
developmental stage, the LCR forms DNA loops that also vary
with developmental stage, allowing it to activate transcription
of specific genes of the complex. RNA polymerase at the left
transcribes the δ globin gene and the RNA polymerase at the
right the β globin gene.

Base-substitution mutations in enhancers are another 
source of enhancer dysfunction. The SHH enhancer, lo- 
cated 1 megabase from the SHH gene it regulates, is 
mutated in certain cases of a condition called polydactyly, 
in which extra fingers and toes can form during develop- 
ment. The extra digits result from abnormal expression 
of the SHH gene. In studies of certain human families 
with polydactyly, single-base substitutions in the SHH en- 
hancer have been identified. In addition, studies in mice, 
in which a deletion of the SHH enhancer has occurred, 
reveal significant abnormalities of limb development.

Figure 15.7 Conservation

Comparisons among species reveal DNA-sequence conservation
in some enhancers. This implies that natural
selection is operating to retain enhancer function, that
is, to retain the capacity to bind specific regulatory proteins
by conserving sequence composition. Figure 15.7
shows enhancer sequences for the β-interferon gene
in several mammals; the abbreviations represent the 
enhancer-binding proteins whose binding relies on cer- 
tain sequences. The species listed in the figure share a 
common ancestor from which their different lineages 
diverged approximately 100 million years ago. 
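
One simple way to quantify the kind of conservation described above is to count matching positions between aligned sequences. The sketch below is illustrative only: the three 20-nucleotide sequences are invented for the example (they are not the β-interferon enhancer sequences of Figure 15.7), the species labels are placeholders, and the function assumes the sequences are already aligned and of equal length.

```python
def percent_identity(seq_a, seq_b):
    """Percentage of aligned positions at which two sequences match."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and aligned to the same length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


# Invented sequences, used only to demonstrate the calculation.
reference = "ACGTACGTACGTACGTACGT"
species_b = "ACGTACGTACGTACGTACGA"   # differs at 1 of 20 positions
species_c = "ACGTTCGTACGAACGTACGT"   # differs at 2 of 20 positions

for label, seq in [("species B", species_b), ("species C", species_c)]:
    print(f"{label}: {percent_identity(reference, seq):.1f}% identical to reference")
```

A high percentage across lineages that diverged long ago is the pattern that suggests selection is maintaining the binding sites, although real comparisons also have to deal with insertions, deletions, and alignment uncertainty, which this sketch ignores.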

Genomic sequence analysis indicates evolutionary 
constraint on the diversification of some enhancer se- 
quences. Enhancer elements that have been conserved 
throughout vertebrate evolution regulate key genes con- 
trolling the development of the vertebrate body plan. We 
will return to genomics approaches to identifying con- 
served regulatory sequences in Chapter 18. In contrast, 
enhancer module sequences have also been observed 
to evolve quite rapidly. In these cases, since the output 
from an enhancer module is a result of the integration of 
several inputs, different combinations of activators and 
repressors can still result in similar outputs. 


Yeast Enhancer and Silencer Sequences 


The yeast Saccharomyces cerevisiae provides a simple 
model to illustrate the principles of eukaryotic transcrip- 
tional regulation. The regulation of transcription by en- 
hancer sequences is well understood in Saccharomyces 
cerevisiae, where transcription of genes involved in the 
galactose utilization pathway, among others, is carefully
regulated by enhancer-like sequences. When the monosaccharide
galactose is the only sugar in the growth medium,
strains of gal+ yeast will induce the transcription
of four enzyme-producing genes, GAL1, GAL2, GAL7,
and GAL10, that together import extracellular galactose
(GAL2) and then, through a short series of biochemical
reactions, break down intracellular galactose into
glucose-1-phosphate for glycolysis (GAL1, GAL7, and GAL10;
Figure 15.8). Each of the four genes has its own promoter,
but transcription of the genes is regulated by another 
gene, GAL4, which produces a regulatory protein. Gal4 
protein is a transcription activator protein that binds 
to an enhancer element—called an upstream activator 
sequence (UAS) in yeast—located upstream of each of 
the four GAL genes. The Gal4 regulatory protein is con- 
tinuously available in yeast cells and interacts with Gal80, 
the product of the GAL80 gene. When Gal80 protein 
binds to Gal4 protein, it inactivates Gal4 and blocks its 
ability to activate transcription. 

The UASg sequences are cis-acting regulatory el- 
ements, and Gal4 protein is a trans-acting regulatory 
protein. Each UASg element contains two 17-bp repeat 
sequences that are the binding sites for Gal4 protein. In 
its active, DNA-binding form, Gal4 is a homodimeric 
protein composed of two identical polypeptides that form 
two active domains. The DNA-binding domain, at one 
end of the Gal4 dimer, targets the 17-bp repeats of UASg. 
The activation domain, at the opposite end, is a target for 
binding by the protein Gal80. Since Gal4 and Gal80 are 
each constitutively produced, they are normally bound 
to one another at the activation domain of Gal4. In this 
configuration, the DNA-binding domain of Gal4 is inac- 
tive, and the dimer is unable to bind UASg. Without Gal4 
binding to UASg, transcription of GAL genes is blocked 

Figure 15.8 Galactose utilization in S. cerevisiae. Galactose utilization requires the action of products 
of each of four galactose-utilization (GAL) genes. 

Figure 15.9 Regulation of GAL gene transcription. 
(a) When galactose is absent, Gal80 protein binds the activation 
domain of Gal4 to inactivate that protein and block GAL gene 
transcription. (b) When galactose is present, Gal3 protein binds 
Gal80 protein to prevent it from binding Gal4 protein. The DNA- 
binding domain of Gal4 protein is then available to bind the two 
17-bp segments of UASg to help initiate GAL gene transcription. 

(Figure 15.9a). Conversely, when galactose is present, ga- 
lactose and Gal3, the protein product of another GAL 
gene, bind to Gal80. Binding of the galactose-Gal3 complex 
alters Gal80 and causes it to release Gal4. The free 
Gal4 dimer then binds UASg and activates GAL gene 
transcription (Figure 15.9b). 

In the GAL gene system, Gal4 acts as an activator 
protein, initiating transcription. Its target DNA sequence 
is UASg, which acts like an enhancer sequence and is 
separated from GAL gene promoters by a large number 
of nucleotides. Gal4 binding leads to the formation of 
a multiprotein complex known as Mediator, which is 
an enhanceosome that forms after Gal4 binds UASg. 
When inducing the formation of a DNA loop, Mediator 
makes contact with the general transcription apparatus— 
including TFIID (transcription factor II D) and RNA 
polymerase II (Pol II)—at a GAL gene promoter (see 
Figure 8.12). Thus, the transcription of GAL genes by 
RNA polymerase II is dependent on transcription activa- 
tion by Gal4 binding to UASg elements and causing the 
formation of Mediator. Distant enhancers and silencers 
use the same mechanism of DNA loop formation to regu- 
late transcription of targeted genes. 

A common mode by which repressor proteins inhibit 
transcription in bacteria is to bind to operator sequences 
that overlap promoters, blocking the binding of RNA 
polymerase (see Chapter 14). In eukaryotes, this mecha- 
nism of transcription inhibition is not seen. Among the 
mechanisms by which eukaryotic repressors do inhibit 

Figure 15.10 Transcription repression of the yeast GAL1 
gene. The proteins Mig1 and Tup1 bind to the Mig1 site to 
repress transcription when glucose is available in the growth 
medium. 


transcription is the binding of eukaryotic repressors to 
silencer sequences, thus directly preventing enhancer- 
mediated transcription. The galactose-utilization genes in 
yeast offer an example of this direct mechanism of tran- 
scription repression. When glucose is present in the yeast 
growth medium, the protein Mig1 is produced. Mig1 
binds a silencer sequence located between UASg and the 
GAL1 promoter (Figure 15.10). Mig1 in turn attracts the 
protein Tup1, and together these proteins form a repressor 
complex that prevents UASg from directing the initiation 
of transcription. 
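
The regulatory logic of the GAL system described above can be summarized as a small Boolean sketch. This is a deliberately simplified illustration, not a quantitative model: the function name and the treatment of each sugar as simply present or absent are assumptions made for the example, and Mig1-Tup1 repression, which the text describes at the GAL1 promoter, is assumed here to override activation by Gal4.

```python
def gal_genes_transcribed(galactose: bool, glucose: bool) -> bool:
    """Toy Boolean model of GAL gene regulation (hypothetical helper)."""
    # Gal4 and Gal80 are constitutively produced, so Gal80 holds Gal4
    # inactive unless the galactose-Gal3 complex binds Gal80.
    gal80_bound_to_gal4 = not galactose
    gal4_active_at_uas = not gal80_bound_to_gal4
    # When glucose is present, Mig1 binds the silencer and recruits Tup1;
    # the repressor complex blocks UAS-directed initiation (assumed dominant).
    mig1_tup1_repressing = glucose
    return gal4_active_at_uas and not mig1_tup1_repressing


for galactose in (False, True):
    for glucose in (False, True):
        state = "ON" if gal_genes_transcribed(galactose, glucose) else "OFF"
        print(f"galactose={galactose!s:5} glucose={glucose!s:5} -> GAL genes {state}")
```

In this sketch, transcription is ON only when galactose is present and glucose is absent, which matches the two conditions the text gives for activation by Gal4 and for escape from Mig1-Tup1 repression.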


Insulator Sequences 


Considering that enhancers can be located far from the 
genes they regulate, what mechanisms direct enhancer ac- 
tion toward the intended gene and away from other nearby 
genes that are not regulated by the same enhancer? The 
answer, in part, lies in insulator sequences, cis-acting 
sequences located so as to separate enhancers from pro- 
moters of genes that are to be insulated from the effects of 
the enhancer. Insulators are protein-binding sequences that 
direct enhancers to interact with the intended promoter 
and that block communication between enhancers and 
other promoters (Figure 15.11). The mechanism of this ac- 
tivity may consist of allowing the formation of DNA loops 
containing enhancers and their intended promoter targets 
while preventing the formation of DNA loops containing 
an enhancer and a promoter that is not its intended target. 

Up to this point our description of eukaryotic gene 
regulation has analogies with that of gene regulation 
in bacteria. First, in both lineages, specific sequences 
upstream of the transcription start site are required for 
recruitment of an RNA polymerase. Second, the tran- 
scriptional output is a result of the combinatorial activi- 
ties of activator and repressor transcription factors bound 
to regulatory sequences that promote or facilitate RNA 
polymerase activity. For example, the lac operon in E. coli 
is positively regulated by the CAP-cAMP complex binding 
to upstream regulatory sequences and negatively regulated 
via the lac repressor protein, with repression being 
dominant over activation, a situation similar in concept 
if not molecular mechanism to a gene regulatory module 
in eukaryotes. The major difference in gene regulation be- 
tween eukaryotes and bacteria is related to the packaging 
of DNA, the subject of the next section. 

15.2 Chromatin Remodeling and 
Modification Regulates Eukaryotic 
Transcription 


Recall from Chapter 11 that eukaryotic chromatin 
can be broadly divided into two categories based on 
its extent of compaction: euchromatin, which is loosely 
compacted and available for transcription, and hetero- 
chromatin, which is more densely compacted and is tran- 
scriptionally inert. Some regions of the genome are always 
heterochromatic, referred to as constitutive heterochro- 
matin, while others switch back and forth between being 
euchromatic and heterochromatic. These latter regions 
often contain genes that are active only at specific times 
or in certain tissues. When DNA that is normally euchromatic 
is placed in the vicinity of heterochromatin, 
the heterochromatic character may spread into the normally 
euchromatic region, silencing gene expression, a 
phenomenon called position effect variegation (PEV) (see 
Section 11.4). Analysis of mutations that affect the fre- 
quency or intensity of PEV in Drosophila provided the 
first insights into how euchromatic and heterochromatic 
states are established and maintained. 


PEV Mutations 


Genetic analysis of eukaryotic genomes reveals PEV to be 
a widespread phenomenon, suggesting that mechanisms 
controlling chromatin structure are important in the con- 
trol of gene expression. In Drosophila, mutations modi- 
fying PEV have led to the identification of several genes 
and proteins that play a direct role in establishing and 
maintaining chromatin structures associated with gene 
expression and gene silencing. The starting point was a 
mutant line in which the eye color is variegated, wild-type 
red and mutant white, due to an inversion placing the 
white gene in the vicinity of centromeric heterochromatin 
(see Figure 11.18). Mutations in which the variegation 
is either enhanced or suppressed were then identified. 
Mutations known as E(var) mutations, where E(var) is 
short for enhancers of position effect variegation, increase 
or enhance the appearance of the mutant white-eye phe- 
notype by encouraging the spread of heterochromatin 
beyond its normal boundaries. The effect of E(var) muta- 
tion is to produce a greater number of eye cells lacking 
pigment (Figure 15.12). In contrast, Su(var) mutations, 
where Su(var) is short for suppressors of position effect 
variegation, restrict the spread of heterochromatin or 
interfere with its formation. Su(var) mutations increase 
the extent of normally pigmented regions of the eye by 
suppressing the emergence of white patches. 

Figure 15.12 E(var) and Su(var) mutations. Mutations in 
genes whose protein products participate in chromatin modifi- 
cation are detected by enhancement or suppression of position 
effect variegation. 

Several dozen E(var) and Su(var) mutations are 
known in the Drosophila genome, and Su(var) mutations 
have proven especially valuable in the identification of 
genes and proteins that modulate chromatin structure. 
Genetic analysis of E(var) and Su(var) mutations supports 
the hypothesis that chromatin structure is dynamic and 
is associated with gene expression. In fact, chromatin 
structure appears to oscillate: Sometimes it is in a highly 
condensed state in which gene transcription is silenced 
(i.e., heterochromatic), and sometimes it is in a more 
loosely condensed state that allows transcription (i.e., 
euchromatic), but it often exists in an intermediate state 
of condensation. 

The analysis of one prominent group of Su(var) muta- 
tions exemplifies how the detection of defective proteins 
can elucidate normal functions. Some Su(var) mutations 
are caused by defective expression of heterochromatin 
protein-1 (HP-1), a protein found in association with 
centromeres, telomeres, and other heterochromatic chro- 
mosome locations in Drosophila. Comparison of Su(var) 
mutants with wild types reveals that HP-1 is a nucleosome- 
binding protein that targets lysine amino acids in position 
9 of histone H3 if they carry a methyl group. Methylation 
of lysine 9 of H3 is one of the most common epigenetic 
modifications of histones in heterochromatic regions. The 
absence of HP-1 interferes with heterochromatin forma- 
tion and suppresses variegation. 

A second group of Su(var) mutations affects genes 
encoding histone methyltransferases (HMTs), enzymes 
responsible for catalyzing the addition of methyl groups 
to amino acids of histone proteins. Histone methyltrans- 
ferases appear to target methylation-specific basic amino 
acids (e.g., arginine and lysine) in nucleosomes, attaching 
methyl groups to these amino acids as part of epigenetic 
marking of histones. As noted above, the lysine residue 
in position 9 of histone protein H3 is a frequent target for 
methylation. Upon methylation, this location is described 
as H3K9me, which is short for histone 3, lysine (one-letter 
abbreviation K), position 9, and methylation. If HMTs are 
not functioning properly, epigenetic methylation is not 
established, and heterochromatin formation is inhibited. 
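
The naming convention behind a label such as H3K9me can be made concrete with a short parsing sketch. The function name, the regular expression, and the inclusion of "ac" as an abbreviation for acetylation are assumptions made for illustration; only H3K9me itself is taken from the text.

```python
import re

# One-letter codes for the basic amino acids named in the text.
RESIDUES = {"K": "lysine", "R": "arginine"}
# "me" is defined in the text; "ac" is an assumed, conventional abbreviation.
MODIFICATIONS = {"me": "methylation", "ac": "acetylation"}

# Histone name, residue letter, residue position, modification suffix.
MARK_PATTERN = re.compile(r"^(H\d[AB]?)([KR])(\d+)(me|ac)$")


def parse_histone_mark(mark: str) -> dict:
    """Split a histone-mark label into its named parts (hypothetical helper)."""
    match = MARK_PATTERN.match(mark)
    if match is None:
        raise ValueError(f"unrecognized histone mark: {mark!r}")
    histone, residue, position, modification = match.groups()
    return {
        "histone": histone,
        "residue": RESIDUES[residue],
        "position": int(position),
        "modification": MODIFICATIONS[modification],
    }


print(parse_histone_mark("H3K9me"))
```

Read this way, H3K9me decomposes into histone H3, a lysine residue, position 9, and methylation, exactly as spelled out in the paragraph above.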

The identification of the functions of these two 
groups of Su(var) mutations led to a simple model of 
HP-1 and HMT function predicting that specific meth- 
ylated histone locations in nucleosomes (e.g., H3K9me) 
are methylated by HMTs and act as sites of HP-1 bind- 
ing that helps condense chromatin structure to silence 
gene expression (Figure 15.13). According to this model, 
Su(var) mutants that are defective in their silencing of 
w+ could carry an HMT gene mutation that leads to the 
failure to properly methylate nucleosomes, or they could 
carry a mutation of the HP-1 gene and be rendered unable 
to remodel chromatin to a tightly condensed form. 
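
The two-step model can be restated as a minimal sketch in which silencing requires both steps to succeed. The function names and the phenotype labels are hypothetical, and the all-or-none treatment of each protein is a simplification; real Su(var) mutants show graded effects.

```python
def heterochromatin_forms(hmt_functional: bool, hp1_functional: bool) -> bool:
    """Toy version of the HMT/HP-1 model (hypothetical helper)."""
    # Step 1: HMT must methylate histone H3 at lysine 9 (H3K9me).
    h3k9_methylated = hmt_functional
    # Step 2: HP-1 binds only where the methyl mark is present.
    hp1_bound = hp1_functional and h3k9_methylated
    # Condensed, silenced chromatin requires bound HP-1.
    return hp1_bound


def predicted_eye_phenotype(hmt_functional: bool, hp1_functional: bool) -> str:
    """Predict the effect on variegation of the relocated w+ gene."""
    if heterochromatin_forms(hmt_functional, hp1_functional):
        return "variegated (w+ silenced in some cells)"
    return "variegation suppressed (more red pigment)"


for hmt in (True, False):
    for hp1 in (True, False):
        print(f"HMT ok={hmt!s:5} HP-1 ok={hp1!s:5} -> {predicted_eye_phenotype(hmt, hp1)}")
```

Under these assumptions, loss of either protein gives the same Su(var) outcome, which is why mutations in two different genes were recovered with the same phenotype.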

Collectively, the experimental analyses of suppressors 
and enhancers of PEV identify genes that make epigen- 
etic “marks” on histone proteins, causing attachment and 
detachment of methyl, acetyl, and phosphoryl groups to 

Figure 15.13 HMT and HP-1 modify chromatin. Mutation 
analysis identifies the proteins HMT and HP-1 as drivers of 
heterochromatin formation. HMT or HP-1 mutations prevent 
chromatin modification. 


amino acids of the histones. These epigenetic marks are 
associated with chromatin remodeling that leads to gene 
transcription or gene silencing. The patterns of methyla- 
tion and demethylation, acetylation and deacetylation, and 
phosphorylation and dephosphorylation are maintained 
on histones and may be passed through successive genera- 
tions of cells, as we explore more closely in later pages. Five 
important features of epigenetic modification have been 
identified by researchers: (1) Epigenetic modifications alter 
chromatin structure, (2) they are transmissible during cell 
division, (3) they are reversible, (4) they are directly associ- 
ated with gene transcription, and (5) they do not alter DNA 
sequence. We turn now to a discussion of how chromatin 
architecture is remodeled and modified and then explore 
examples of how changes in chromatin structure lead to 
activation or repression of gene expression. 


Overview of Chromatin Remodeling and 
Chromatin Modification 


The defining feature of eukaryotic DNA is its packaging 
into chromatin. How, then, do the activator and repressor 
transcription factors bind to regulatory DNA that is pack- 
aged into chromatin? There are three basic mechanisms 
by which trans-acting proteins access specific regulatory 
DNA sequences in eukaryotic chromosomes. 

First, some regulatory sequences are not tightly 
bound by histones, which thus allow more or less direct 
entry to the regulatory DNA. These sequences include the 
“linker” sequences between nucleosomes and sequences 
with specific characteristics that prevent histones from 
binding efficiently. 

Second, proteins called chromatin remodelers can 
enzymatically change the distribution or composition of 
histone octamers (nucleosomes). Chromatin-remodeling 
enzymes are recruited to specific sites in the chromatin by 
trans-acting factors that bind to specific DNA sequences. 

As a third mechanism of access, proteins called chromatin 
modifiers can enzymatically modify histones by 
adding or removing methyl or acetyl groups at specific 
amino acid residues, most commonly lysines, of histone 
proteins. The addition of acetyl groups is associated 
with gene activation and is typically found in euchroma- 
tin. In contrast, removal of acetyl groups and addition 
of methyl groups to specific lysine residues are associ- 
ated with gene repression and typically found in het- 
erochromatin. As with chromatin-remodeling enzymes, 
chromatin-modifying enzymes are recruited to specific 
sites in chromatin by trans-acting factors that bind to 
specific DNA sequences. 

This combination of activities determines the relative 
access of trans-acting transcription factors to cis-acting 
DNA sequences in particular cells, at different times of 
organismal development, and under certain physiological 
conditions. Thus, chromatin remodelers and chromatin 
modifiers mediate the reversible transition from inactive 
heterochromatic DNA to active euchromatic DNA. 


Open and Covered Promoters 


Two contrasting states of nucleosome association with 
promoter sequences, known as open promoters and 
covered promoters, are at opposite ends of a continuum of 
nucleosome association with regulatory DNA sequence. 
Most promoters fall somewhere between these extremes 
with respect to their association with nucleosomes, but an 
examination of open promoters and covered promoters 
can help us understand how chromatin structure contrib- 
utes to transcription regulation. 

Open promoters cause genes to be constitutively 
transcribed. These promoters have a nucleosome- 
depleted region (NDR), which is a 150- to 100-bp region 
containing few nucleosomes that lies immediately up- 
stream of the start of transcription. These promoters 
do not generally contain a TATA box. Instead, a region 
rich in adenine and thymine, known as a poly A/T tract, 
is located in the NDR, near the transcription start site 
(Figure 15.14a). The poly A/T tract contains binding se- 
quences (BS) that attract transcription activators (ACT). 
This binding region is usually flanked by sequences that 
help position two nucleosomes, one upstream and one 
downstream, of the NDR. The downstream nucleosome, 
identified as the +1 nucleosome, is placed at the transcrip- 
tion start site. This +1 nucleosome contains a variant 
histone 2A protein known as H2A.Z that is readily modified 
for removal from the transcription start site at transcription 
initiation, allowing RNA polymerase II to bind 
and access the transcription start sequence. 

Covered promoters, on the other hand, characterize 
genes whose transcription is regulated. Transcription of 
these genes is blocked until nucleosomes are displaced or 


[Figure 15.14 diagram. (a) Open promoter: -2, -1, +1, and +2 nucleosomes flank an NDR containing a poly A/T tract (no TATA box) with binding sequences (BS); the +1 nucleosome at the transcription start site carries H2A.Z. (b) Covered promoter: (1) activator binding and nucleosome displacement, then (2) chromatin remodeling and additional binding at the BS and TATA box. ©2009 Macmillan Publishers Ltd]


Figure 15.14 Transcription of open and covered promoters. 
(a) Open promoters have a nucleosome-depleted region (NDR) 
and no TATA box. Activator proteins (ACT) are attracted to bind- 
ing sequences (BS) to recruit RNA polymerase II for transcrip- 
tion. (b) With covered promoters, transcription is activated by 
activator-protein binding and displacement of nucleosomes. 


removed from the promoter to allow transcription activa- 
tors to bind to the necessary sequences, an event that leads 
in turn to RNA polymerase II binding and transcription 
initiation (Figure 15.14b). These promoters generally con- 
tain TATA boxes and other transcription-factor binding 
sequences. At covered promoters, there is active competi- 
tion between nucleosomes and transcription-activating 
factors for binding. As a result, regulatory mechanisms 
are required that remodel chromatin to give activator 
proteins access to binding sequences in order to initiate 
transcription. 


Mechanisms of Chromatin Remodeling 


Chromatin remodeling refers to chromatin modifica- 
tions that reposition nucleosomes in such a way as to 
open or close promoters and other regulatory sequences. 
Moving nucleosomes off regulatory sequences improves 
access to them by transcription-activating regulatory pro- 
teins. Open chromatin is chromatin in which the as- 
sociation of DNA with nucleosomes is relaxed in regions 
containing regulatory sequences, allowing access by regulatory 
proteins. Modifications that cause regulatory DNA 
to be covered by nucleosomes, thus restricting the access 
of regulatory proteins to the sequences, produce closed 
chromatin. In closed chromatin, regulatory sequences 
cannot be efficiently accessed by regulatory proteins, and 
genes are transcriptionally silent. 

Molecular biologists can determine experimentally 
whether a region of DNA contains closed chromatin or 
open chromatin by assessing the sensitivity of the region 
to the DNA-digesting enzyme DNase I. This enzyme 
randomly cuts DNA in open chromatin regions but is not 
able to do so where chromatin is closed. Regions of open 
chromatin, sensitive to DNase I digestion, are known as 
DNase I hypersensitive sites. Where DNase I hypersen- 
sitivity is detected, genes are potentially transcribable. 
The experimental analysis of DNA for DNase I hypersen- 
sitivity is much like DNA footprint protection analysis 
described in Research Technique 8.1 (pages 279-280). 
Fragments of DNA created by exposure to DNase I are 
separated and analyzed by gel electrophoresis. 

DNase I hypersensitivity occurs in the immediate 
vicinity of transcribed genes and can also appear 1000 bp 
or more upstream or occasionally downstream of ac- 
tively transcribed genes. Hypersensitive regions surround 
promoters, enhancers, and other transcription-regulating 
sequences. The open chromatin complexes detected by 
DNase I hypersensitivity are the sites for binding by 
transcription-activating proteins and for transcription 
(Figure 15.15). Genetic Analysis 15.1 guides you through an 
analysis for the presence of DNase I hypersensitivity in a 
region of DNA. 
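
The interpretation rule applied in Genetic Analysis 15.1, that a continuous series of bands signals DNase I hypersensitivity while gaps signal closed chromatin, can be sketched in a few lines. The band positions below are invented rung numbers, not data read from the gel; they are chosen only to reproduce the pattern reported in the worked solution, and the function names are hypothetical.

```python
def has_gap(bands: list[int]) -> bool:
    """Return True if any rung between the smallest and largest band is missing."""
    present = set(bands)
    return any(rung not in present for rung in range(min(bands), max(bands) + 1))


def interpret_lane(bands: list[int]) -> str:
    """Apply the rule: continuous ladder = open chromatin, gaps = closed."""
    if has_gap(bands):
        return "closed chromatin"
    return "open chromatin (DNase I hypersensitive)"


# Hypothetical band positions for each (tissue, stage) lane; E = embryonic, A = adult.
lanes = {
    ("heart", "E"): [1, 2, 3, 4, 5, 6],
    ("heart", "A"): [1, 2, 5, 6],
    ("thymus", "E"): [1, 4, 5, 6],
    ("thymus", "A"): [1, 2, 3, 4, 5, 6],
    ("kidney", "E"): [1, 2, 3, 4, 5, 6],
    ("kidney", "A"): [1, 2, 3, 4, 5, 6],
}

for (tissue, stage), bands in lanes.items():
    print(f"{tissue:6} {stage}: {interpret_lane(bands)}")
```

With these made-up inputs, only the adult heart and embryonic thymus lanes are classified as closed chromatin, in line with the conclusion reached in the worked problem.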

Another, more direct technique for identifying where 
proteins are bound to DNA is a process called chromatin 
immunoprecipitation (ChIP). The transcription factors, 
with associated chromatin and DNA, are isolated from 
living cells by first chemically cross-linking the proteins 
and DNA together and then using an antibody specific 
to a transcriptional regulatory protein of interest to precipitate 
the DNA-chromatin combination containing that 
protein of interest. The DNA from the precipitated chro- 
matin is then released by reversing the cross-linking, after 
which the isolated DNA is amplified by PCR (Chapter 7) 
and sequenced. The sequences obtained will correspond 
to the DNA to which the transcriptional regulatory pro- 
tein of interest was bound in the cells. This approach 
is not only applicable to specific activator or repressor 
proteins but also can be performed using antibodies tar- 
geting specific chromatin modifications described later in 
this chapter. ChIP can be targeted to determine whether 
a protein of interest is bound to a specific DNA locus or 
can be used to determine all the sites in the genome to 
which a particular protein is bound, a concept that we will 
return to in Chapter 18. 

Chromatin remodelers are the protein com- 
plexes that carry out chromatin remodeling by moving 

Figure 15.15 Closed and open chromatin structure. 
(a) Closed chromatin is inaccessible to transcriptional proteins 
and insensitive to DNase I digestion. (b) Open chromatin binds 
transcriptional proteins and is DNase I hypersensitive. 


nucleosomes in three principal ways (two are seen in 
Figure 15.16). One type of chromatin-remodeling enzyme 
changes nucleosome organization by either sliding them 
along the chromosome or removing them from the DNA. 
These enzymes usually work by uncovering enhancers or 
promoters and thus are associated with gene activation. 
A second type of chromatin-remodeling enzyme reorga- 
nizes nucleosomes by inducing nucleosome movement. 
These enzymes usually repress transcription by moving 
nucleosomes. The third type of chromatin-remodeling 
enzyme changes the composition of histone octamers, 
replacing specific histone proteins with variant proteins. 
These changes are associated with gene activation. 

A number of distinct chromatin remodelers are 
known. Three of the best-understood categories, classi- 
fied by their main functions, are the SWI/SNF complex, 
which both slides and relocates nucleosomes; the ISWI 
complex, which helps direct the placement of nucleosomes; 
and the SWR1 complex, which substitutes the variant 
histone protein H2A.Z in nucleosomes in place of the 
more common H2A protein. 

Figure 15.16 Nucleosome displacement to expose regula- 
tory sequences. (a) Nucleosomes can be displaced by sliding 
or (b) can be repositioned on other DNA regions. 


The SWI/SNF Complex Pronounced “swee-sniff” or 
“swy-sniff,” this category of chromatin remodelers was 
first described in yeast and is now known to operate 
in all eukaryotes. It was discovered through analysis of 
mutations that affect two unconnected activities of yeast. 
One set of yeast mutants was unable to switch (SWI) 
mating type, a process tied to the ability of haploid yeast 
strains to fuse to form diploid strains. SWI mutations 
result from alterations of any of three genes, designated 
SWI1, SWI2, and SWI3. A second set of mutants was 
sucrose-nonfermenting (SNF) mutants. SNF mutants 
lose the ability to grow on medium containing the 
sugar sucrose owing to a mutation in any of three genes 
designated SNF2, SNF5, and SNF6. The discovery that SWI2 
and SNF2 are the same gene indicated that the activity 
blocked in SWI and SNF mutants was broader than 
just mating-type switching or the ability to initiate the 
transcription of genes needed for sucrose fermentation. 

The composition of the SWI/SNF complex varies 
somewhat among eukaryotic species, but in each species 
the complex functions to open chromatin structure by 
displacing or ejecting nucleosomes. These actions expose 
promoter and other regulatory sequences to allow bind- 
ing of transcription factors or activators that help initiate 
transcription (Figure 15.17 ②). 


The ISWI Complex Chromatin remodelers of the ISWI 
(imitation switch) complex primarily function to control 
the placement of nucleosomes into an arrangement 
that causes the region to be transcriptionally silent. 
These proteins have the ability to “measure” the length 
of linker DNA between bound nucleosomes in order 
to place the nucleosomes at regular intervals where 
they will cover promoters, thus preventing regulatory 
proteins from having access to the TATA box and other 
regulatory sequences. There is some evidence that certain 
nucleosome modifications can block ISWI activity, by a 
process that could be related to the opening of promoter 
and chromatin structure (see Figure 15.17 ①). 
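
The idea of placing nucleosomes at regular intervals so that they cover a regulatory site can be illustrated with a simple spacing sketch. The numbers are assumptions rather than values from the text: roughly 147 bp of DNA per nucleosome is a commonly cited figure, while the 20-bp linker, the 1000-bp region, and the site coordinates are arbitrary choices for the example.

```python
NUCLEOSOME_BP = 147  # assumed approximate DNA length per nucleosome (not from the text)


def place_nucleosomes(region_length: int, linker_bp: int, start: int = 0) -> list[tuple[int, int]]:
    """Return half-open (start, end) intervals for evenly spaced nucleosomes."""
    positions = []
    pos = start
    while pos + NUCLEOSOME_BP <= region_length:
        positions.append((pos, pos + NUCLEOSOME_BP))
        # The next nucleosome begins one fixed linker length downstream.
        pos += NUCLEOSOME_BP + linker_bp
    return positions


def is_covered(site: int, nucleosomes: list[tuple[int, int]]) -> bool:
    """True if the site lies within DNA wrapped on any nucleosome."""
    return any(begin <= site < end for begin, end in nucleosomes)


array = place_nucleosomes(region_length=1000, linker_bp=20)
print(array)
print("site at 400 covered:", is_covered(400, array))
print("site at 320 covered:", is_covered(320, array))
```

In this toy array a site at position 400 falls under a nucleosome and would be inaccessible, whereas a site at position 320 falls in linker DNA. Shifting the `start` argument changes which sites are covered, which is one way to picture why nucleosome placement matters for promoter access.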


The SWR1 Complex The switch remodeling 1, or SWR1 
complex, is responsible for replacing the common histone 
2A protein of nucleosomes with a variant form known as 
H2A.Z that differs from the more common form by amino 
acid differences internal to the protein and in the amino 
terminal (N-terminal) protein tail. The differences found 
in H2A.Z alter its pairing with other H2A proteins and its 
interactions with H3/H4 tetramers in the nucleosome. 
H2A.Z is found primarily at the so-called +1 nucleo- 
some that is affiliated with the start of transcription. 
Functional analyses in several species suggest that the role 
of H2A.Z is in the creation of unstable nucleosomes that 
might then be displaced, ejected from DNA, or modified 
to regulate transcription (see Figure 15.17 ③). 

Figure 15.17 The actions of chromatin-remodeling complexes. ① ISWI assembles and organizes 
nucleosomes in a regular pattern and contributes to transcription repression. ② The SWI/SNF family 
opens chromatin structure and helps initiate transcription by either relocating nucleosomes away from 
regulatory sequences or ejecting nucleosomes. ③ SWR1 inserts the modified histone protein H2A.Z 
into nucleosomes to help facilitate displacement. 


Chemical Modifications of Chromatin 


In contrast to chromatin remodelers that move histones, 
the proteins called chromatin modifiers chemically 
modify histone proteins in the nucleosomes by adding or 
removing specific chemical groups. These modifications 
alter the strength of association between nucleosomes 
and DNA. The changes can cause chromatin structure 
to relax, leading to open promoters and to transcription 
activation, or they can lead to closed structures that in- 
hibit transcription. The principal chemical modifications 
to nucleosomes take place through the addition and re- 
moval of, primarily, acetyl and methyl groups at specific 
amino acids in the N-terminal (amino terminal) region of 
histones. 

Because different patterns of modifications of histone 
tails lead to greater or lesser amounts of transcription 
by contributing to the opening and closing of chromatin 
structures, molecular biologists Thomas Jenuwein and 
C. David Allis suggested that a “histone code” exists. This 
hypothesized code consists of different combinations of 
chemical modifications in histone N-terminal tails, re- 
sulting in different changes to the chromatin structure. 
Supporting this idea, two studies examining different 
aspects of chromatin complexity in two evolutionarily 
distant eukaryotes suggest chromatin exists in only a lim- 
ited number of distinct states (Table 15.1). Examining the 
combinatorial complexity of chromatin modifications in 
Drosophila cells in 2010, Guillaume Filion and colleagues 
identified five principal types of chromatin, each desig- 
nated by color (the Greek word chroma means “color”). 
A similar study of chromatin in Arabidopsis by Francois 
Roudier and colleagues in 2011 examined histone modifi- 
cations and DNA methylation to identify four prominent 
chromatin states (CS) that roughly correspond to those in 
Drosophila. Thus, despite the potential for an enormous 
number of different chromatin states, it appears that only 
a limited number exist in vivo. 

Enzymes that add chemical groups are collectively 
known as “writers,” while those that remove groups are 

Figure 15.18 Chromatin readers, writers, and erasers. 

known as “erasers” (Figure 15.18a). Proteins that rec- 
ognize the modified histone are called readers. Writers 
and erasers are recruited to specific chromatin loca- 
tions by sequence-specific DNA binding proteins, such 
as activators and repressors. The recruited writers and 
erasers modify the histone tails, producing an opening 
or condensing of chromatin structure at the locus. The 
two prominent chemical modifications are acetyl groups 
(COCH3) and methyl groups (CH3), which are added to 
or removed from lysine (K) residues in the N-terminal 
tail of histone 3. Three lysines, K4, K9, and K27, are 
particularly important targets for writers and erasers 
(Figure 15.18b). 
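
The writer, eraser, and reader roles can be pictured with a small data-structure sketch of the histone H3 tail. The class and method names are hypothetical, and the model ignores chemical constraints on real residues (for example, how many groups a single lysine can carry); it shows only the bookkeeping implied by the three roles.

```python
from dataclasses import dataclass, field


@dataclass
class HistoneH3Tail:
    """Toy record of chemical groups on lysines K4, K9, and K27 (hypothetical)."""
    # Maps lysine position -> set of groups currently attached.
    marks: dict = field(default_factory=lambda: {4: set(), 9: set(), 27: set()})

    def write(self, position: int, group: str) -> None:
        """Writer activity: add a chemical group to a lysine."""
        self.marks[position].add(group)

    def erase(self, position: int, group: str) -> None:
        """Eraser activity: remove a chemical group if it is present."""
        self.marks[position].discard(group)

    def read(self, position: int, group: str) -> bool:
        """Reader activity: recognize whether a given mark is present."""
        return group in self.marks[position]


tail = HistoneH3Tail()
tail.write(9, "methyl")               # a writer marks K9
print(tail.read(9, "methyl"))         # a reader recognizes the mark
tail.erase(9, "methyl")               # an eraser removes it
print(tail.read(9, "methyl"))
```

The point of the sketch is that the marks are reversible state on the histone, separate from the DNA sequence, which writers and erasers change and readers interpret.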

Histone acetyltransferases (HATs) are chromatin- 
modifying writers that add acetyl groups, and the acetyl 
groups are removed by histone deacetylases (HDACs), 


Table 15.1 Principal Chromatin States in Drosophila and Arabidopsis 

Drosophila    Arabidopsis    Function of Chromatin State 
Yellow        CS             Active gene transcription (euchromatin) 
Red           CS1            Active gene transcription (euchromatin) 
Blue          CS2 
Green         CS3 
Black         CS4 

Data from Filion, G. J., et al., 2010 and Roudier, F., et al., 2011. 

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GENETIC ANALYSIS 


PROBLEM The tissue enzyme TE2 is expressed in various mouse tissues T 
at different times during the life cycle. Identical chromosome segments TE2 
were isolated at different times in the cycle from a region immediately up- l 
stream of TE2 and analyzed for DNase | aL in sin an TE2 upstream 
BREARITDOWN DN segments were collected from em ryonic region 
I cuts in regions of open chromatin (E) and adult (A) mouse heart (H), kidney TE2 upstream 
but not condensed chromatin (p. 515). (K), and thymus gland (T). In the analysis, a chromosome fragment 
radioactive label was attached to one end wane 
of each chromosome fragment, and the samples from each tissue were ex- Radioactive | 
posed to DNase | to determine if the regions upstream of TE2 were DNase label DNase | 
| hypersensitive. The content from each sample was then separated by gel treatment 
electrophoresis, and the results are as shown below. 
a. Based on the gel results, is there evidence that chromatin remodeling 
plays a role in the expression of TE2? Explain Heart Thymus Kidney 
your reasoning. a E A E A E A 
b. In which tissue(s) and at what times during | BREAK IT DOWN: chromatin © 
ere remodeling is the process by which S S << 
development do the results indicate the ex- | nudeosome position or identity is — ne 
pression of TE2 was most likely taking place? [altered (p. 516). = = o Z 2 
= —— IF 
 — — => 
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Electrophoresis gel 


Solution Strategies and Solution Steps

Evaluate
1. Strategy: Identify the topic this problem addresses and the nature of
   the required answer.
   Step: This problem concerns an experimental analysis for DNase I
   hypersensitivity in the region upstream (i.e., the promoter region) of
   TE2. The answers require interpretation of experimental results with
   respect to chromatin structure and gene expression.
2. Strategy: Identify the critical information given in the problem.
   Step: Gel electrophoresis results are given for identical chromosome
   fragments from embryonic and adult heart, thymus, and kidney. All
   chromosome fragments were exposed to DNase I.
   TIP: DNase I hypersensitivity is detected when chromatin structure is
   open and potentially accessible to transcription-activating proteins.
   Closed chromatin is not hypersensitive to DNase I.

Deduce
3. Strategy: Compare and contrast the meaning of a continuous series of
   bands in some lanes of the gel versus lanes in which gaps are seen
   between bands.
   Step: A continuous series of DNase I-digested bands indicates DNase I
   hypersensitivity. Hypersensitivity correlates with open chromatin that
   is accessible to transcription. Gaps between gel bands indicate that
   certain fragments of chromosomes are not generated by DNase I
   treatment. This result signals the absence of DNase I hypersensitivity
   in those regions and suggests closed chromatin structure and no
   transcription.
4. Strategy: Evaluate the gel, and describe the patterns of DNase
   I-digestion bands for each sample.
   Step: Discontinuous band patterns are observed in adult heart and
   embryonic thymus gland DNA. This absence of DNase I hypersensitivity
   suggests closed chromatin structure. Each of the other DNA samples
   indicates hypersensitivity to DNase I.

Solve
Answer a
5. Strategy: Determine whether the gel data indicate chromatin
   modification near TE2.
   Step: The DNase I hypersensitivity results indicate differential
   patterns of TE2 expression in different tissues and at different times
   of development due to chromatin modifications. DNase I
   hypersensitivity resulting from open chromatin appears in embryonic
   and adult kidney, in embryonic heart, and in adult thymus chromosomal
   material. Hypersensitivity is not seen in adult heart or in embryonic
   thymus chromosomal material, indicating closed chromatin.
Answer b
6. Strategy: Name the tissues in which TE2 is expressed, and describe the
   developmental timing.
   Step: TE2 expression is likely to occur at embryonic and adult stages
   in the kidney, in the embryonic heart, and in the adult thymus gland.
   TE2 expression is unlikely to occur in adult heart or in embryonic
   thymus gland.
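
The reasoning in steps 3 through 6 can be restated as a small lookup. The sketch below is illustrative only: the band patterns are transcribed from Solution Step 4, and the names `band_pattern` and `infer_te2_state` are hypothetical labels introduced here, not part of the original problem.

```python
# Band pattern per (tissue, stage), as described in Solution Step 4.
# "continuous" = DNase I hypersensitive; "gapped" = not hypersensitive.
band_pattern = {
    ("heart", "embryonic"): "continuous",
    ("heart", "adult"): "gapped",
    ("thymus", "embryonic"): "gapped",
    ("thymus", "adult"): "continuous",
    ("kidney", "embryonic"): "continuous",
    ("kidney", "adult"): "continuous",
}


def infer_te2_state(pattern):
    """Map a gel band pattern to the inferred chromatin state and TE2 expression."""
    if pattern == "continuous":
        return {"chromatin": "open", "te2_expression": "likely"}
    if pattern == "gapped":
        return {"chromatin": "closed", "te2_expression": "unlikely"}
    raise ValueError(f"unknown band pattern: {pattern!r}")


# Samples in which TE2 expression is inferred to be likely.
expressed = sorted(
    sample
    for sample, pattern in band_pattern.items()
    if infer_te2_state(pattern)["te2_expression"] == "likely"
)
print(expressed)
```

The list printed at the end corresponds to Answer b: both kidney stages, embryonic heart, and adult thymus.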


For more practice, see Problems 20 and 21.
Figure 15.19 Acetylation and deacetylation in open and closed chromatin
structure. Histone deacetylases (HDACs) deacetylate amino acids in
N-terminal histone protein tails and close the chromatin structure.
Histone acetyltransferases (HATs) acetylate N-terminal amino acids and
help open the chromatin structure to activate transcription.


which act as erasers (Figure 15.19). In their unacetylated
form, positively charged amino acids such as lysine promote
nucleosome adherence to negatively charged DNA.
Acetylation neutralizes the positive charge and relaxes the
tight hold the nucleosomes have on DNA. Thus, acetylation
of K9 of histone 3, designated H3K9Ac, is associated
with an opening of the chromatin and active transcription.
HATs are recruited to the chromatin by activator
proteins (1), leading to the formation of euchromatin and
active transcription (2). Conversely, HDACs are recruited
by repressors (3), resulting in the formation of
transcriptionally inactive heterochromatin (4).

A second common chemical modification of amino 
acids in N-terminal tails of histone proteins is methyla- 
tion, the addition of methyl (CH3) groups by chromatin- 
modifying histone methyltransferases (HMTs), which 
act as writers. Again, lysine is frequently targeted for 
methylation, and residues can be mono- (me), di- (me2), 
or tri-methylated (me3). Depending upon the K residue, 
methylation plays a role in converting open chromatin 
to closed chromatin in conjunction with deacetylation 
(as in the case of H3K9 and H3K27) or, conversely (as in 
the case of H3K4, in conjunction with H3K9 acetylation), 
forms open chromatin (see Figure 15.18b). Demethylation 
is carried out by histone demethylases (HDMTs), which 
act as erasers. HMTs and HDMTs are also recruited to
the chromatin by activators and repressors in a manner
similar to that depicted for HATs and HDACs in
Figure 15.19. Thus, the chromatin state can be reversibly
converted between euchromatin (active) and heterochromatin
(inactive) through the combined action of transcription
factors and chromatin modifiers.
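
The associations between specific histone H3 marks and chromatin state described above can be collected in a small table. This is a deliberately simplified sketch: the dictionary `MARK_EFFECT` and the function `predict_chromatin` are hypothetical names, and treating each mark as having a fixed effect is an assumption made for illustration, since chromatin state reflects the combination of many modifications.

```python
# Chromatin effect associated in the text with each histone H3 modification.
MARK_EFFECT = {
    "H3K9ac": "open",     # acetylation of lysine 9
    "H3K4me": "open",     # methylation of lysine 4 (with H3K9 acetylation)
    "H3K9me": "closed",   # methylation of lysine 9 (with deacetylation)
    "H3K27me": "closed",  # methylation of lysine 27 (with deacetylation)
}


def predict_chromatin(marks):
    """Return a coarse chromatin state for a collection of histone marks."""
    effects = {MARK_EFFECT[mark] for mark in marks}
    if effects == {"open"}:
        return "euchromatin (active)"
    if effects == {"closed"}:
        return "heterochromatin (inactive)"
    # No marks, or a mixture of opposing marks: this toy model cannot decide.
    return "undetermined"


print(predict_chromatin(["H3K4me", "H3K9ac"]))
print(predict_chromatin(["H3K9me", "H3K27me"]))
```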

Multiple chemical modifications of N-terminal amino 
acids are required to remodel chromatin from a closed to 
an open structure and vice versa. No single acetylation or 
methylation event determines chromatin structure, but it is 
an event localized to a gene or regions of a gene. While writ- 
ers and erasers must usually be recruited to chromatin by 
sequence-specific DNA-binding proteins, readers, as their 
name implies, can directly bind to the modified histones. 
The role of readers is to “read” the chromatin structure and 
act to maintain it in either an active or inactive state. 

Facultative heterochromatin can alternate between 
an open euchromatic state and a closed heterochromatic 
state. Changes between these two states are driven by 
the recruitment of chromatin-modifying enzymes by ac- 
tivator or repressor proteins. In many eukaryotes, this 
involves an interplay between the opposing activities of 
writers and erasers, with a protein complex called the 
Polycomb group (PcG) acting in gene repression and 
another protein complex called Trithorax (Trx) acting 
to maintain gene expression. PcG and Trx complexes 
are recruited to specific loci by repressors and activa- 
tors, respectively. The PcG complex acts to maintain a 
chromatin state that is marked with H3K27me3 and not 
acetylated; that is, it has an H3K27 HMT and an HDAC. 
In contrast, the Trx complex has a HAT and an H3K27
HDMT (see Figure 15.19). These states can be stable for
the life of an organism, forming a cellular memory and
ensuring the stable differentiation of cell types. We will 
revisit the role of these complexes during the develop- 
ment of a multicellular organism in Chapter 20. 

Finally, recall from the original description of PEV 
in Chapter 11 that the white gene was relocated next to 
centromeric constitutive heterochromatin. In contrast to 
facultative heterochromatin, this type of heterochromatin 
is characterized by H3K9me3 and is one of the types of 
chromatin identified in Drosophila and Arabidopsis. We 
will return to the question of how constitutive hetero- 
chromatin is maintained later in this chapter. 


An Example of Transcriptional Regulation 
in S. cerevisiae 


To illustrate the role of chromatin modifications in transcription
initiation, we turn to transcription regulation of
the PHO5 gene in the yeast species S. cerevisiae. Our discussion
of this particular example is based on numerous
studies that collectively paint a comprehensive picture
of the actions associated with chromatin modification in
PHO5 transcription initiation and regulation.

PHO5 is a repressible gene encoding an acid phosphatase
that removes phosphate groups from other proteins.
In yeast, PHO5 transcription is activated by phosphate
starvation, but it is repressed when the phosphate level is
high. In the repressed state, access of transcription factors
and RNA polymerase II to the promoter's TATA box
is blocked by a nucleosome labeled -1 in Figure 15.20a.
Similarly, access of transcription activator proteins to a
UAS element labeled UASp2 is blocked by a nucleosome
labeled -2. In the repressed state, the transcription activator
protein Pho2 and the acetylase protein NuA4 are present
upstream of the promoter at a UAS element labeled
UASp1. Upstream of these are nucleosomes labeled -3
and -4. There is a low level of acetylation of nucleosomes
-1 to -4 in the repressed state. Together, the presence
of nucleosomes -1 to -4 blocks access of activator
proteins and transcription factors to PHO5 regulatory
sequences.

Transcription of PHO5 occurs when the phosphate
level falls. The Pho4 protein attaches to Pho2, forming
a protein complex that begins transcription activation.
Additional acetylation of the -1 to -4 nucleosomes
takes place under the direction of NuA4. The Pho4-Pho2
complex then initiates chromatin modification
by displacing nucleosome -2 (Figure 15.20b), making
UASp2 available for binding by the Pho4 protein. The
SWI/SNF protein complex assembles, and additional
chromatin modification displaces nucleosomes -1 (which
previously covered the TATA box), -3, and -4. With
chromatin opened by nucleosome displacement, general
transcription factor proteins and RNA polymerase II are
able to bind the promoter and initiate transcription of
the PHO5 gene.
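
The two promoter states just described can be summarized as a simple function of phosphate level. The sketch below only restates the sequence of events given in the text; the function name `pho5_promoter_state` and the dictionary layout are hypothetical, and the model ignores intermediate phosphate levels and kinetics.

```python
def pho5_promoter_state(phosphate_high):
    """Return a simplified description of the PHO5 promoter region."""
    if phosphate_high:
        # Repressed state: nucleosomes -1 to -4 cover the regulatory sequences.
        return {
            "bound_at_UASp1": ["Pho2", "NuA4"],
            "nucleosomes_present": [-1, -2, -3, -4],
            "acetylation": "low",
            "events": [],
            "transcribed": False,
        }
    # Activated state: events listed in the order given in the text.
    events = [
        "Pho4 attaches to Pho2",
        "NuA4 directs additional acetylation of nucleosomes -1 to -4",
        "Pho4-Pho2 displaces nucleosome -2, exposing UASp2",
        "SWI/SNF assembles and displaces nucleosomes -1, -3, and -4",
        "general transcription factors and RNA polymerase II bind the promoter",
    ]
    return {
        "bound_at_UASp1": ["Pho2", "Pho4", "NuA4"],
        "nucleosomes_present": [],
        "acetylation": "high",
        "events": events,
        "transcribed": True,
    }


for step in pho5_promoter_state(phosphate_high=False)["events"]:
    print(step)
```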
Figure 15.20 Transcription control of PHO5 in
Saccharomyces cerevisiae. (a) Transcription is repressed in
high-phosphate conditions. (b) In low-phosphate conditions,
Pho4 joins Pho2 at UASp1, and NuA4 directs acetylation of
nearby nucleosomes. The SWI/SNF complex attaches, leading
to the ejection of nucleosomes -1 to -4. RNA polymerase II and
general transcription factors initiate PHO5 transcription.


Epigenetic Heritability 


Activating the transcription of an individual gene requires 
a confluence of regulatory proteins that remodel or mod- 
ify chromatin to provide enhancer and promoter access to 
transcription factors that initiate and carry out transcript 
synthesis, as we saw above in the detailed description of 
PHO5 transcription. Mechanisms controlling differential
chromatin state formation and maintenance produce pat- 
terns of gene expression in different types of cells that are 
required for the growth and development of complex or- 
ganisms. In a broad sense, these regulatory processes are 
the reason a single fertilized egg can develop and produce 
many distinct types of cells (liver cells, muscle cells, brain 
cells, and so on) that look and act differently even though 
they carry the same genetic information. 

Among the trillions of somatic cells in your body are 
scores of different cell types, and yet all these cells contain 
the same genetic information. The differences of mor- 
phology and function between cell types are genetically 
controlled, as evidenced by the fact that daughter cells 
have the same structures and functions as parental cells, 
but DNA sequence variability is not the reason for those 
differences. Instead, the differences between somatic cells
are epigenetic, resulting from the distinct chromatin 
states affecting gene transcription in specific types of cells. 

To repeat, epigenetic patterns are often heritable 
through mitosis from one generation of cells to the next, 
causing daughter cells to have the same patterns of gene 
expression as their parent and sibling cells—a cellular 
memory. On the other hand, some epigenetic changes oc- 
cur in the course of normal growth and development, in 
some cases resulting from different physiological condi- 
tions. These changes are potentially reversible and vari- 
able during the life cycle of an organism, during which 
the transcription of certain genes is turned on and later 
off again, or vice versa. Note that most epigenetic marks 
added during the lifetime of an organism are erased dur- 
ing meiosis, resetting the epigenetic landscape for the 
next generation. However, there is evidence that some 
epigenetic differences can be heritable through meiosis, 
from one generation of the organism to the next, a topic 
we will explore in the Case Study. 

We have previously encountered examples of mitotically
heritable variation of gene expression that has
an epigenetic basis. For instance, position effect variegation
(PEV) in Drosophila results from the movement of
the transcriptionally active w+ allele into the centromeric
region of the fruit-fly X chromosome (see Figure 11.18).
The DNA sequence of the gene is not altered. Instead, the
spread of heterochromatin closes chromatin structure and
blocks gene transcription by an epigenetic mechanism.
The repressed transcriptional state is then maintained
in daughter cells through mitotic division. The result is
patches of cells descended from original progenitor cells
that share the same pattern of inactivation of w+ expression.
These cells form patches of white in the eye of the fly.

How is epigenetic control maintained in cells? For 
cellular memory to be maintained, any acetyl and methyl 
groups that are present on histones before DNA replica- 
tion must be maintained or established on both the old 
and new histones after DNA replication. The specific 
molecular mechanics of this process are not entirely clear, 
but the partial disassembly and subsequent reassembly of 
nucleosomes is an essential component (see Figure 11.10). 
Recall that chromatin structure is broken down as the 
replication fork passes (see Chapter 11). Nucleosomes are 
separated from the parental DNA strands so the latter can 
serve as templates for the synthesis of daughter strands. 
The nucleosomes partially break apart, and old nucleo- 
some segments along with newly synthesized nucleosome 
segments are reassembled on both new duplexes. 

Immediately after DNA replication, the newly formed 
nucleosomes carry only part of their previous epigen- 
etic information. The original epigenetic state must be 
quickly reestablished by epigenetic marking of the newly 
synthesized histones. Old histones are able to modify new 
histones to have the same pattern of epigenetic marks. 
This process takes place among adjacent nucleosomes,
thus preserving local epigenetic control of gene
transcription. The interaction must also occur over long distances
so as to maintain higher-order chromatin structure, such 
as that characterizing inactivated X chromosomes (see 
below). It is likely that the presence of PcG and Trx com- 
plexes is required for the continued maintenance of chro- 
matin states through mitoses. 
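
The idea that old histones re-mark neighboring new histones after replication can be illustrated with a toy model. Everything specific in this sketch is an assumption made for illustration, not a description of the actual molecular mechanism: whole nucleosomes (rather than nucleosome segments) are distributed, old nucleosomes alternate strictly between the two daughter duplexes, new nucleosomes start unmarked (`None`), and each new nucleosome copies the mark of its nearest old neighbor. The function names are hypothetical.

```python
def replicate(marks):
    """Distribute parental nucleosome marks between two daughter duplexes.

    Assumption: old nucleosomes alternate between daughters; the remaining
    positions are filled with new, unmarked nucleosomes (None).
    """
    daughter1 = [m if i % 2 == 0 else None for i, m in enumerate(marks)]
    daughter2 = [m if i % 2 == 1 else None for i, m in enumerate(marks)]
    return daughter1, daughter2


def restore(daughter):
    """Copy the mark of an adjacent old nucleosome onto each new nucleosome."""
    restored = list(daughter)
    for i, mark in enumerate(daughter):
        if mark is None:
            left = daughter[i - 1] if i > 0 else None
            right = daughter[i + 1] if i + 1 < len(daughter) else None
            restored[i] = left if left is not None else right
    return restored


parent = ["H3K27me3"] * 6  # a uniformly marked, repressed domain
d1, d2 = replicate(parent)
print(restore(d1) == parent, restore(d2) == parent)
```

In this toy model a uniformly marked domain is restored exactly on both daughters, whereas the boundary between two differently marked domains can shift by one nucleosome.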


A Role for lncRNAs in Gene Regulation


It is becoming increasingly apparent that a class of RNA
molecules in eukaryotic cells called long noncoding
RNAs (lncRNAs) plays critical roles in gene regulation. As
their name implies, they are long RNAs without substantial
open reading frames. A study of lncRNAs expressed in
embryonic stem cells in mice suggests that many lncRNAs
may act as scaffolds linking chromatin regulatory proteins
to affect gene expression. Given that the genomes of
mammals encode a large number of lncRNAs, this may be
a critical mechanism of gene regulation in the mammalian
lineage. The best-known example of an lncRNA regulating
gene expression is Xist, which is involved in X chromosome
inactivation in eutherian female mammals.


Inactivation of Eutherian Mammalian Female 
X Chromosomes 


To achieve the correct balance of X-linked gene expression 
in eutherian mammalian females, the dosage compensation 
mechanism known as X-inactivation occurs. We discussed 
this problem in Chapter 3 and explained that mammalian 
females undergo random X inactivation in each nucleus 
early in gestational development. Recall that random X 
inactivation leaves one active X chromosome that is largely 
euchromatic and one inactive X chromosome that is al- 
most entirely heterochromatic in each nucleus. The het- 
erochromatic X chromosome is almost completely silent 
with respect to gene expression. This highly heterochro- 
matic X chromosome forms a Barr body in the nucleus. All 
cells descending from the ones that originally underwent 
random X inactivation maintain the same active (euchro- 
matic) and inactive (heterochromatic) X chromosomes, 
leading to the mosaic pattern of cells characteristic of 
eutherian mammalian females (see Figure 3.27). 

Extensive studies of X inactivation in mice and 
humans have detected about a dozen genes on the hetero- 
chromatic (inactive) X chromosome that escape silencing. 
One of these genes is critically important to the establish- 
ment and maintenance of X-inactivation. The gene, called 
X-inactivation-specific transcript (Xist), is active on the 
heterochromatic X chromosome and is inactive on the eu- 
chromatic chromosome. It is located in the X-inactivation 
center, or XIC, of the X chromosome (Figure 15.21). 
The Xist gene is transcribed only on the heterochromatic 
chromosome, where it is active; it is not transcribed on 
the euchromatic X chromosome, where it is inactive. 
Figure 15.21 The X-inactivation center (XIC). The XIC
contains Xist, which is transcribed to produce a specialized RNA 
that coats the X chromosome. This mechanism is responsible for 
random inactivation in eutherian mammals. 


The gene transcript is a specialized RNA transcript called 
Xist RNA that never leaves the nucleus and is never 
translated. Instead, Xist RNA exclusively coats the X 
chromosome that produces it. The Xist RNA coating at- 
tracts HMTs and HDACs that methylate and deacetylate 
histones, respectively. These epigenetic modifications are 
linked directly to transcriptional silencing of genes. 

The Xist RNA coating, subsequent methylation and 
deacetylation, and other protein-driven modifications 
inactivate one X chromosome and condense it into a het- 
erochromatic state in each eutherian mammalian female 
nucleus. One idea of how the modification is accom- 
plished is that the Xist RNA may act as a molecular 
bridge between the inactive chromatin and the repres- 
sive chromatin-modifying complexes such as PcG. This 
would ensure that the patterns of chromatin modifications
of the X chromosome established in embryogenesis are
maintained throughout the lifetime of the organism. Note, 
however, that X-inactivation is reversible in eutherian 
mammalian female germ-line cells, ensuring that the pro- 
cess starts over each generation. 


Genomic Imprinting 


A specialized example of resetting of epigenetic patterns in 
meiosis occurs in certain mammalian and flowering plant 
genes in a mechanism known as genomic imprinting. For 
the small number of mammalian genes subject to genomic 
imprinting, both copies of the gene are functional but just 
one is expressed. 

In mammals, two copies of each autosomal gene are 
inherited—one copy is on a chromosome inherited from 
the mother, and the other copy is on the homologous 
chromosome from the father, and usually both gene copies 
are expressed. For a small number of genes whose expres- 
sion is subject to genomic imprinting, however, this pat- 
tern does not hold. Instead, one copy of the gene is actively 
expressed while the other copy is silent. The expressed 
gene copy is always inherited from a particular parent (for 
some genes it is the mother, for others it is the father), and 
the silent copy is the one inherited from the other parent. 

The best-studied examples of genomic imprinting
are two human genes encoded very near one another on
chromosome 15. The insulin-like growth factor 2 (IGF2) gene
on the paternally derived copy of the chromosome is expressed,
whereas the IGF2 gene on the maternally derived
chromosome is silent. The opposite is the case for the
H19 gene, which is expressed from the maternally derived
chromosome 15 but is silent on the paternal copy. These
two genes are in a region of chromosome 15 containing
several other genes that are also imprinted. They are
among the few dozen human genes whose transcription is
controlled by genomic imprinting.

Two regulatory sequences are responsible for these 
two instances of genomic imprinting. One is an enhancer 
downstream of H19; the other is an insulator sequence, 
called the imprinting control region (ICR), located be- 
tween H19 and IGF2 (Figure 15.22). In the maternal chro- 
mosome, activator proteins bind the enhancer sequence 
and direct transcription of H19 by interacting with tran- 
scription factors and RNA polymerase II at the promoter. 
The ICR in the maternal chromosome is bound by an 
insulator protein that blocks the enhancer from affecting 
IGF2. On the paternal chromosome, on the other hand, 
extensive methylation of the ICR and H19 prevents insu- 
lator protein binding and blocks transcriptional protein 
binding at the H19 promoter. In the absence of the insula- 
tor protein, the enhancer stimulates transcription of IGF2. 
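
The enhancer and insulator logic in this paragraph reduces to a single switch, the methylation state of the ICR and H19. The following sketch restates that logic only; the function name `h19_igf2_expression` and its boolean argument are hypothetical simplifications of what the text describes.

```python
def h19_igf2_expression(icr_methylated):
    """Return which genes the shared enhancer activates on one chromosome."""
    if icr_methylated:
        # Paternal pattern: methylation prevents insulator protein binding and
        # blocks the H19 promoter, so the enhancer stimulates IGF2 instead.
        return {"H19": False, "IGF2": True}
    # Maternal pattern: the insulator protein bound at the ICR blocks the
    # enhancer from affecting IGF2, and the enhancer drives H19.
    return {"H19": True, "IGF2": False}


maternal = h19_igf2_expression(icr_methylated=False)
paternal = h19_igf2_expression(icr_methylated=True)
print("maternal:", maternal)
print("paternal:", paternal)
```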

Genomic imprinting silences expression of paternal
H19 and maternal IGF2 and directs transcription of paternal
IGF2 and maternal H19 in all somatic cells. This
pattern is essential for normal development, and any 
other pattern produces profound abnormalities. A genetic 
Figure 15.22 Differential genomic imprinting of chromosome 15
in humans.


condition called Prader-Willi syndrome (OMIM 176270) 
most often results from partial deletion of the portion of 
the paternal copy of chromosome 15 containing H19 and 
IGF2. The condition can also occur if the paternal chromo- 
some 15 is not properly imprinted. A different condition 
called Angelman syndrome (OMIM 105830) is most often 
produced by partial deletion of the same portion of the ma- 
ternal chromosome 15. Angelman syndrome also occurs if 
the maternal chromosome is not properly imprinted. 

Given the importance of imprinting for certain genes 
and considering the different imprinting patterns of gene 
expression in maternally derived versus paternally de- 
rived chromosomes, how does the inheritance of correctly 
imprinted chromosomes occur? The answer is that in 
primordial germ-line cells, the inherited imprinting pat- 
terns are first erased and then are reestablished in the sex- 
specific pattern of the germ line early in gametogenesis 
(Figure 15.23). In the female germ line, methylation of the 
paternal chromosome is reversed by demethylase activ- 
ity, and the insulator protein is removed from the ICR 
on the maternal chromosome. Both chromosomes are 
then re-imprinted with the female-specific pattern. In the 
male germ line, both chromosomes have their imprinting 
erased and then reestablished in the male-specific pattern. 
These processes ensure that each parent passes a properly 
imprinted chromosome during reproduction. 


Nucleotide Methylation 


The methylation pattern identified in genomic imprinting 
of the ICR and H19 gene is a type of methylation that is as- 
sociated with repression of gene expression in many plants 
and vertebrates, particularly mammals, that differs from 
methylation of amino acids in N-terminal histone protein 
tails. In this case, methyl (CH3) groups are attached to 
specific DNA nucleotides, not to amino acids in histone 
protein tails. Nucleotide methylation is performed by spe- 
cialized DNA methyltransferases that add methyl groups 
primarily to cytosines located in CpG dinucleotides, 
Figure 15.23 Inheritance of genomic imprinting. The
genomic imprinting patterns on chromosome 15 are erased 
and reestablished in sex-specific forms early in gametogenesis 
to ensure reproductive success. 


side-by-side cytosine and guanine nucleotides in the same
DNA strand. The p in CpG represents the single phosphoryl
group in the phosphodiester bond connecting the
nucleotides. Complementary strands of DNA containing
CpG dinucleotides each have 5'-CG-3'. In plants, other C
nucleotides may be methylated, for example the ones in
5'-CNG-3' and 5'-CNN-3' configurations.

Much of the cytosine-methylated DNA in eukaryotic 
genomes is in transposable element sequences and non- 
coding sequences and is associated with a transcriptionally 
silent chromatin state. Just as with chromatin-remodeling 
enzymes, the DNA methyltransferases are recruited to 
specific loci by transcription factors when DNA meth- 
ylation is being established. Also paralleling nucleosome 
modification, the pattern of cytosine-methylated sites is 
usually mitotically stable but can be reset during meiosis. 
A simple modification of Sanger sequencing in which the 
DNA is first treated with bisulfite, which converts cytosine 
to uracil but leaves methylcytosine untouched, allows the 
direct determination of the methylation status of DNA. 
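
The logic of bisulfite sequencing can be sketched in a few lines. This is a minimal illustration under stated assumptions: a single strand is considered, conversion is assumed to be complete, and the uracil produced from unmethylated cytosine is assumed to be read as T in the sequencing output. The function names and the example sequence are hypothetical.

```python
def bisulfite_convert(seq, methylated_positions):
    """Simulate bisulfite treatment and sequencing of one DNA strand.

    Unmethylated C is converted to U and read as T; methylated C is
    left untouched and is still read as C.
    """
    converted = []
    for i, base in enumerate(seq):
        if base == "C" and i not in methylated_positions:
            converted.append("T")
        else:
            converted.append(base)
    return "".join(converted)


def call_methylation(reference, converted):
    """Return 0-based positions where a reference C was protected (methylated)."""
    return [
        i
        for i, (ref_base, read_base) in enumerate(zip(reference, converted))
        if ref_base == "C" and read_base == "C"
    ]


reference = "ACGTCCGA"                         # hypothetical sequence
treated = bisulfite_convert(reference, {1})    # only the C at index 1 is methylated
print(treated, call_methylation(reference, treated))
```

Comparing the treated read with the untreated reference recovers the methylated positions, which is the comparison the sequencing method relies on.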

Recall from Section 12.4 that deamination of a meth- 
ylated cytosine creates a thymine, which generates a mis- 
match that is repaired either to a C-G or a T-A base pair 
at an approximately equal frequency. Thus, in organisms 
with a significant amount of cytosine methylation, such 
as in vertebrates, where most of the cytosines in CpG
dinucleotides are methylated, over time the number of 
CpG dinucleotides is reduced. In these species, sequences 
rich in CpG, called CpG islands, are regions of the genome 
in which there is strong selection for maintenance of 
cytosines, reflecting a functional role for such regions. As a 
result, CpG islands can be used to identify potentially func- 
tional genomic regions such as gene regulatory sequences. 
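
One common way to quantify CpG depletion is the ratio of observed to expected CpG dinucleotides, where the expected count assumes C and G occur independently. The sketch below computes that ratio for a single strand; the function name and the example sequences are hypothetical, and real CpG-island detection additionally applies length and GC-content criteria that are not shown here.

```python
def cpg_observed_expected(seq):
    """Return the observed/expected CpG ratio for one DNA strand.

    expected CpG count = (number of C * number of G) / sequence length
    """
    seq = seq.upper()
    n = len(seq)
    c_count = seq.count("C")
    g_count = seq.count("G")
    cpg_count = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    if c_count == 0 or g_count == 0:
        return 0.0
    return cpg_count * n / (c_count * g_count)


print(cpg_observed_expected("CGCGCGCG"))  # CpG-rich example
print(cpg_observed_expected("CAGTCTGA"))  # same C and G content, no CpG
```

A ratio well below 1 indicates fewer CpG dinucleotides than base composition alone would predict, the pattern expected where methylated cytosines have been lost to deamination.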


15.3 RNA-Mediated Mechanisms 
Control Gene Expression 


In the past several years, RNA has emerged as a key 
component in the regulatory control of eukaryotic gene 
expression. Largely unknown before the mid-1990s, RNA- 
mediated regulatory mechanisms have rapidly become 
a major focus of research in plants and animals. This 
important area of inquiry emerged unexpectedly from 
experiments designed to produce a more colorful petunia. 

In the early 1990s, Richard Jorgensen and his col- 
leagues were attempting to deepen the color of petunias by 
introducing into the petunia genome a pigment-producing 
gene under the control of an active promoter. The re- 
searchers hoped that active transcription of this recom- 
binant gene would dramatically deepen flower color. To 
Jorgensen’s surprise, however, rather than exhibiting more 
intense color overall, many of the resulting flowers were 
variegated (see the chapter opener photo). Some flowers 
had stripes of deep pigment and stripes lacking pigment, 
and some flowers were almost entirely white. The re- 
searchers called this phenomenon cosuppression because 
expression of both the introduced pigment gene and the 
petunia’s natural pigment-producing gene was suppressed. 

By 1995, similar gene-silencing phenomena had been 
documented in numerous plant species, in the fungus 
Neurospora crassa, in the nematode worm Caenorhabditis 
elegans, and in the fruit fly Drosophila. The fundamental 
mechanism behind this form of regulation was identified 
in 1998 by a research team led by Andrew Fire and Craig 
Mello. Fire and Mello found that double-stranded RNA 
(dsRNA) molecules were taking part in a post-transcriptional 
regulatory mechanism now known universally as RNA 
interference (RNAi). Fire and Mello received the Nobel 
Prize in Physiology or Medicine in 2006 for their work. 


Gene Silencing by Double-Stranded RNA 


RNA interference silences gene expression either by block- 
ing transcription of targeted genes or by blocking gene 
expression post-transcriptionally. Post-transcriptional 
silencing occurs following binding of small regulatory 
RNAs to mRNA targets by complementary base pairing. 
The binding of these regulatory RNAs either can lead to 
the destruction of the target mRNAs or can block their 
translation. Alternatively, some regulatory RNAs enter
the nucleus where they bind DNA to block transcription
of targeted genes. Any of these regulatory processes first 
require that small regulatory RNA molecules use comple- 
mentary base pairing to bind their targets. 

The regulatory RNAs in RNAi are derived from various
sources that produce double-stranded RNAs. An
enzyme known as Dicer (Figure 15.24) cuts the double-stranded
RNA into 21- to 25-bp fragments. These fragments
are then bound by a protein complex called the
RNA-induced silencing complex (RISC) that denatures
the double-stranded RNAs into single strands of 21 to 25
nucleotides. The RNA single strands produced by RISC
are identified as the guide strand, which is biologically
active, and the passenger strand, which is usually degraded.
The guide strand remains bound to RISC, and
the complex directs one of three gene-silencing processes
(numbers 1 through 3 in the figure): (1) the complex uses
complementary base pairing to attach the guide strand
to mRNA, and the mRNA is destroyed; (2) the RISC-guide
RNA binds to complementary mRNAs and blocks
their translation; or (3) the complex directs chromatin-modifying
enzymes to the nucleus, where they silence
transcription of selected genes.

What is the origin of the dsRNA? It can be produced
from endogenous genes or from the transcription of
other endogenous sequences (e.g., transposons), or it can
come from exogenous sources. In many eukaryotes, genes
encode precursors of dsRNA that are processed into
21- to 24-nucleotide microRNAs (miRNAs) at a Dicer
complex (Figure 15.24). Most genes encoding miRNAs
are transcribed by RNA polymerase II, and the resulting
transcript folds back on itself into a dsRNA. The targets
of miRNAs are endogenous mRNAs that are then either
cleaved or have their translation blocked subsequent to
activity mediated through RISC.

Another type of dsRNA is small interfering RNA
(siRNA). In contrast to miRNAs, siRNAs are usually
not derived from genes but rather come from exogenous
sources or from other endogenous transcription. For example,
if both strands of a genomic region happen to
be transcribed, dsRNA can form. Transcription from
opposite strands of repetitive elements, such as transposons,
can also lead to dsRNA production. In the latter
case, the two strands do not have to be derived from
the same genomic location. Some eukaryotes possess
RNA-dependent RNA polymerases, which can produce
dsRNA using single-stranded RNA as a template. The
endogenous sources of dsRNAs can direct either
post-transcriptional silencing, through the destruction of target
mRNAs, or transcriptional silencing of target genes that
takes place by chromatin-modifying processes. Finally,
exogenous sources of dsRNA can include RNA viruses
that trigger virus-induced gene silencing.


Cleaving dsRNA The general mechanism of action by
which Dicer cleaves dsRNA into fragments of the proper
size was identified in 2006 when Jennifer Doudna and her
colleagues determined the crystal structure of Dicer in the
intestinal parasite Giardia intestinalis. Doudna's research
group used the crystal structure to determine that the
dsRNA-binding site on Dicer, called PAZ, is separated
by 65 Å from the sites of two RNase domains that cut
the RNA. The 65-Å space between PAZ and the RNase
domains corresponds to the 24-bp length of the resulting
dsRNA fragments (Figure 15.25). Dicer repeats this action,
each time behaving as a molecular ruler measuring off
precisely sized dsRNAs. The spacing between the PAZ site
and RNase domains varies among species and appears to
correlate with species-specific differences in the lengths
of siRNAs produced by subsequent RISC processing
of dsRNAs.
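
The "molecular ruler" relationship between the 65-Å spacing and the fragment length can be checked with simple arithmetic. The rise per base pair used below, 2.7 Å, is an assumed approximate value for an A-form RNA duplex chosen for illustration; it is not given in the text, and published values vary somewhat. The function names are hypothetical.

```python
# Assumed approximate helical rise per base pair of A-form dsRNA, in angstroms.
RISE_PER_BP_ANGSTROM = 2.7


def ruler_length_bp(distance_angstrom, rise=RISE_PER_BP_ANGSTROM):
    """Estimate how many base pairs of dsRNA span a given distance."""
    return round(distance_angstrom / rise)


def dice(dsrna_length_bp, fragment_bp):
    """Return (number of full-length fragments, leftover bp) after repeated cuts."""
    return divmod(dsrna_length_bp, fragment_bp)


fragment = ruler_length_bp(65)
print(fragment)             # estimated fragment length in bp
print(dice(100, fragment))  # repeated cutting of a hypothetical 100-bp dsRNA
```

Under this assumption the 65-Å spacing corresponds to about 24 bp, in agreement with the fragment length stated above.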


Figure 15.24 Gene silencing by RNAi.
Dicer cuts dsRNA into 21- to 25-bp siRNA or
miRNA segments that are then denatured
by RISC. RISC-guide strand complexes can
degrade targeted mRNAs, block translation of
target mRNAs, or enter the nucleus to modify
chromatin.



Precursor transcripts of miRNAs and siRNAs are 
synthesized in the nucleus of a cell and are processed 
into miRNAs and siRNAs by Dicer activity. In the case 
of miRNA, the precursor transcript is called a primary 
microRNA (pri-miRNA). The pri-miRNA folds to 
form a double-stranded stem typically containing 65 to 
70 nucleotides and having free ends on one side and a 
single-stranded loop on the other side (Figure 15.26). In 
animals, the Drosha enzyme complex cuts pri-miRNA 
near the middle of the stem and produces two seg- 
ments, one of which, now called precursor microRNA 
(pre-miRNA), contains the remainder of the upper stem, 
which is approximately 21 to 25 bp, and the terminal 
loop. The pre-miRNA is transported to the cytoplasm, 
where Dicer removes the terminal loop, leaving dsRNA 
of approximately 21 to 25 bp. RISC then binds the
dsRNA and separates the strands to create miRNAs.
The creation of siRNA is similar. In contrast to animals,
plants use a single Dicer enzyme to perform all the
miRNA processing activities.

Figure 15.25 Dicer structure and interaction with dsRNA.

Figure 15.26 Stepwise processing of pri-miRNA to produce
miRNA.


RISC and Argonaute The newly produced siRNA or 
miRNA remains bound by RISC to act as a guide strand. 
Within the RISC multiprotein complex is a protein of the 
Argonaute gene family that plays a central role in how 
the RISC-guide strand complex silences gene expression. 
Many species encode multiple Argonaute proteins (humans 
encode eight, for example), and each seems to direct a 
somewhat different activity by the RISC-guide strand 
complex. 

The best-understood mechanism of gene silencing 
by the RISC-guide strand complex involves complementary 
binding of the guide strand to a target mRNA. If the 
percentage of base-pair complementarity is high enough, this 
binding forms a structure that allows an RNase domain 
of Argonaute to cut the targeted mRNA strand near 
the middle of the guide strand-mRNA duplex, leading to 
degradation of the mRNA. When the guide strand-mRNA 
base pairing is less well matched (that is, when 
only a core of complementary base pairs is present in 
the guide strand-mRNA duplex), the RNase domain 
of Argonaute is unable to cut the duplex. Instead, the 
duplex retains its double-stranded form, causing 
translation to be blocked. 
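
The dependence of the silencing outcome on guide strand-mRNA complementarity can be illustrated with a short sketch. It scores antiparallel Watson-Crick pairing between a guide strand and a target site of equal length and then applies a cutoff. The 0.9 cutoff, the function names, and the toy sequences are all hypothetical choices made for illustration; the sketch ignores G-U wobble pairs and the position of mismatches, both of which matter in real cells.

```python
# Illustrative sketch: complementarity between a guide strand and an mRNA site.
# ASSUMPTIONS: only Watson-Crick pairs count (no G-U wobble), the guide and
# the site have equal length, and the 0.9 cutoff is an arbitrary placeholder.

PAIRS = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence written 5'->3'."""
    return "".join(PAIRS[base] for base in reversed(rna))


def fraction_paired(guide: str, target_site: str) -> float:
    """Fraction of guide positions that pair with the antiparallel target site.

    Both sequences are written 5'->3', so the target is read in reverse.
    """
    if not guide or len(guide) != len(target_site):
        raise ValueError("guide and target site must be non-empty and equal in length")
    paired = sum(
        1 for g, t in zip(guide, reversed(target_site)) if PAIRS.get(g) == t
    )
    return paired / len(guide)


def predicted_outcome(guide: str, target_site: str, cutoff: float = 0.9) -> str:
    """Classify the silencing outcome using a hypothetical cutoff."""
    if fraction_paired(guide, target_site) >= cutoff:
        return "mRNA cleavage"
    return "translational block"
```

For example, `predicted_outcome("AUGC", reverse_complement("AUGC"))` gives `"mRNA cleavage"` because every position pairs, whereas a site with one mismatch in four positions falls below the placeholder cutoff and is classified as a translational block.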


Chromatin Modification by RNAi 


For the third mechanism by which the RISC-guide 
strand complex silences gene expression, we return to 
chromatin modification. Details of how small RNAs con- 
tribute to the maintenance of heterochromatin were 
worked out in the yeast Schizosaccharomyces pombe. The 
first evidence of a role for RNAi in chromatin modifica- 
tion came from the study of centromeric heterochroma- 
tin in S. pombe. The centromeres of S. pombe, like those 
of other complex eukaryotes, contain a central element 
surrounded by repeat sequences (see Figure 11.17). The 
histones in the centromeric region have a low level of 
acetylation, and lysine 9 of the N-terminal tail of H3 
(that is, H3K9) is methylated. Both types of modification 
are consistent with the formation of a closed chromatin 
structure and the spread of heterochromatin to silence 
nearby genes. 

S. pombe possesses single genes for Dicer and for 
Argonaute, and mutation of either gene disrupts RNAi 
activity in the cell. The surprising finding, however, was 
that S. pombe with Dicer or Argonaute mutations also 
lacks methylation of H3K9 and does not have gene silenc- 
ing around the centromere. The explanation for these 
additional deficiencies is that in S. pombe, both strands 
of the centromeric repeat sequences are transcribed by 
RNA polymerase II. The resulting transcripts are 
complementary and form double-stranded RNAs that Dicer cuts. 
The fragments produced by this process are then 
separated into single strands that bind to Argonaute, which 
then joins a protein known as Chp1 and other proteins 
to form a RISC-like complex called the RNA-induced 
transcriptional silencing (RITS) complex (Figure 15.27) 
that carries the siRNA into the nucleus. The siRNA-RITS 
complex is attracted to the centromere, where the siRNA 
appears to use complementary base pairing to form a 
duplex with nascent transcripts of the centromeric repeat 
sequences. This pairing attracts other proteins that 
promote the deacetylation of histones and the methylation of 
H3K9 to close the chromatin structure and spread 
heterochromatin outward from the centromere. 

Figure 15.27 RNA-induced transcriptional silencing (RITS) 
in yeast. 


The Evolution and Applications of RNAi 


RNAi is widespread in eukaryotes, and the mechanism of 
transcriptional silencing in S. pombe is thought to be re- 
lated to RNAi-mediated transcriptional silencing in other 
eukaryotic species. But how did RNAi evolve? The answer 
is still under investigation, but the operating hypothesis 
is that RNAi evolved by helping organisms protect their 
genomes against the mutational effects of transposable 
genetic elements (described in Chapter 13). 

Transposable elements are diverse and make up 
large percentages of the genomes of complex eukaryotes. 
For example, almost half the human genome is com- 
posed of transposable elements. In the human genome 
and in other eukaryotic genomes, most of these trans- 
posons are located in heterochromatin and are silent; 
however, researchers have discovered that mutations in 
the RNAi machinery of an organism can reactivate nor- 
mally quiescent transposons by reversing transcriptional 
silencing. This can lead to the movement of some trans- 
posable elements around the genome and potentially to 
the production of new mutations. The evidence suggests 
that RNAi plays a role in silencing the transcription of 
transposons. 

RNAi also plays a protective role in response to 
viral infection. In plants, the infection of one leaf by a 
virus can generate an RNAi response that blocks viral 
replication and prevents the infection from spreading 
throughout the plant. In support of this observation, 
plants with Dicer or Argonaute mutations are much 
more susceptible to the spread of viral infections than 
are plants without Dicer or Argonaute mutations. These 
findings are consistent with the idea that RNAi evolved 
as a genome-protection mechanism against transpos- 
able genetic elements and viral infection. To return 
to Jorgensen’s petunias and their cosuppression for a 
moment, biologists now know that RNAi is responsible 
for blocking expression of the chromosomal pigment- 
producing gene as well as the introduced copy of the 
pigment-producing gene. 

Both plant and animal genomes encode miRNAs, 
but the mode of action of miRNAs differs slightly between 
the two taxa. In plants, miRNAs display near-complete 
sequence complementarity with their mRNA targets and 
usually direct cleavage of the target rather than block 
translation. In contrast, miRNAs in animals are usually 
complementary to their targets only at one end of the miRNA 
and usually repress translation rather than direct cleavage 
of the target. These differences suggest that miRNAs may 
have evolved independently in the two lineages. 

RNAi is emerging as a powerful research tool that 
can be used in a multitude of ways. One frequent applica- 
tion of RNAi in research is the use of interfering RNAs 
to “knock down” the expression of selected genes. This 
is a way of discovering the gene’s effect on the phenotype 
by examining how phenotype is altered in the absence 
of expression of the gene. A second area for application 
of RNAi is in medicine, where biomedical researchers 
are exploring the possible uses of RNAi to control the 
expression of genes that produce too much transcript 
or produce abnormal transcripts in disease. In certain 
cancers, for example, the disease process is driven in part 
by overexpression of certain genes. RNAi therapy would 
involve designing and constructing small RNA molecules 
that specifically bind and block the translation of the 
transcripts of disease-causing genes while not affecting the 
transcripts of other genes. We discuss other experimental 
applications of RNAi in Chapter 16. 


Here’s a simple question: How are traits passed from one 
generation to the next? The first answer that came to your 
mind was probably (and not incorrectly) that traits are passed 
by the transmission of genes from parents to offspring. But 
over the past decade or so, the answer to that question has 
expanded in an unexpected direction. Emerging evidence 
suggests that in certain cases, parental nutrition and diet 
may lead to epigenetically controlled modifications of gene 
expression and that in a few select instances, the affected 
genes can be transmitted to offspring in their epigenetically 
modified form. More surprisingly, the data also indicate that 
the epigenetically modified state of the genes may persist in 
later generations. In other words, it may be possible for the 
nutritional experience of grandparents to affect gene 
expression in their grandchildren! 


HONEYBEE DESTINY Three lines of evidence suggest a role 
for nutrition and dietary history in the epigenetic 
modification of gene expression. The first comes from studies in 
honeybees, where it has been shown that genetically identical 
larvae can develop into either fertile queens or sterile 
worker bees following differential feeding with royal jelly, the 
substance fed to larvae that become queens. Experimental 
analysis led by Ryszard Maleszka in 2008 revealed that 
silencing the expression of the DNA methyltransferase Dnmt3 by 
knocking down translation of the Dnmt3 transcript by RNA 
interference leads to the development of fertile queens. In 
other words, blocking a major DNA methylation pathway 
led to the expression of genes that are typically expressed 
only when a larva is fed royal jelly. The implication is that 
methylation is an important epigenetic mechanism for 
repressing gene expression and directing the development of 
worker bees. Feeding royal jelly subverts methylation and the 
resulting transcriptional repression, leading to the 
development of fertile queen bees. 


EVIDENCE IN MICE The second line of evidence comes from 
multiple studies of the connection between environmentally 
generated methylation of genes and variation in gene 
expression in rats and mice. In one study, genetically 
identical mice carry a modified agouti gene that produces yellow 
coat color and extreme obesity when the gene is expressed, 
whereas the normal brown coat color and normal body 
weight are produced if the modified gene is not expressed. 
The coat color and body weight of genetically identical 
mouse pups carrying this modified gene are determined by 
the diet of the mother in the weeks before impregnation and 
during pregnancy and lactation. 

In controlled experiments, mothers that will transmit 
the modified agouti gene to their pups are fed either a diet 
enriched with three compounds (folic acid [vitamin B9], choline 
chloride, and anhydrous betaine) that each act as donors 
of methyl groups to DNA or a diet without these 
compounds. The controlled dietary period begins 2 weeks 
before mating and continues through pregnancy and lac- 
tation. The pups produced are genetically identical, and 
after they are weaned, they are all fed the same diet. At 
3 weeks of age, however, the appearance of the pups is 
dramatically different. Mice produced by mothers who were 
fed the enriched diet have brown coat color and normal 
body weight, whereas genetically identical mice produced 
by mothers not fed the enriched diet have yellow coat color 
and are obese. The difference indicates that the modified 
agouti gene is expressed when it is transmitted from moth- 
ers that were not fed the diet enriched with methyl donors. If 
the modified gene is transmitted from mothers receiving the 
enriched diet, however, the modified agouti gene is methyl- 
ated and silenced. 


INHERITANCE OF FAMINE EFFECTS The third line of 
evidence comes from an unfortunate event during World 
War II. A severe famine occurred in the German-occupied 
Netherlands between November 1944 and May 1945. The famine 
reduced caloric intake to 500 to 800 calories per day, 
much less than the body needs to fuel its normal metabolic 
activities. Long-term studies have been performed on Dutch 
people who were conceived or born during the famine and 
on their descendants. Studies of the health effects of the fam- 
ine find that so-called famine babies were often born severely 
underweight. As the famine babies grew into adults and aged, 
they suffered increased risk of cardiovascular disease, diabe- 
tes, and obesity compared to peers who had not been affected 
by the famine. The proposed explanation is that the restricted 
nutritional conditions in the womb caused alterations of gene 
expression, producing an energetically “thrifty” metabolism. 
More surprising, however, is that the children of the 
famine babies also have an elevated risk of cardiovascular 
and other diseases. The explanation proposed for this 
second-generation effect is epigenetic modification of gene 
expression that is transmitted through multiple generations. 


A 2008 study by Bastiaan Heijmans on the methylation 
pattern of the IGF2 gene on chromosome 11 confirms the 
epigenetic control mechanism that we discussed previously in 
connection with genomic imprinting, Prader-Willi syndrome, 
and Angelman syndrome. Heijmans and colleagues found that 
IGF2 in certain famine babies (now in their 60s) still bears the 
marks of famine. The IGF2 genes of those exposed to famine 
during the first 10 weeks of gestation are marked by 
significantly fewer methyl groups than are the genes of their 
same-sex siblings not exposed to famine conditions. These results 
support the idea that prenatal conditions can impart specific 
epigenetic patterns to genes and that environmental factors 
contributing to epigenetic patterns may play an important role 
in modifying gene expression over multiple generations. 


SUMMARY 

For activities, animations, and review quizzes, go to the Study Area. 


15.1 Cis-Acting Regulatory Sequences Bind 
Trans-Acting Regulatory Proteins to Control 
Eukaryotic Transcription 


Regulatory proteins in eukaryotes bind to specific 
nucleotides exposed in major and minor grooves of DNA. 

Promoters, proximal elements, and enhancers are cis-acting 
DNA sequences that bind trans-acting regulatory proteins to 
regulate transcription. 

Enhancer sequences are strongly conserved, indicating they 
perform essential functions. 

Upstream activator sequences (UAS) in yeast are 
enhancer-like elements that regulate the expression of genes such as 
those involved in galactose utilization. 

Locus control regions (LCRs) are specialized enhancers that 
control the sequential expression of sets of genes such as 
those in the developmentally regulated human β-globin gene 
complex. 

Silencer sequences bind repressor proteins to block 
transcription of targeted genes. 

Insulators block enhancer influence on certain genes and 
direct that influence to other genes. 


15.2 Chromatin Remodeling and Modification 
Regulates Eukaryotic Transcription 


Open promoters are constitutively transcribed, whereas 
transcription from covered promoters is regulated. 

In regions of closed chromatin structure, the DNA is wound 
tightly around nucleosomes. These regions are 
transcriptionally silent. 

In regions of open chromatin structure, the association 
of DNA and nucleosomes is looser, allowing genes to be 
expressed. 

Chromatin-remodeling complexes displace nucleosomes 
to allow transcription initiation by RNA pol II and general 
transcription factors. 

Chromatin is modified by writers and erasers, and read 
by readers. Writers and erasers are recruited by 
transcription factors to open and close the chromatin by adding and 
removing acetyl and methyl groups at specific amino acids 
in the N-terminal tails of histone proteins. 

Epigenetic states of chromatin are heritable in somatic cells 
that divide by mitosis and may be reset in germ-line cells 
that divide by meiosis. 


Genomic imprinting in mammalian genomes involves nucle- 
otide methylation and the action of enhancer and insulator 
sequences. 


15.3 RNA-Mediated Mechanisms Control Gene 
Expression 


RNA interference (RNAi) is an RNA-mediated mechanism 
for regulating gene expression in eukaryotes. 

Small interfering RNAs (siRNAs) and microRNAs (miRNAs) 
are principal regulatory RNA molecules. 

The Dicer protein complex processes dsRNAs into their 
regulatory form. 

The RISC complex carries regulatory RNAs to RNAs 
targeted for destruction or for blockage of translation. 

A specific form of regulatory RNA directs mammalian 
X-inactivation. 


Argonaute (p. 526) 
chromatin modifier (p. 517) 
chromatin remodeler (SWI/SNF, ISWI, 
chromatin remodeling (p. 514) 
cis-acting regulatory sequence (p. 507) 
genomic imprinting (p. 522) 
guide strand (p. 524) 
histone acetyltransferase (HAT) (p. 517) 
histone deacetylase (HDAC) (p. 517) 
histone demethylase (HDMT) (p. 519) 
histone methyltransferase (HMT) (p. 519) 
imprinting control region (ICR) (p. 522) 
insulator sequence (p. 511) 
ISWI complex (p. 516) 
locus control region (LCR) (p. 508) 
long noncoding RNA (lncRNA) (p. 521) 
Mediator (p. 511) 
microRNA (miRNA) (p. 524) 
nucleosome-depleted region (NDR) (p. 514) 
open chromatin (p. 514) 
open promoter (p. 514) 
RNA-induced silencing complex (RISC) (p. 524) 
RNA-induced transcriptional silencing (RITS) complex (p. 526) 
RNA interference (RNAi) (p. 524) 
silencer sequence (p. 506) 
small interfering RNA (siRNA) (p. 524) 
Su(var) mutation (p. 512) 
SWI/SNF complex (p. 516) 
SWR1 complex (p. 516) 
trans-acting regulatory protein (p. 507) 
upstream activator sequence (UAS) (p. 510) 

PROBLEMS 

Visit MasteringGenetics for instructor-assigned tutorials and problems. 

Chapter Concepts 
For answers to selected even-numbered problems, see Appendix: Answers. 

1. Devoting a few sentences to each, describe the following 
structures or complexes and their effects on eukaryotic 
[text illegible]: 
a. promoter 
b. enhancer 
[one or more items illegible] 
Dicer 

2. Describe and give an example (real or hypothetical) of each 
of the following: 
a. upstream activator sequence (UAS) 
b. insulator sequence action 
c. silencer sequence action 
d. enhanceosome action 
e. RNA interference 

3. What is meant by the term chromatin remodeling? 
Describe the importance of this process to transcription. 

4. What general role does acetylation of histone protein 
amino acids play in the transcription of eukaryotic 
genes? 

5. Describe the roles of writers, readers, and erasers in 
eukaryotic gene regulation. 

6. Outline the roles of RNA in eukaryotic gene regulation. 

7. What are the roles of the Polycomb and Trithorax 
complexes in eukaryotic gene regulation? 

8. Most biologists argue that the regulation of gene 
expression is considerably more complex in eukaryotes than in 
bacteria. List and describe the four factors that in your 
view make the largest contribution to this perception. 

9. Compare and contrast the transcriptional regulation of 
GAL genes in yeast with that of the lac genes in bacteria. 

10. The term heterochromatin refers to heavily condensed 
regions of chromosomes that are largely devoid of genes. 
Since few genes exist in those regions, they almost never 
decondense for transcription. At what point during the cell 
cycle would you expect to observe the decondensation of 
heterochromatic regions? Why? 

11. Compare and contrast promoters and enhancers with respect 
to their location (upstream versus downstream), orientation, 
and distance (in base pairs) relative to a gene they regulate. 

12. How are the different types of chromatin classified, and 
what is their relationship with gene expression? 

13. Define epigenetics, and provide examples illustrating your 
definition. 

14. What is one proposed role for lncRNAs? 

Application and Integration 
For answers to selected even-numbered problems, see Appendix: Answers. 

15. A hereditary disease is inherited as an autosomal 
recessive trait. The wild-type allele of the disease gene 
produces a mature mRNA that is 1250 nucleotides (nt) 
long. Molecular analysis shows that the mature mRNA 
consists of four exons that measure 400 nt (exon 1), 
320 nt (exon 2), 230 nt (exon 3), and 300 nt (exon 4). 
A mother and father with two healthy children and two 
children with the disease have northern blot analysis 
performed in a medical genetics laboratory. The 
results of the northern blot for each family member are 
shown below. 

[Figure: pedigree of parents I-1 and I-2 and children II-1 
through II-4, above a northern blot with bands at 1250 nt 
and 1020 nt; the band pattern is not recoverable.] 

a. Identify the genotype of each family member, using the 
sizes of mRNAs to indicate each allele. (For example, 
a person who is homozygous wild type is indicated as 
“1250/1250”) 

b. Based on your analysis, what is the most likely 
molecular abnormality causing the disease allele? 


16. The UG4 gene is expressed in stem tissue and leaf tissue 
of the plant Arabidopsis thaliana. To study mechanisms 
regulating UG4 expression, six small deletions of DNA 
sequence upstream of the gene-coding sequence are made. 
The locations of deletions and their effect on UG4 
expression are shown below. 

[Figure: upstream region, promoter region, transcription 
start, and UG4 gene; deletion regions appear in the order 
E, D, A, C, B, F from the upstream end.] 

                  Transcription (%) 
Deletion          Stem    Leaf 
None (control)    100     100 
A                 100     100 
B                 <1      92 
C                 100     100 
D                 100     163 
E                 98      <1 
F                 100     100 

a. Explain the differential effects of deletions B and F on 
expression in the two tissues. 

b. Why does deletion D raise UG4 expression in leaf tissue 
but not in stem tissue? 

c. Why does deletion E lower expression of UG4 in leaf 
tissue but not in stem tissue? 


17. A gene expressed in long muscle of the mouse is identified, 
and the regulatory region upstream of the gene is isolated. 
Various segments of the upstream sequence are fused to 
the lacZ gene, and each fusion is assayed to determine how 
efficiently it transcribes the gene. In the accompanying 
diagram, the dark bars indicate the upstream segments that 
are present in each of six different fusion genes. The 
transcriptional efficiency of each fusion is measured against the 
control fusion, that is, the full-length upstream segment 
fused to the lacZ gene. 

[Figure and table: full-length upstream region fused to the 
lacZ gene, with bars marking the segment present in each 
fusion. Legible transcription values (%): control 
(full-length) 100; A 6; B 0; E 8; D 0. The bar positions and 
the remaining rows are not recoverable.] 

a. Identify the upstream region that contains the 
enhancer. 

b. Identify the upstream region containing the 
promoter. 

c. Speculate about the reason for the different 
transcription rates detected in fusions E and F. 


18. The consequences of four deletions from the region 
upstream of the yeast gene DBM1 are studied to determine 
the effect on transcription. The normal rate of 
transcription, determined from study of transcription of genes that 
do not have upstream deletions, is defined as 100%. The 
location of each deletion and the effects of deletions on 
DBM1 transcription are shown below. 

[Figure: upstream region, transcription start, and DBM1 
gene; deletion regions appear in the order A, B, C, D from 
the upstream end.] 

Deletion          Transcription (%) 
None (control)    100 
A                 7 
B                 155 
C                 51 
D                 <1 

a. Which mutation(s) affect an enhancer sequence? 
Explain your reasoning. 

b. Which mutation(s) affect a silencer sequence? Explain 
your reasoning. 

c. Which mutation(s) affect the promoter? Explain your 
reasoning. 

19. Provide a description of the mechanistic roles of 
transcription factors and chromatin-modifying and 
chromatin-remodeling enzymes in the control of eukaryotic gene 
expression. 


20. A muscle enzyme called ME1 is produced by 
transcription and translation of the ME1 gene in several muscles 
during mouse development, including heart muscle, in 
a highly regulated manner. Production of ME1 appears 
to be turned on and turned off at different times during 
development. To test the possible role of enhancers and 
silencers in ME1 transcription, a biologist creates a 
recombinant genetic system that fuses the ME1 promoter, 
along with DNA that is upstream of the promoter, to the 
bacterial lacZ (β-galactosidase) gene. The lacZ gene is 
chosen for the ease and simplicity of assaying production 
of the encoded enzyme. The diagram shows the 
structure of the recombinant, as well as bars that indicate the 
extent of six deletions the biologist makes to the ME1 
promoter and upstream sequences. The blue bar is the 
site of the promoter whereas the gray bars span potential 
enhancer/silencer modules. The table displays the 
percentage of β-galactosidase activity in each deletion mutant in 
comparison to the recombinant gene system without any 
deletions. 

[Figure: ME1 upstream region and ME1 promoter fused to the 
lacZ gene, with bars marking the extent of deletions A 
through F; bar positions are not recoverable.] 

Deletion          LacZ activity (%) 
None (control)    100 
A                 [not legible] 
B                 100 
C                 4 
D                 <1 
E                 170 
F                 5 

a. Does this information indicate the presence of 
enhancer and/or silencer sequences in the ME1 upstream 
sequence? If so, where is/are the sequences located? 

b. Why does deletion D effectively eliminate transcription 
of lacZ? 

c. Given the information available from deletion analysis, 
can you give a molecular explanation for the 
observation that ME1 expression appears to turn on and turn 
off at various times during normal mouse development? 


21. A muscle protein in mouse is produced through the use of 
alternative promoters in heart and skeletal muscle. A 
diagram of the gene region is shown below. The gene contains 
a total of six exons, and there are three restriction sites 
recognized by the restriction enzyme HindIII in the vicinity 
of the gene. In the diagram, the locations of heart (PH) and 
skeletal (PS) promoters are indicated, as are two molecular 
probes. Probe a hybridizes to exon 2, and probe b 
hybridizes to exon 4. Transcription of the gene in heart and 
skeletal muscle terminates after exon 6. The protein produced 
by the gene is recognized in heart and muscle samples by 
the same antibody. 

[Figure: gene region with three HindIII sites, promoters PH 
and PS, exons 1 through 6, and probes a (exon 2) and b 
(exon 4); exact positions are not recoverable.] 

Diagram the expected results of the studies described 
below. 

a. HindIII digestion of DNA from heart muscle and 
skeletal muscle followed by the use of probes a and b 
in Southern blot analysis, with each of the probes in a 
separate analysis 

b. Northern blot analysis of mature mRNA extracted from 
heart muscle and skeletal muscle, using probes a and b 
in separate analyses 

c. Western blot analysis of the protein from heart muscle 
and skeletal muscle, using the antibody as a probe 

Analysis of Gene Function 
by Forward Genetics and 
Reverse Genetics 


CHAPTER OUTLINE 


16.1 Forward Genetic Screens 
Identify Genes by Their Mutant 
Phenotypes 

16.2 Genes Identified by Mutant 
Phenotype Are Cloned Using 
Recombinant DNA Technology 

16.3 Reverse Genetics Investigates 
Gene Action by Progressing 
from Gene Identification to 
Phenotype 

16.4 Transgenes Provide a Means of 
Dissecting Gene Function 


Thomas Hunt Morgan’s fly room (he is at far right, back row) was the site of 
the original mutagenesis. The first screens were limited by their reliance on 
spontaneous mutants, but the discovery by Hermann Muller (second from 
right, back row) that X-rays are mutagenic turned genetic screens into 
routine and powerful tools to uncover gene function. Also visible in this photo 
are Calvin Bridges (third from left, back row), who used nondisjunction to 
prove the chromosome theory of heredity, and Alfred Sturtevant (middle 
front row), who constructed the first genetic map. 


One goal of biology is to understand the molecular 
and genetic bases of physiology and development. 
Beginning with Mendel and resuming in the first part of 
the 20th century, geneticists attempted to dissect the rules 
of heredity by connecting phenotypes to genetic loci. The 
discovery of DNA as the hereditary material indicated that 
genes are specific DNA sequences and that allelic differences 
reflect differences in those sequences. In the 1970s, 
discoveries stemming from the study of bacteria and their phages led 
to the development of tools to manipulate DNA 
in vitro. With these tools, collectively referred to as 
recombinant DNA technology, geneticists could for 
the first time obtain the precise DNA sequences of 
specific genes and alleles, thus identifying the 
molecular basis of phenotypic differences. 

The exploration of how genes control physiolog- 
ical and developmental processes is approached in 
two ways that attack the problem from diametrically 
opposite directions. These opposite approaches are 
known as forward genetic analysis and reverse genetic 
analysis. The goals of forward and reverse analysis 
are the same: to identify the genes responsible for 
hereditary variation, to determine the structure and 
function of wild-type alleles controlling traits, and to 
describe how mutant alleles generate abnormal phe- 
notypes. However, the two strategies begin at differ- 
ent ends of the process of gene identification. 

Forward genetic analysis starts with a genetic 
screen that identifies specific phenotypic abnormal- 
ities in a population of organisms that have been 
mutagenized—mutagenesis being the intentional 
introduction of mutations into the genome of an 
organism. The abnormal phenotype is then studied 
to identify the nature of the hereditary abnormality 
and, by inference, the normal functions of an asso- 
ciated gene. Ultimately, the sequence of the gene 
responsible for the abnormality is determined and 
may suggest the molecular function of the corre- 
sponding gene product (Figure 16.1a). In contrast to 
forward genetics approaches, which begin genetic 
investigation with a mutant phenotype and pro- 
ceed toward the identification of a gene sequence, 
reverse genetics approaches begin with a gene 
sequence and seek to identify the corresponding 
mutant phenotype (Figure 16.1b). In a reverse genet- 
ics experiment, loss-of-function alleles of specific 
genes are created by a variety of approaches, and 
the resulting phenotypes are examined to see how 
they differ from the wild type. Reverse genetic 
analysis has risen to prominence as a result of the 
enormous quantity of DNA sequence data made 
available since the late 1990s. 

In this chapter, we discuss forward and reverse 
genetic analyses from a conceptual viewpoint and 
in Chapter 17 present details of how recombinant 
DNA technology can be used to manipulate DNA 
sequences in vitro and in vivo. 


Figure 16.1 General strategies of forward and reverse genetics. 
(a) Forward genetics: mutagenize flies and screen for aberrant 
phenotypes (wild type versus Ultrabithorax mutant), then identify 
the gene sequence and molecular function. (b) Reverse genetics: 
isolate a mouse gene similar to the Drosophila Ultrabithorax gene 
(Hox 10), generate a mutant allele, and identify the mutant 
phenotype (wild type versus Hox 10 mutant). 


16.1 Forward Genetic Screens Identify 
Genes by Their Mutant Phenotypes 


With the discovery by Hermann Muller that ionizing 
radiation induces mutations (see Section 12.3), geneti- 
cists realized that mutant organisms could be generated 
at will and systematically screened for phenotypes of 
interest. Mutant phenotypes provide information on the 
function of the wild-type allele and insight into biological 
processes. The earliest example of this logic is the work 
of Archibald Garrod, who in 1908 connected the human 
autosomal recessive hereditary condition alkaptonuria to 
the lack of a specific biochemical activity, the metabolism 
of benzene rings in homogentisic acid (see Chapter 9). He 
suggested that the wild-type version of the gene encodes 
the enzyme responsible for this biochemical activity. 
After Muller brought the mutagenic powers of X-rays to 
their attention (see Section 12.3), geneticists began to em- 
ploy systematic genetic screens to dissect other biological 
processes, and the genetic bases for entire biochemical 
pathways were elucidated. 

The designing of genetic screens to identify genes 
involved in specific biological processes is limited only 
by the imagination of the geneticist. An example is the 
research by Seymour Benzer that led to the field of be- 
havioral genetics in the 1970s. Benzer believed muta- 
tions could be identified that specifically affect behavioral 
processes, such as one you are using now, the process of 
learning and memory. At the time, behavior was thought 
by many to be too complex to be dissected genetically. 
However, Chip Quinn, a graduate student in Benzer’s lab, 
built on previous ideas and designed an ingenious screen 
to identify learning- and memory-deficient mutants in 
Drosophila. Wild-type flies could be taught that a pulse 
of odor would be followed by a shock; later, when the flies 
smelled the odor, they would take evasive action. When 
Quinn and Benzer subjected a mutagenized population of 
Drosophila to this genetic screen, they identified mutant 
strains of flies that could perceive the odor but seemed 
unable to associate the odor with the shock; either they
did not learn or could not remember. 

Two mutant genes identified in the study, dunce and 
rutabaga, were later shown to encode proteins involved 
in the production or degradation of the small signaling 
molecule cyclic adenosine monophosphate (cAMP). At 
the time, signaling via a cAMP pathway was known to be 
required for learning in the sea hare, Aplysia. Since both 
Drosophila mutants were defective in cAMP physiology, 
other genes that encoded proteins involved in cAMP 
signaling and response were also investigated for roles 
in learning. Ultimately, a transcription factor called creb 
(cAMP response element-binding protein), which acti- 
vates or represses genes in response to cAMP signaling, 
was shown to be critical for storing memories in flies. 
Remarkably, creb is widely conserved in animal species,
and mouse mutants lacking creb activity also fail to remember.
A similar gene is found in our genome.

A great strength of forward genetic screens is that 
they are unbiased; no prior knowledge of the molecular 
function of the encoded gene product is required. In a 
sense, by performing a mutagenesis, the geneticist is al- 
lowing the organism to reveal how its biological processes 
operate. Once genes in particular physiological or devel- 
opmental processes have been identified by mutation, 
clues to the molecular function of the gene product can be 
obtained using recombinant DNA technology. 


General Design of Forward Genetic Screens 


Forward genetic screens often require the mutagenesis 
of thousands of individuals, followed by screening large 
numbers of their progeny for mutant phenotypes. Each 
progeny may contain multiple mutations, but only a small 
fraction of the progeny will have a mutant phenotype of 
interest. For example, in their screens to identify auxo- 
trophs, Beadle, Tatum, and colleagues screened many 
thousands of individual mutant lines to find the few argi- 
nine auxotrophs that were produced. While some screens 
necessitate the visual inspection of all progeny, others 
are specifically designed to highlight certain mutants of 
interest against the background of all other mutants. The 
designing of such screens is an art. 

Perhaps the most dramatic screen is one in which 
application of a simple selection technique allows mu- 
tants of interest to survive while those not of interest 
die. Examples include the isolation of bacteria resistant 
to antibiotics, insects resistant to insecticides, and plants 
resistant to herbicides. Similarly, isolation of mutants 
resistant to analogs of cellular chemicals or to high levels 
of naturally occurring hormones has proven useful in 
genetic screens. Often in such cases, mutations identify 
genes encoding proteins involved in the metabolism or 
signaling pathways of the respective chemicals. 

Even when strong selection criteria cannot be applied, 
knowledge of the biological process of interest can influence 
the design of the screen. For example, when Wieschaus and 
Nüsslein-Volhard performed their screen for Drosophila
embryogenesis mutants, they assumed that the mutations 
of interest were all likely to be lethal to the larva (see 
Section 20.2). Thus they could limit their intensive analysis 
to mutant lines in which larval lethality was evident. 


Specific Strategies of Forward Genetic Screens 


Forward genetic screens begin with a mutagenesis—an 
organism is treated with a mutagen to create mutations ran- 
domly throughout the genome. A typical goal is to induce 
mutations in every gene in a population of mutagenized 
individuals, by an approach called saturation mutagenesis. 
The mutagenized population is then screened for pheno- 
typic defects in whatever biological process is being studied, 
and the mutants are collected and propagated for further
analysis. Strategies for mutagenesis depend on the bio- 
logical process of interest, which dictates the experimental 
organism to use, the choice of mutagen, and the screening 
procedure to identify mutations. 


Choosing an Organism The attributes that make an 
organism a good genetic model also make it a good choice 
for a mutagenesis experiment (see back end sheets): An 
organism must be able to progress through its entire 
life cycle in the laboratory, have a short generation time 
(for eukaryotic models, the time it takes to produce 
sexually mature progeny and complete the sexual life 
cycle), produce a reasonable number of progeny, and be 
amenable to crossing. Organisms that are diploid must 
have a starting genotype (the genotype to be mutagenized) 
that is inbred—in other words, homozygous at all loci. This 
genotype allows newly induced mutations to be readily 
identified, without interference from the confounding 
effects of polymorphisms. Finally, it is advantageous to use 
the simplest organism possible for the biological process 
under study. Because Saccharomyces cerevisiae has a rapid 
life cycle and is easily manipulated in the laboratory, it is 
often used to investigate biological processes common to 
all eukaryotes. The principles elucidated in S. cerevisiae can 
often be extended to other eukaryotes, including humans. 


Choosing a Mutagen The choice of mutagen is dictated 
by both the organism and the type of mutant alleles 
desired; different mutagens have different advantages and 
disadvantages (Table 16.1). Mutagens inducing different 
types of changes in DNA sequences were described in 
Section 12.4. Treatment with chemical mutagens can 
induce hundreds of mutations in a single individual, 
allowing saturation to be reached with only a few thousand 
mutagenized individuals. However, the cloning of genes 
identified by chemical mutagenesis can be laborious. In 
contrast, mutagens that result specifically in insertions of 
DNA, such as transposons, result in far fewer mutations per
individual, making saturation difficult. But these mutagens
have the advantage of being able to provide a DNA “tag” 
that facilitates finding and cloning the mutated genes. In 
all mutageneses used for forward genetic screens, care must
be taken to outbreed mutants of interest by crossing them 
with the wild-type progenitor strain, thus ensuring that the 
collected mutant lines have only the mutation of interest and 
not others that were also induced during the mutagenesis. 
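
The trade-off between mutagens described above can be illustrated with a rough sketch. It assumes, purely for illustration, that every gene has the same small probability p of being disrupted in any one mutagenized individual, so the chance that a gene is hit at least once among N individuals is 1 - (1 - p)^N. The two per-gene probabilities are hypothetical placeholders, not measured rates for any real mutagen.

```python
import math

def fraction_of_genes_hit(n_individuals: int, p_hit_per_gene: float) -> float:
    """Expected fraction of genes mutated at least once among n individuals."""
    return 1.0 - (1.0 - p_hit_per_gene) ** n_individuals

def individuals_needed(p_hit_per_gene: float, target: float = 0.95) -> int:
    """Smallest number of individuals giving at least `target` coverage."""
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p_hit_per_gene))

# Hypothetical per-individual, per-gene hit probabilities (assumptions).
CHEMICAL_P = 1 / 1_000      # many mutations per individual
INSERTION_P = 1 / 50_000    # few insertions per individual

if __name__ == "__main__":
    for label, p in (("chemical", CHEMICAL_P), ("insertional", INSERTION_P)):
        n = individuals_needed(p)
        print(f"{label}: about {n} individuals for 95% of genes hit")
```

Under these assumed rates the chemical mutagen approaches saturation with a few thousand individuals, whereas the insertional mutagen needs far more, which matches the qualitative argument in the text.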


Strategy for Identifying Dominant and Recessive 
Mutations The overall goal of mutagenesis is to identify 
multiple independent mutant alleles of each gene involved 
in the biological process of interest. Let us consider the 
identification of dominant and recessive mutations in a 
typical animal example. 

Most animals spend most of their life cycle in the 
diploid state. Their germ cells are set aside early in de- 
velopment and do not contribute to the somatic develop- 
ment of the remainder of the animal body. When animals 
are treated with a mutagen—for example, by feeding 
males ethyl methanesulfonate (EMS), a potent mutagen 
that causes a spectrum of alleles (see Table 16.1)—only 
the mutations induced in the germ cells are heritable and 
will be passed to the progeny of the mutagenized animals. 

Newly induced dominant mutations can be identified
in the F1 generation that is produced by breeding the
mutagenized males with wild-type females (Figure 16.2a).
However, only a small fraction of the F1 progeny will
exhibit a mutant phenotype, since dominant mutations 
are rare. This rarity is due to the low probability that any 
change in the DNA sequence of a gene will produce a gain 
in function for the encoded gene product, either qualita- 
tively or quantitatively. 

Mutations that result in a loss of function are more
common, but loss-of-function mutations are usually recessive
and do not result in an observable phenotype in the
F1 generation. Therefore, further breeding must be performed
to produce homozygous loss-of-function mutants.
Specifically, recessive mutations are identified in an F3
screen (Figure 16.2b). In this screen, each F1 individual derived
from the mating of mutagenized males with wild-type
females carries unique mutations. The F1 individuals are
then crossed with wild-type females, producing an F2 generation
in which 1/2 of the individuals will carry any given newly
induced mutation. The F2 siblings are interbred, producing
an F3 population segregating for individuals that are
homozygous for the induced mutation. The interbreeding
of the F2 to produce homozygous mutant F3 is inefficient,
since only half of the F2 are heterozygous for the induced
mutation. Nonetheless, such mutagenesis strategies are
employed with many species, such as mice and zebrafish.


Figure 16.2 Mutagenesis strategies.

Identification of recessive mutations is somewhat
simpler in organisms that self-fertilize, such as
Caenorhabditis elegans and many plants (e.g., Arabidopsis
and maize). In these organisms, F1 individuals are self-fertilized
to produce an F2 generation from which
recessive mutations can be identified. An example of an
F2 screen is shown in Figure 16.2c. In either an F2 or F3
screen, mutations resulting in homozygous lethality can
be maintained in heterozygous siblings.
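
The relative efficiency of the two breeding schemes can be checked with a small Mendelian calculation. This is a minimal sketch for a single induced recessive mutation `m`, assuming F2 siblings are paired at random and ignoring any fitness effects. The function and variable names are illustrative only.

```python
from fractions import Fraction
from itertools import product

def offspring_distribution(parent1, parent2):
    """Genotype frequencies from a cross; each parent is a pair of alleles."""
    dist = {}
    for a, b in product(parent1, parent2):
        genotype = "".join(sorted((a, b)))  # "++", "+m", or "mm"
        dist[genotype] = dist.get(genotype, Fraction(0)) + Fraction(1, 4)
    return dist

F1 = ("m", "+")      # F1 individual heterozygous for a new mutation
WILD = ("+", "+")

# F3 screen: F1 x wild type gives the F2; F2 siblings are then interbred.
f2 = offspring_distribution(F1, WILD)
p_f2_het = f2["+m"]                                   # half of the F2 carry m
p_both_sibs_het = p_f2_het ** 2                       # random sibling pair
p_hom_from_het_pair = offspring_distribution(F1, F1)["mm"]
p_f3_hom_overall = p_both_sibs_het * p_hom_from_het_pair

# F2 screen: the F1 is self-fertilized.
p_f2_hom_selfing = offspring_distribution(F1, F1)["mm"]

if __name__ == "__main__":
    print("F3 screen, homozygous fraction over random sib matings:", p_f3_hom_overall)
    print("F2 screen by selfing, homozygous fraction:", p_f2_hom_selfing)
```

Under these assumptions only one sibling pair in four is informative in the F3 scheme, which is the inefficiency the text refers to.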


Use of Balancer Chromosomes for Tracking Mutations 
The inefficiency of an F3 screen can be circumvented using 
chromosomes that are marked so they can be followed 
through generations. Balancer chromosomes developed in 
Drosophila allow specific chromosomes to be transmitted 
intact and followed through multiple generations. 


Balancer chromosomes have three general features: 
(1) one or more inverted chromosomal segments, within 
which meiotic recombinants are not transmitted (see 
Section 13.5 for a review); (2) a recessive allele that results 
in lethality, so an individual cannot be homozygous for 
the balancer chromosome; and (3) a “mark” in the form 
of a dominant mutation conferring a visible nonlethal 
phenotype, so the segregation of the chromosome can be 
followed through generations. An example of a balancer 
chromosome is the ClB chromosome used by Hermann
Muller to demonstrate that X-rays induce mutations (see 
Experimental Insight 13.1). 

Balancer chromosomes are available for all of the 
Drosophila chromosomes and can be used to identify 
mutations on specific chromosomes (Figure 16.3). Male 
flies are fed EMS to induce mutations and then are mated 
with females containing a balancer chromosome. Note 
that while mutations are induced throughout the genome, 
only those on the homolog of the balancer chromosome 
are analyzed. Male F1 progeny are selected that inherit a
mutagenized chromosome from their father and the balancer
chromosome from their mother. Next, the selected
males are mated to females of the balancer stock, producing
F2 progeny. The F2 generation consists of both males
and females heterozygous for the induced mutation and
can be interbred to produce F3 progeny. In the F3 generation,
25% should be homozygous for the induced mutation
and will not carry the dominant allele of the balancer
chromosome; 50% will be heterozygous for the newly
induced mutation and also carry the dominant allele; and 
the remaining 25% will die due to homozygosity for the 
balancer chromosome. The homozygous progeny lacking 
the dominant allele from the balancer chromosome can be 
screened for an aberrant phenotype. 

What happens if the new mutation results in lethal- 
ity when it is homozygous? In that case, all surviving F3 
individuals will carry the dominant allele, located on the 
balancer chromosome. When a lethal mutation is identi- 
fied in this way, the mutant allele can be propagated from 
the heterozygous siblings. This mutagenesis strategy was 
used by Eric Wieschaus, Christiane Nüsslein-Volhard, and
colleagues in a screen to identify Drosophila mutations that
disrupt pattern formation during embryogenesis. The research
is described in detail in Section 20.2.
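
The F3 classes expected from interbreeding balanced heterozygotes can be enumerated directly. The sketch below assumes a single induced mutation `m` on the homolog of a balancer `Bal`, with `Bal/Bal` always lethal. The genotype labels and function name are illustrative.

```python
from fractions import Fraction
from itertools import product

def f3_classes(mutation_is_lethal: bool = False):
    """Cross m/Bal x m/Bal; return (zygote fractions, surviving-adult fractions)."""
    parent = ("m", "Bal")
    zygotes = {}
    for a, b in product(parent, parent):
        genotype = "/".join(sorted((a, b)))  # "m/m", "Bal/m", or "Bal/Bal"
        zygotes[genotype] = zygotes.get(genotype, Fraction(0)) + Fraction(1, 4)

    lethal = {"Bal/Bal"}          # balancer carries a recessive lethal
    if mutation_is_lethal:
        lethal.add("m/m")         # new mutation is itself a recessive lethal

    survivors = {g: f for g, f in zygotes.items() if g not in lethal}
    total = sum(survivors.values())
    return zygotes, {g: f / total for g, f in survivors.items()}

if __name__ == "__main__":
    zygotes, adults = f3_classes()
    print("zygotes:", zygotes)
    print("surviving adults (viable mutation):", adults)
    print("surviving adults (lethal mutation):", f3_classes(True)[1])
```

The zygote fractions reproduce the 25% : 50% : 25% split in the text. Among surviving adults, a viable mutation yields one-third unmarked homozygotes, while a recessive lethal leaves only flies carrying the dominant balancer marker.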


Screening for Conditional Alleles in Haploid Organisms 
The use of haploid organisms in a forward genetic 
screen has the advantage of allowing both recessive 
loss-of-function mutations and dominant mutations to 
be identified directly. With single-celled organisms, a 
population of mitotically active cells can be mutagenized, 
and mutants with an altered phenotype can be selected 
directly in the colonies derived from the mutagenized 
cells. A disadvantage is that mutations disrupting 
essential processes in growth and physiology are often 
lethal, interfering with the propagation of alleles and thus 
complicating genetic screening. Fortunately, it is often 
feasible to design a screen to identify conditional mutant 
alleles of essential genes. In conditional mutants, the 
encoded gene product is either functional or not needed 
under one environmental condition—the permissive 
condition—but is required and either inactive or absent 
under another—the restrictive condition (see Section 4.1). 

With some lethal mutations, the mutant phenotype 
can be rescued by addition of a needed substance to the 
growth medium. For example, histidine auxotrophic mu- 
tants can grow only when histidine is present in the growth 
medium. In a screen for conditional mutants of this type, 
the mutagenized population is initially grown under per- 
missive conditions—in this case, in a medium containing 
histidine—so that both mutant and wild type will grow. 
This mutagenized population is then replica plated, and 
the population is screened for phenotypic defects (e.g., le- 
thality) when grown under the restrictive condition (e.g., a 
lack of histidine). Such genetic screens were performed by 
Beadle and Tatum to identify auxotrophs in Neurospora in 
the research that established biochemical genetics and produced
the one gene-one enzyme theory (see Section 4.3).
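
The logic of scoring a replica-plating experiment can be sketched in a few lines: a conditional mutant is a colony that grows on the permissive plate but not on its restrictive replica. The colony IDs and growth results below are hypothetical data invented for illustration.

```python
def conditional_mutants(grows_permissive, grows_restrictive):
    """Colonies that grow on the permissive plate but fail on the restrictive replica."""
    return sorted(
        colony
        for colony, grew in grows_permissive.items()
        if grew and not grows_restrictive.get(colony, False)
    )

# Hypothetical replica-plating results: colony ID -> growth observed.
with_histidine = {"c1": True, "c2": True, "c3": True, "c4": True, "c5": False}
without_histidine = {"c1": True, "c2": False, "c3": True, "c4": False, "c5": False}

his_auxotroph_candidates = conditional_mutants(with_histidine, without_histidine)

if __name__ == "__main__":
    print(his_auxotroph_candidates)
```

The same comparison applies to other permissive and restrictive pairs, such as plates incubated at two temperatures. Colonies that fail under both conditions (here `c5`) are excluded because they cannot be propagated.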

Some kinds of mutants can be rescued not by supply- 
ing a certain substance to the medium but by altering other 
kinds of environmental conditions instead. In temperature- 
sensitive mutants, the stability of the polypeptide product 
of a mutant allele differs with temperature (see Section 
4.1), often as a result of a missense mutation. 

This type of conditional lethal allele in the yeasts 
S. cerevisiae and Schizosaccharomyces pombe led to a 
molecular genetic understanding of the cell cycle, a bio- 
logical process shared by all eukaryotes. Mutagenized 
yeast were grown at a permissive temperature to allow 
propagation, and then the mutant lines were exposed to 
a restrictive temperature, causing an arrest in growth of 
some of the mutant strains (Figure 16.4a). Surprisingly, in 
some mutant lines, growth was arrested at specific stages 
of the cell cycle, rather than randomly along the continu- 
ous spectrum of growth (the latter would be expected if 
the mutation had disrupted a metabolic pathway). These 
yeast mutants fell into discrete phenotypic categories 
defined by the stage of the cell cycle at which they were 
arrested. One possible explanation was the existence of
specific checkpoints in the cell cycle (Figure 16.4b), and, 
indeed, some of the genes identified by these mutations 
were found to regulate the cell’s progression through vari- 
ous stages of the cell cycle (Figure 16.4c). The studies in 
yeast provided the foundation for understanding the role 
of cell cycle regulation in cancer (see Section 3.1). 


Analysis of Mutageneses 


Typically, the initial analysis of mutants obtained by 
mutagenesis will focus on three key questions: (1) Are 
mutant alleles dominant or recessive with respect to the 
wild-type allele? (2) How many different genes have been
identified in the mutagenesis? (3) How many different
mutant alleles of each gene have been identified? 


Determining Dominance or Recessiveness The answer 
to the first question provides insight into whether the 
mutant allele likely represents a loss of function or a gain of 
function (see Sections 4.1 and 12.2 for descriptions of these 
categories). Dominance or recessiveness, which is assessed 
during the mutagenesis, is confirmed using the same 
approach Mendel employed. Individuals homozygous for 
the new mutations are crossed with the wild-type strain in 
which the mutagenesis was performed. The phenotype in 
the F1 progeny derived from the cross allows the mutant
allele to be designated as dominant or recessive. 


Determining the Numbers of Genes Identified The 
answer to the second question—about the number of 
different genes revealed—provides clues to how many 
genes are involved in the biological process of interest. 
The most straightforward method of determining the 
number of genes represented by a new collection of 
mutants that produce similar mutant phenotypes is to 
perform complementation tests between different pairs 
of the mutant lines. If the progeny produced by crossing 
two recessive mutant lines exhibit a mutant phenotype, 
then the two mutations are in the same gene, whereas 
if the progeny exhibit a wild-type phenotype, then the 
two mutations are in different genes (see Section 4.4). 
In practice, we can limit the number of crosses by 
recognizing that allelism is transitive; that
is, if mutation A is allelic to mutation B, and mutation 
B is allelic to mutation C, then mutations A and C are 
allelic. In some special cases, such as with mutations 
that are dominant or gametophytically lethal (lethal 
in a haploid stage of the life cycle, e.g., in pollen; see 
Section 4.1), complementation experiments cannot easily 
be performed, and other methods to ascertain allelism, 
such as mapping (see Section 5.2), may be employed. 
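
Because allelism is transitive, pairwise test results can be merged into complementation groups, each corresponding to one gene. The sketch below uses a simple union-find structure. It assumes all mutations are recessive and that failure to complement always means the two mutations are in the same gene. The mutant names are hypothetical.

```python
def complementation_groups(mutants, fails_to_complement):
    """Group mutants into genes from pairs whose progeny show the mutant phenotype."""
    parent = {m: m for m in mutants}

    def find(x):
        # Follow parent links to the group representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in fails_to_complement:
        parent[find(a)] = find(b)   # same gene: merge the two groups

    groups = {}
    for m in mutants:
        groups.setdefault(find(m), []).append(m)
    return sorted(sorted(g) for g in groups.values())

# Hypothetical screen: five mutant lines, three crosses failed to complement.
lines = ["m1", "m2", "m3", "m4", "m5"]
failed = [("m1", "m2"), ("m2", "m3"), ("m4", "m5")]
genes = complementation_groups(lines, failed)

if __name__ == "__main__":
    print(len(genes), "genes:", genes)
```

Note that `m1` and `m3` end up in the same group without ever being crossed to each other, which is how transitivity reduces the number of crosses required.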


Determining the Number of Mutant Alleles Identified for 
a Gene The answer to the third question should follow from 
the complementation analysis. Obtaining multiple mutant 
alleles of each gene is useful for two reasons. Comparing 
mutant phenotypes of multiple alleles allows an assessment 
of the range of phenotypic variation that can be obtained 
by mutation of the gene in question (see Section 4.1). The 
recovery of multiple alleles for each gene also provides 
information on the saturation of the genetic screen; in 
other words, it suggests what percentage of the genes that 
could be identified have in fact been identified. When a 
mutagenesis experiment is shown to have produced multiple 
independent mutations in each gene identified, most genes 
in the process of interest have likely been mutated. 
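
One rough way to quantify saturation from allele counts is to fit a zero-truncated Poisson distribution, since genes with zero alleles are by definition never observed. This sketch assumes every gene is equally mutable, which real screens rarely satisfy, so the result is best read as an optimistic estimate. The allele counts are hypothetical.

```python
import math

def estimate_saturation(alleles_per_gene):
    """Return (lambda, fraction of genes missed, estimated total genes).

    Solves mean = lam / (1 - exp(-lam)) for lam by bisection, where `mean`
    is the average number of alleles among genes hit at least once.
    """
    n_genes = len(alleles_per_gene)
    mean_hits = sum(alleles_per_gene) / n_genes
    if mean_hits <= 1.0:
        raise ValueError("every gene has a single allele; cannot estimate saturation")

    lo, hi = 1e-9, mean_hits        # the root lies between these bounds
    for _ in range(200):
        mid = (lo + hi) / 2
        if mid / (-math.expm1(-mid)) < mean_hits:
            lo = mid
        else:
            hi = mid
    lam = (lo + hi) / 2
    missed = math.exp(-lam)          # Poisson probability of zero alleles
    return lam, missed, n_genes / (1.0 - missed)

# Hypothetical screen: number of independent alleles recovered for each gene.
counts = [3, 2, 4, 1, 3, 2, 5, 3]

if __name__ == "__main__":
    lam, missed, total = estimate_saturation(counts)
    print(f"estimated fraction of genes missed: {missed:.3f}")
    print(f"estimated total genes in the process: {total:.1f}")
```

When most genes are represented by several alleles, the estimated missed fraction is small, in line with the reasoning in the text.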

Genetic Analysis 16.1 challenges you to design a screen 
that identifies genes involved in a particular biological 
process. 


GENETIC ANALYSIS 16.1


PROBLEM In all eukaryotic organisms, proteins to be secreted from the cell or embedded in the plasma
membrane are translated at the endoplasmic reticulum and travel via the Golgi apparatus to reach
the plasma membrane. Outline a genetic screen for identifying genes involved in protein secretion.

BREAK IT DOWN: These post-translational processing steps can be reviewed in Section 9.6 (p. 330).

BREAK IT DOWN: In planning a mutagenesis, what type of organism and mutagen are appropriate?


Solution Strategies and Solution Steps


Evaluate

1. Strategy: Identify the topic this problem addresses and the nature of the required answer.
   Step: This problem is about designing a genetic screen to find a certain type of gene.

2. Strategy: Identify the critical information given in the problem.
   Step: Information is given about protein secretion in cells, a universal process among
   eukaryotes. The purpose of the screen is to identify mutations in genes that function in
   that process.


Deduce

3. Strategy: Consider any information given about genes involved in the secretory process.
   TIP: Consider experimental approaches that do not require prior knowledge of gene function.
   Step: Since we have not been given any information about the genes involved in protein
   secretion, a forward genetic screen would be a good approach because forward genetic
   mutageneses do not depend on prior knowledge about biochemical functions or gene sequences.

4. Strategy: Based on the chapter discussion of forward genetic screens, choose an
   appropriate organism.
   TIP: In which organisms does the process occur?
   Step: Since secretory systems in all eukaryotes are similar, they are likely to be
   homologous, that is, inherited from a common ancestor. Thus we can choose any eukaryote
   amenable to genetic analysis. Saccharomyces cerevisiae would be a good choice because many
   genetic tools already exist for this model genetic organism.

5. Strategy: Based on the chapter discussion of designing a forward genetic screen and on the
   phenotypic consequence of a loss of protein secretion, pick a strategy for identifying
   desirable mutant alleles.
   PITFALL: Avoid the possibility of mutations that are lethal under all growth conditions.
   Step: Because complete loss of a functioning secretory system is likely to be lethal to any
   organism, we should use a strategy to identify conditional mutant alleles. Thus we should
   use a mutagen that induces point mutations.


Solve

6. Strategy: Design an approach for a genetic screen based on Solution Steps 3-5.
   Step: A good design would be one similar to the procedure used to identify
   temperature-sensitive mutant alleles in genes of the cell cycle in S. cerevisiae.
   Mutagenesis of haploid cells could be performed at a permissive temperature (e.g., 25-30°C),
   followed by screening for mutant phenotypes at a restrictive temperature (e.g., 39°C).

7. Strategy: Describe how you would identify mutations specifically affecting secretion.
   Step: A method to monitor secretion is required. One approach would be to select a protein
   known to be secreted into the growth media of wild-type S. cerevisiae and look for mutants
   that do not secrete that protein (i.e., the protein is not detected in the medium in which
   they are growing).


For more practice, see Problems 12, 13, 14, 18, 19, and 21.


Identifying Interacting and Redundant Genes 
Using Modifier Screens 


Generally, mutant phenotypes reflect the response of the 
organism to a loss or change of a particular gene product. 
However, individual genes do not act in isolation. The 
activity of other genes may modify, by either enhancing or 
suppressing, the phenotypic defects caused by the loss of a 
gene product. One approach to discovering genetic inter- 
actions is to carry out a genetic modifier screen to see if 
mutations in a second gene can enhance or suppress the 
phenotype of the first mutation. An enhancer screen is a 
modifier screen in which mutations in a second site enhance
the phenotype of the initial mutant. A suppressor
screen is a modifier screen designed to identify second-site
mutations that suppress the phenotype of the initial
genotype. Note that both types of screens can be performed
simultaneously. Enhancer-suppressor screening
strategies are almost limitless in number and sophistica- 
tion and have the potential to identify genes that function 
in interacting genetic pathways. 

Modifier screens can identify double mutants that 
display an unexpected phenotype, one that is not simply 
the combination of the phenotypes of the two single mutants.
In perhaps the most dramatic form of enhancement,
termed synthetic lethality, the two single mutants
are viable but the double mutant is inviable. 

Synthetic lethality, or synthetic enhancement, was 
first noted by Drosophila geneticists who observed that 
some pairwise combinations of mutant alleles were invi- 
able. For example, when Alfred Sturtevant crossed prune 
(pn) mutant females (pn is on the X chromosome) with 
males from a stock of separate origin called S/E-S, he 
noted that the progeny consisted solely of pn+ females
and no viable males (Figure 16.5a). Sturtevant determined
that the S/E-S males carried an autosomal dominant
mutation, which he called Prune-killer (K-pn), that in
combination with pn results in lethality, but he noted that
flies homozygous for the K-pn mutation alone did not have an
aberrant phenotype. In his cross, all male progeny inherited
a pn allele from their mother and a K-pn allele from
their father, and therefore these progeny died. In contrast,
the female progeny were viable, since despite inheriting
a K-pn allele from their father, they also inherited a
pn+ allele from their father. In this example, both pn and
K-pn mutants are viable, but the pn, K-pn double mutant 
results in lethality. 

Figure 16.5b shows two possible mechanisms to ex- 
plain synthetic lethality. In one mechanism, the two genes 
in question act in parallel complementary pathways. In 
this scenario, mutations resulting in the loss of either 
pathway can be compensated for by the activity of the 
remaining pathway. However, when both pathways are 
disrupted, a dramatic enhancement in mutant phenotype 
is observed. An alternative mechanism is possible when 
both genes are acting in the same pathway: A reduction 
in function of one component of the pathway results in a 
mild phenotype, but when two components are disrupted, 
the pathway no longer functions effectively. Note that in 
the latter scenario, hypomorphic alleles can result in syn- 
thetic enhancement, but null alleles cannot. 

The first scenario, where two genes act in parallel, is 
an example of genetic redundancy, where the loss of the 
function of either gene alone is compensated for by the ac- 
tivity of the other nonmutant gene. Only when both genes 
are mutant would a conspicuous mutant phenotype be 
evident. In such a case, a 15:1 segregation ratio could be expected
in the F2 of a cross between the two recessive single
mutants (see Section 4.3). In the most obvious case of ge- 
netic redundancy, two genes encode very similar proteins 
that can function interchangeably. In many instances, the 
activities of the two genes do not fully compensate for one 
another, such that single mutations, in either gene alone, 
result in a mild phenotype, while a severe phenotype is seen 
when both genes are mutant. Genetic redundancy caused 
by the presence of duplicate genes can arise through small- 
scale duplications or through whole-genome duplications. 
As we explore in detail in Chapter 18, genome sequences 
of eukaryotes show such duplications to be very common. 
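
The 15:1 ratio expected for two fully redundant genes can be verified by enumerating the F2 of a dihybrid cross. This is a minimal sketch assuming the two genes (called A and B here for illustration) are unlinked and that a single wild-type copy of either gene is enough for a wild-type phenotype.

```python
from itertools import product

def redundant_f2_counts():
    """Enumerate AaBb x AaBb; only aabb progeny show the mutant phenotype."""
    gametes = [a + b for a, b in product("Aa", "Bb")]   # AB, Ab, aB, ab
    counts = {"wild type": 0, "mutant": 0}
    for g1, g2 in product(gametes, repeat=2):
        has_A = "A" in (g1[0], g2[0])   # at least one functional copy of gene A
        has_B = "B" in (g1[1], g2[1])   # at least one functional copy of gene B
        phenotype = "wild type" if (has_A or has_B) else "mutant"
        counts[phenotype] += 1
    return counts

if __name__ == "__main__":
    print(redundant_f2_counts())
```

The F1 of a cross between the two single mutants (aaBB x AAbb) is AaBb, so the sixteen equally likely gamete combinations give fifteen wild-type classes for every double-mutant class.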
Figure 16.5 Synthetic enhancement.


Genetic redundancy can also arise from the compen- 
satory action of genes that have little or no sequence simi- 
larity and encode biochemically different activities. This 
type of genetic redundancy is difficult to predict on the 
basis of the DNA sequences of the genes, but it too can be 
uncovered by enhancer-suppressor screens. Enhancer-suppressor
screens have been performed on many organisms,
including Drosophila, C. elegans, Arabidopsis, and
mice (see Section 18.3), and are extremely successful at 
identifying interacting genetic pathways (see Section 20.3). 
16.2 Genes Identified by Mutant
Phenotype Are Cloned Using
Recombinant DNA Technology


While genes can be identified by genetic screens, deter- 
mination of the specific DNA sequences of the wild-type 
and mutant alleles requires the cloning, or large-scale 
copying, of the gene, using recombinant DNA techniques 
to manipulate DNA molecules in vitro and in vivo. In this 
section, we discuss the theoretical foundations of how 
cloning of specific genes is achieved. 

To appreciate the magnitude of the task of cloning a 
specific gene, consider that the goal is to identify the par- 
ticular gene responsible for the mutant phenotype from 
among the thousands (or tens of thousands, in the cases of 
many eukaryotes) in the organism’s genome, the proverbial 
needle in a haystack. Because both the biology and the ease 
of manipulation vary depending on the organism, different 
approaches have been developed for different species. In 
this section, we describe four of those approaches. 

Although recombinant DNA technology is discussed 
in detail in Chapter 17, we preview here two aspects of 
the technology that are required for explaining how genes 
are cloned. First, gene sequences created in vitro can be 
introduced into the genome of a living organism. Such 
genes are termed transgenes, and the resulting organ- 
ism is a transgenic organism. As this process is similar 
to the transformation of bacteria—that is, the uptake 
of free DNA from outside the cell to inside the cell (see 
Chapter 6)—the creation of a transgenic organism is also 
referred to as transformation. The ease with which this 
process is accomplished varies significantly between or- 
ganisms and thus influences strategies for gene cloning. 

A second key aspect is the creation of libraries, col- 
lections of clones of DNA fragments, derived from the 
total DNA or mRNA isolated from an organism. A library 
is a set of recombinant DNA molecules that collectively 
includes clones of all the relevant DNA sequences of an 
organism. 

Genomic libraries are collections of cloned DNA 
fragments that as a group represent the entire genome 
of an organism, including repetitive and noncoding 
sequences. Genomic libraries usually consist of tens to 
hundreds of thousands of clones, each carried within an 
individual cloning vector—usually a plasmid (see Section 
6.1) or bacteriophage (see Section 6.5) that has been modi- 
fied to accommodate the insertion of exogenous fragments 
of DNA and that can be stably maintained in a host, such as 
E. coli. Genomic libraries are often constructed in a cloning 
vector such as a bacterial artificial chromosome (BAC), 
which can carry large pieces, greater than 100 kb, of ge- 
nomic DNA. The BACs are then propagated in bacteria. A 
collection of many thousands of BAC-containing bacterial 
colonies, each of which harbors a BAC containing a differ- 
ent fragment of the genome, makes up the genomic library. 
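
The size of a genomic library can be estimated with the standard Clarke-Carbon relation, N = ln(1 - P) / ln(1 - f), where f is the fraction of the genome carried by one clone and P is the desired probability that a given sequence is present. This formula is not given in the text, and it assumes randomly generated, equally clonable fragments. The genome and insert sizes below are illustrative values only.

```python
import math

def clones_needed(genome_bp: float, insert_bp: float, probability: float = 0.99) -> int:
    """Clarke-Carbon estimate of the clones needed to include a given sequence."""
    f = insert_bp / genome_bp                      # genome fraction per clone
    return math.ceil(math.log(1.0 - probability) / math.log1p(-f))

# Illustrative values: a 3 x 10^9 bp genome cloned as 100 kb BAC inserts.
GENOME_BP = 3e9
BAC_INSERT_BP = 100_000
n_bacs = clones_needed(GENOME_BP, BAC_INSERT_BP)

if __name__ == "__main__":
    print(f"about {n_bacs} BAC clones for 99% coverage of any given sequence")
    # Smaller inserts require proportionally more clones.
    print(clones_needed(GENOME_BP, 20_000), "clones with 20 kb inserts")
```

With these assumed sizes the estimate falls in the range of tens to hundreds of thousands of clones mentioned above, and it shows why large-insert vectors such as BACs reduce the number of clones that must be maintained.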


In contrast, cDNA libraries are collections of cloned 
DNA fragments that represent all the mRNA produced by 
an organism. In other words, only that portion of the ge- 
nome that is transcribed is represented in a cDNA library. 
The clones of a cDNA library are also placed in cloning 
vectors, such as specially modified plasmids, and intro- 
duced into bacteria so that the complete cDNA library is 
composed of a large number of bacterial colonies, each of 
which harbors a different cDNA clone derived from the 
mRNA population. 

Within a library, clones containing specific DNA 
sequences can be identified through complementary base 
pairing in a manner similar to that described in Research 
Technique 10.2 and in more detail in Chapter 17. With 
awareness of these tools, we can now consider the four 
approaches that are the focus of this section and whose 
purpose is to physically identify specific genes. 


First, genes can be identified by introducing a wild- 
type copy of a gene to complement a recessive mutant 
phenotype. 


A second approach is to use a piece of DNA, such as a 
transposon, with a known sequence to “tag” the gene 
of interest. The tag can then be used to identify flank- 
ing sequences, DNA on either side of the tag, that 
contain the gene. 


A third approach is to map the gene of interest rela- 
tive to known genetic markers (see Chapter 5), then to 
identify DNA clones spanning the locus, and finally to 
search through the DNA for the gene of interest. 


Lastly, advances in DNA sequencing technology have 
made it feasible to obtain genes identified in genetic 
screens by directly comparing the genome sequence 
of the mutant with that of the wild-type strain from 
which it was derived. 


Cloning Genes by Complementation 


The most direct approach to identifying specific genes 
is to detect genetic complementation of a mutant phe- 
notype by an introduced wild-type gene. This approach 
is restricted to cases in which large numbers of trans- 
genic organisms can be generated. Consider the yeast 
temperature-sensitive cell-cycle mutants described in 
Section 16.1. If clones of a yeast cDNA expression library 
are transformed into a yeast cell-cycle mutant, any clones 
that complement the mutant phenotype so that the cells 
grow normally should contain wild-type alleles of the 
mutated gene (Figure 16.6). In a procedure of this type, 
the yeast strain would first be transformed and grown at 
the permissive temperature. The resulting yeast colonies 
would then be transferred to an environment maintained 
at the restrictive temperature. Only the yeast colonies 
receiving a clone encoding a wild-type version of the mu- 
tant gene in question would be able to continue growth 
at the restrictive temperature; in those colonies, the
mutant phenotype would have been complemented by
the added gene.

Figure 16.6 Cloning by complementation. (Figure label, partially recovered: "... complement the cdc2 mutant will grow at the restrictive temperature.")

Complementation experiments can also be used 
to identify similar genes from other species, if there is 
sufficient conservation of protein function. For example, 
research in which a yeast cell-cycle mutant was trans- 
formed using a human cDNA expression library (one 
in which the human cDNA clones were first fused with 
sequences allowing for their transcription and translation 
in yeast) has led to identification of human genes simi- 
lar in function to the mutated yeast genes. The fact that 
both human and plant genes can complement these yeast 
mutants demonstrates the universality of the cell-cycle 
machinery and indicates that such proteins were present 
in the common ancestor of eukaryotes. 


Using Transposons to Clone Genes 


Transposons can be used as an identifying tag to clone 
specific genes, a technique called transposon tagging. 
Recall that transposons are mobile genetic elements that
can integrate into the genome with little if any target-sequence
specificity (see Chapter 13). If the sequence of
a transposon is known, the transposon sequence can be 
used to probe a genomic library constructed from DNA of 
a strain in which the same transposon has been inserted 
into a target gene. Sequences adjacent to the transposon 
should belong to the target gene. 
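
The logic of using a known transposon sequence to locate the unknown DNA next to it can be illustrated with a minimal Python sketch. It assumes toy sequences and exact string matching on a single strand; the function name `flanking_sequences`, the `flank_len` parameter, and both example sequences are hypothetical and do not represent any real transposon or laboratory protocol.

```python
def flanking_sequences(genomic, transposon, flank_len=10):
    """Return (left, right) flanks around the first exact copy of the
    transposon in the genomic sequence, or None if it is absent."""
    start = genomic.find(transposon)
    if start == -1:
        return None
    end = start + len(transposon)
    left = genomic[max(0, start - flank_len):start]
    right = genomic[end:end + flank_len]
    return left, right


# Hypothetical toy data: a tagged clone with the transposon in the middle.
TRANSPOSON = "TTTACCGGTAAA"
clone = "ATGGCCGATCGA" + TRANSPOSON + "GGCTTAGCATGA"

print(flanking_sequences(clone, TRANSPOSON, flank_len=6))
# prints ('GATCGA', 'GGCTTA')
```

In a real experiment the flanks are recovered by probing a library with the transposon sequence, as described above, so the sketch only shows how a known tag identifies its unknown neighbors.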

The fact that the sequence of a transposon must first 
be known if the transposon is to be used as a probe is a 
chicken-and-egg problem similar to others we have en- 
countered with probes. A solution in this particular case 
is to “trap” the transposon in a gene whose sequence is 
already known. Recall that allele instability is characteris- 
tic of transposon insertion (see Chapter 13). If researchers 
first identify unstable mutant alleles of a cloned gene— 
alleles likely to contain a transposon—they can then use 
a probe for that cloned gene to isolate the transposon 
sequence. 

For transposon tagging to succeed in practice, the 
biology of transposons must be considered. Since trans- 
posons often occur in high copy numbers in the genome, 
techniques are necessary to distinguish the copy of the 
transposon in the target gene from all other copies of 
the transposon in the genome. The ideal situation is to 
begin with a genotype harboring only a single copy or a 
low copy number of a transposon of known sequence and 
then mobilize the transposon to create a mutant collec- 
tion, which is then screened for phenotypes. 

Also to be considered is that, since transposons are 
mobile, a transposon that has been inserted into the target 
gene may jump out again. To circumvent this problem, 
the transposon that is used as a tag is often separated 
into two components—the transposase and the inverted 
repeats whose sequence the transposase recognizes (see 
Section 13.5). The inverted repeats form the functional 
part of a nonautonomous element that cannot move on 
its own but can be mobilized if transposase is supplied in 
trans. Ideally, the transposase activity is produced from a 
mutant transposon that is not capable of moving because 
it lacks the inverted repeats. The new insertions of the 
nonautonomous transposon can be stabilized by removal 
of the transposase source through outcrossing. 

A general protocol for using a transposon to tag a 
gene in a diploid eukaryote is shown in Figure 16.7. Two 
lines are initially crossed, one of which is homozygous for 
a stable mutant allele of the target gene and the second of 
which carries an active transposon system and is homozygous
for the wild-type allele of the target gene. The F₁ of
this cross is heterozygous for the mutant allele of the target
gene in a genomic background with an active transposon.
If the transposon moves into the gene of interest, thus creating
a second mutant allele, the F₁ individual will display
the homozygous recessive mutant phenotype. Screening of
a large number of F₁ individuals is usually required to find
any with new transposon-induced mutations in the target
gene, since transposon movement into a specific gene is
a rare event. Once a transposon-induced allele is identified,
the causative transposon can be cloned and the DNA
flanking the causative transposon should represent the
gene of interest.

Figure 16.7 Use of transposons for tagging genes.

Transposon tagging is limited to organisms that 
harbor active transposons or into which an active 
transposon can be introduced from another species. 
However, this limitation is not often a problem since, 
for example, the maize Ac/Ds transposon system (see 
Section 13.5), when introduced on a transgene, is active 
in other plant species, such as Arabidopsis and tomato, 
and is even active in zebrafish. We will return to the use 
of transposons to mutate genes when we discuss reverse 
genetics. 




Positional Cloning 


The approaches to cloning genes we have discussed thus 
far are not applicable to all organisms, as they rely on 
either a high efficiency of transformation (available in 
many bacteria and some fungi) or on active transposons. 
When these tools are not available, how do biologists 
find the DNA sequence for a gene that is known only by 
its mutant phenotype? They do it by combining a genetic 
map made from recombination frequencies (Chapter 5) 
with a physical map of the genome based on DNA clones, 
or, when available, the genome sequence in order to find 
the DNA sequence at a specific map position. Figure 16.8
provides an overview of the relationships between genetic
maps based on the segregation of genetic loci, physical
maps based on sets of overlapping genomic clones, genes,
and the DNA sequence of the genome. If two molecular
(DNA) markers are identified that flank the gene of interest,
the gene must reside in the intervening DNA. The
DNA for the gene can ultimately be identified by isolating
a set of DNA clones that collectively span the region between
the flanking markers. This approach is referred to
as positional cloning, or chromosome walking, since it
consists of “walking” along the chromosome in sequential
steps, from one flanking marker toward the other, by joining
overlapping DNA clones (as described below).

Figure 16.8 Correlating genetic maps and physical maps of Arabidopsis to locate DNA sequences of genes.

Positional cloning is done in three steps. The first
step is to construct a genetic map (map ① in Figure 16.8)
that shows the location of the gene of interest relative
to mapped DNA markers (map ② in Figure 16.8 consists
of markers). The second step is to identify DNA clones
(map ③ of Figure 16.8) that span the markers flanking
the gene of interest. The third step is to identify the gene
of interest (see examples in map ④ of Figure 16.8) within
the spanning DNA and determine its nucleotide sequence
(map ⑤ of Figure 16.8).


Step 1: Constructing a Genetic Map Using DNA 
Markers In 1980, a landmark paper by David Botstein 
and colleagues proposed an idea for using DNA markers 
as the basis of a genetic map for the purpose of performing 
positional cloning of human genes. In the decades that 
followed, the cloning of many human “disease genes” 
was accomplished using this protocol. Even now that the 
human genome has been sequenced, a similar mapping 
protocol continues to be used for gene identification in 
humans, and the general approach to positional cloning,
as described here, can be applied to any organism. 

The key to positional cloning is to identify molecu- 
lar markers flanking the gene you wish to clone. The 
flanking DNA markers define the two ends of the DNA 
sequence within which the gene of interest is located. 
Any DNA sequence that varies between individuals can 
potentially be a DNA marker, including single-nucleotide 
polymorphisms (SNPs), restriction fragment length poly- 
morphisms (RFLPs), small insertions or deletions, and var- 
ious repeating DNA sequence variants (see Chapter 10). 
Once a collection of polymorphic DNA markers has been 
identified, detecting the segregation of these markers in a 
“mapping population” will allow placement of each marker 
at a particular location in the genome. Map construction 
with molecular markers follows the same procedure as 
with phenotypic markers (see Chapter 5); different DNA 
markers that co-segregate are physically linked, with a 
recombination frequency proportional to the distance in 
map units between them. 

To examine how mapping works, let’s take an example
from Arabidopsis in which a gene is mapped in an
F₂ population (Figure 16.9). The first step in the construction
of a genetic map is to identify two strains that differ
in DNA sequence; in this case, the strains were Landsberg
(L) and Columbia (C). Each strain is highly homozygous
due to inbreeding, yet they differ from each other at polymorphic
loci throughout the genome.

The mapping population is generated by crossing the
two homozygous lines to produce an F₁ generation that is
heterozygous at all loci that differ between the two inbred
lines. These F₁ individuals are then interbred or allowed
to self-fertilize. At each locus in the genome, individuals
in the resulting F₂ population can be either homozygous
for alleles of one or the other of the original inbred lines,
or they can be heterozygous. Alleles for the gene of interest,
in this case AP2, are also segregating in the F₂ population,
since one parent (L) was homozygous for a recessive
ap2 mutant allele while the other parent (C) was homozygous
for the wild-type AP2 allele.

The genotype of each F₂ individual is determined
for the DNA markers and the gene of interest. DNA
markers that co-segregate with the mutation are linked
to the gene of interest, and their distance from the gene
is proportional to the recombination frequency; unlinked
DNA markers should segregate independently of the mutation.
In most cases, the alleles of DNA markers are
codominant, so that examination of DNA allows direct
determination of genotype. In contrast, only F₂ individuals
homozygous for a recessive mutation in the gene of
interest can be accurately genotyped, and the genotype
of phenotypically wild-type F₂ individuals has to be determined
in the F₃ or by a test cross. While this example
comes from a model genetic system, genetic mapping in
humans with molecular markers follows a similar protocol
(see Section 5.5).
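
The relationship between co-segregation and recombination frequency can be made concrete with a short Python sketch. It assumes the simplest case described above: only F₂ individuals homozygous for the recessive ap2 mutation (inherited from the L parent) are scored, so each C allele at a linked marker marks one recombinant chromosome, and double crossovers are ignored. The function name and the genotype counts are hypothetical illustrations, not data from the Arabidopsis experiment.

```python
def recombination_frequency(marker_genotypes, mutant_parent_allele="L"):
    """Estimate the recombination frequency between a marker and the
    mutation, from marker genotypes of homozygous mutant F2 individuals.

    Each genotype is a two-letter string such as "LL", "LC", or "CC".
    Every allele that differs from the mutant parent's allele is counted
    as one recombinant chromosome."""
    total_chromosomes = 2 * len(marker_genotypes)
    if total_chromosomes == 0:
        raise ValueError("no individuals genotyped")
    recombinants = sum(
        1
        for genotype in marker_genotypes
        for allele in genotype
        if allele != mutant_parent_allele
    )
    return recombinants / total_chromosomes


# Hypothetical scores for 50 ap2/ap2 F2 plants at one DNA marker.
genotypes = ["LL"] * 46 + ["LC"] * 4
rf = recombination_frequency(genotypes)
print(f"recombination frequency = {rf:.2f} (about {rf * 100:.0f} map units)")
# prints: recombination frequency = 0.04 (about 4 map units)
```

Treating the recombination frequency as map distance is a reasonable approximation only for closely linked markers, which is the situation of interest when searching for flanking markers.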


Because the number of DNA sequence differences 
between two strains is likely to be greater than the total 
number of genes in the organism, genetic maps based on 
DNA markers are often dense enough for flanking mark- 
ers closely linked to the gene of interest to be identified. 
Once DNA markers are found that flank the gene of in- 
terest, the next step of identifying the DNA between the 
markers can proceed. 


Step 2: Constructing Contiguous Sequences of DNA
Before the advent of genome sequencing projects,
researchers were forced to assemble the DNA spanning
two markers by constructing contiguous sequences
(contigs) from sets of overlapping genomic clones
(Figure 16.10). The DNA markers flanking the gene
of interest ① can be used as probes on a genomic
library to identify genomic clones that contain the DNA
markers ②. The ends of these genomic clones can be
used to probe the genomic library again to identify
clones that overlap the initial clones ③. Reiteration of
this process will identify additional overlapping genomic
clones (contigs) extending in both directions from the
initial two flanking DNA markers. Extension in one
direction reveals sequences closer to the gene of interest,
and extension in the other direction reveals sequences
farther from the gene of interest.

How is the directionality of the chromosome “walk”
determined? Genetic mapping of polymorphic DNA sequences
in the newly isolated genomic clones can resolve
the directionality of the overlapping genomic clones ④. If
the end of the genomic clone maps closer to the gene of
interest than to the initial DNA marker, the directionality
is toward the target gene. Conversely, if the end of the
genomic clone maps farther from the gene of interest than
from the initial DNA marker, the directionality is away
from the target gene. Once directionality is ascertained,
reiterative probing of genomic libraries in the direction
approaching the gene, using the sequences at the ends of
newly isolated genomic clones, allows the construction of
ever-larger contigs that should eventually span the entire
DNA sequence between the two flanking DNA markers ⑤.
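
The reiterative probing described above can be sketched as a simple loop in Python. This is only a model under stated assumptions: clones are given hypothetical start and end coordinates so that overlap can be computed directly (in the laboratory, overlap is detected by hybridization, not by known coordinates), the direction of the walk is assumed to have been determined already, and the names `Clone`, `overlaps`, and `walk` are invented for illustration.

```python
from typing import NamedTuple


class Clone(NamedTuple):
    name: str
    start: int  # hypothetical coordinate of the left end
    end: int    # hypothetical coordinate of the right end


def overlaps(a, b):
    """True if two clones share at least one position."""
    return a.start < b.end and b.start < a.end


def walk(library, first, target):
    """Walk rightward from the clone containing the first marker until the
    contig reaches the position of the second marker (target)."""
    contig = [first]
    current = first
    while current.end < target:
        # "Probe" the library with the current clone: keep clones that
        # overlap it and extend the contig toward the target.
        candidates = [c for c in library
                      if overlaps(c, current) and c.end > current.end]
        if not candidates:
            raise ValueError("gap in library: walk cannot continue")
        current = max(candidates, key=lambda c: c.end)
        contig.append(current)
    return contig


# Hypothetical library; the second flanking marker lies at position 300.
library = [Clone("A", 0, 100), Clone("B", 80, 180), Clone("C", 90, 150),
           Clone("D", 170, 260), Clone("E", 250, 340)]
contig = walk(library, library[0], target=300)
print([clone.name for clone in contig])
# prints ['A', 'B', 'D', 'E']
```

Choosing the overlapping clone that extends farthest is a convenience of the sketch; it keeps the number of walking steps small but is not a requirement of the method.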

The availability of genome sequences for many model 
genetic organisms has simplified positional cloning. With 
these species, the construction of a contig is not required, 
so once the gene of interest has been mapped, the re- 
searcher can proceed directly to the identification of 
candidate genes in the genome sequence spanning the 
mapped interval, as described below. 


Step 3: From Contig to Gene A contig spanning two
markers that flank a gene of interest must, by definition,
include the target gene, but how do we find the gene
of interest among the other genes in the contig ⑥? The
answer depends to a large degree on the organism under
study. If the genome sequence is known, the number
and identity of candidate genes (that is, sequences
that could encode the gene of interest) are essentially
known. In contrast, if the genome sequence is not known,
experimental approaches are required to identify candidate
genes within the spanning DNA.

Figure 16.9 Mapping of genes using molecular markers.

Figure 16.10 Positional cloning.

In organisms amenable to transformation, the “gold 
standard” of gene identification for positional cloning is to 
complement the mutant phenotype by introducing a copy 
of the wild-type allele into the mutant background. This ap- 
proach is similar to cloning by complementation described 
earlier, except the number of candidate genes is reduced 
from the entire set of genes in the genome to only those genes 
that map between the flanking markers. Transformation 
experiments are routine in many model genetic organisms 
and are described in more detail in Chapter 17. 


In organisms not amenable to transformation 
(e.g., humans), other approaches can be used to identify 
and characterize candidate genes. First, direct sequenc- 
ing of candidate genes and comparison of the sequences 
in wild-type and mutant individuals can reveal the gene 
of interest. Missense or nonsense mutations might be 
expected in each of the mutant alleles relative to the wild- 
type allele. However, because mutations outside of the 
coding region may be responsible for the altered gene ex- 
pression, noncoding sequences may have to be surveyed 
as well. Note that for non-inbred species, if there is only a 
single mutant allele to examine, it may be difficult to tell 
whether differences in the DNA sequences of candidate 
genes are the cause of the mutant phenotype or simply 
polymorphisms existing in the population. 

A second approach to identifying the target gene 
where transformation is not possible is to use the nature 
of the phenotypic defect conferred by the mutant allele 
as a source of clues to probable gene expression patterns. 
Candidate genes can then be assayed for those expression 
patterns in specific cells and tissues. It may also be pos- 
sible to detect changes in RNA expression patterns—for 
example, mutations resulting in altered patterns of splic- 
ing or those resulting in mRNA that is less stable than 
the wild-type mRNA. Genes can also be surveyed based 
on the type of protein they are thought to encode. If it is 
possible to predict the biochemical function of the target 
gene, some candidate genes may have features that make 
them appear more likely than others to be able to perform 
that function. However, in many cases this knowledge will 
be lacking. 

Positional cloning strategies have been applied to 
various model genetic systems. In the 1980s and 1990s, 
many genes in Drosophila, C. elegans, and Arabidopsis 
were identified by positional cloning protocols, long be- 
fore their genomes were sequenced. Positional cloning 
has been particularly successful in identifying genes as- 
sociated with human diseases, despite the infeasibility 
of performing controlled crosses and complementation 
experiments in humans. 


Positional Cloning in Humans: 
The Huntington Disease Gene 


Huntington disease, an inevitably fatal, late-onset neuro- 
degenerative disorder, is named for George Huntington, 
the physician who published the classic description of 
the disease and its inheritance in 1872. His description 
specified the symptoms of movement disorder, personal- 
ity change, and cognitive decline and, notably, outlined 
the autosomal dominant pattern of inheritance, a feature 
that went unappreciated until after the rediscovery of 
Mendel’s work in 1900. Huntington recognized the pat- 
tern of inheritance thanks to the combined experience of 
his father and grandfather, both also physicians, who had 
the unique opportunity of observing several generations 
of the disease in a local family. He did not encounter the
juvenile onset form of the disease, however, which pres- 
ents additional symptoms, such as rigidity and seizures. In 
a form of inheritance termed anticipation, juvenile-onset 
Huntington is inherited through a paternal allele. 


Mapping of the HD Gene Researchers have compiled 
extensive pedigrees depicting the transmission of 
Huntington disease in a large family in Venezuela. The 
pedigrees span 10 generations and include nearly 20,000 
individuals, many of whom are living. In the early 1980s, 
James Gusella, Nancy Wexler, and colleagues, studying
this Venezuelan kindred as well as a large Ohio family, 
mapped the HD gene to the short arm of chromosome 4 
(see the Chapter 5 Case Study for a similar mapping 
experiment). Additional polymorphic markers linked to 
dominant mutant alleles of the HD gene further confined 
it to a region of 2.2 Mb on chromosome 4. Mutant HD 
alleles were known from a large number of unrelated 
families from diverse genetic backgrounds, suggesting 
that dominant mutant HD alleles have arisen multiple 
times independently. Mapping data from 75 families 
identified a haplotype shared by about one-third of the 
families and suggested that the HD gene was likely to 
reside within 500 kb of the shared haplotype (step ① in
Figure 16.11; see Chapter 5 for a discussion of haplotypes). 


Candidate Gene Identification Before 2001, the year 
a draft of the human genome sequence was published, 
identification of genes in large stretches of human 
genomic sequence was an arduous task. To clone the 
HD gene in the early 1990s required construction of a 
contig of genomic clones spanning the HD locus, using 
the techniques described in Figure 16.10 for isolating 
overlapping genomic clones. To identify transcribed 
sequences within the 500-kb genomic region, a novel 
exon-trapping approach was used. Fragments of the 
genomic DNA were cloned into a vector, where they were 
flanked by two exons contained in the vector sequence. 
When assayed in human cells in culture, if the genomic 
DNA did not contain an exon, the two vector exons 
would become spliced together in post-transcriptional 
processing, generating a transcript of a defined size. 
However, if the cloned genomic DNA contained an exon, 
it would be spliced to the two flanking exons, creating 
a transcript of a larger size. This technique revealed the 
presence of four transcribed genes in the region. 

Two approaches were undertaken to evaluate the 
candidate genes. First, the mRNA expression patterns of 
the genes were analyzed. However, no difference in ex- 
pression patterns or levels for any of the four genes was 
detected between normal and HD individuals. Second, the 
candidate genes were examined for DNA polymorphisms 
(steps ② and ③ in Figure 16.11). One of the candidate
genes was polymorphic between individuals. A striking
difference in the lengths of a trinucleotide repeat sequence
in exon 1 of this gene was observed; normal individuals
had 17 to 34 copies of a CAG repeat, and HD individuals 
had from 42 to more than 66 copies. The same correla- 
tion was seen in all 75 families, strongly suggesting that 
this was the HD gene. As further supporting evidence, the 
length of the repeats in the HD individuals also correlated 
with the age of onset of disease symptoms. 
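
The comparison of repeat lengths can be illustrated with a minimal Python sketch that counts the longest uninterrupted CAG tract in a sequence and compares it with the ranges quoted above (17 to 34 copies in normal individuals, 42 or more in HD individuals). The function names and the example sequences are hypothetical; the sketch is not a diagnostic procedure, and real alleles are sized experimentally.

```python
import re


def longest_cag_run(sequence):
    """Return the number of repeats in the longest uninterrupted CAG tract."""
    tracts = re.findall(r"(?:CAG)+", sequence.upper())
    return max((len(tract) // 3 for tract in tracts), default=0)


def classify_repeat_count(n_repeats):
    """Compare a repeat count with the ranges quoted in the text."""
    if n_repeats >= 42:
        return "HD range"
    if 17 <= n_repeats <= 34:
        return "normal range"
    return "outside the ranges quoted in the text"


# Hypothetical toy alleles: a short flank, a CAG tract, and a short flank.
normal_allele = "ATG" + "CAG" * 20 + "CCGCCA"
expanded_allele = "ATG" + "CAG" * 45 + "CCGCCA"

for allele in (normal_allele, expanded_allele):
    n = longest_cag_run(allele)
    print(n, classify_repeat_count(n))
# prints:
# 20 normal range
# 45 HD range
```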

The HD gene spans 210 kb, encoding an mRNA of 
more than 10 kb, and has an open reading frame of 9432 
bases encoding a protein of 3144 amino acids. In this case, 
there is little in the protein sequence that provides a clue 
to function (and possible treatment). However, knowledge 
of the DNA sequence has provided a way of testing for 
the presence of the disease allele in families in which it is 
segregating. This information can be used in prenatal di- 
agnostics to eliminate the allele from the next generation if 
therapeutic abortion is an option. While this test may seem 
to be a blessing, it introduces many ethical quandaries. 
Should one test a child for an adult-onset disease where 
there is no prospect for treatment or a cure at present? 
Might testing in a young adult inadvertently provide in- 
formation about another individual, such as a parent, who 
does not wish to know his or her own genetic status? 

Analysis of the polymorphic CAG repeat has also pro- 
vided insight into the phenomenon of anticipation. The 
CAG alleles whose length approaches the high end of the
normal range [(CAG)27–35] are unstable during transmission
and change size from one generation to the next.
Instability occurs in both maternal and paternal inheri- 
tance, but large expansions have been noted only during 
male transmission; this explains why juvenile patients 
almost always inherit the mutant allele from their father. 
While the molecular basis of this gender bias has become 
apparent, the mechanistic basis is still unknown. 


Genome Sequencing to Determine 
Gene Identification 


The most direct way to identify the molecular nature 
of mutations might seem to be to compare the genome 
sequence of the mutant line with that of the wild-type 
strain from which it was derived. Such an approach would 
obviate the need for the often time-consuming and ex- 
pensive steps involved in positional cloning. In theory, 
comparison of wild-type and mutant sequences should 
be straightforward, but there are both technical and ex- 
perimental obstacles. First, in organisms like humans, it is 
difficult to distinguish between causative mutations and 
widespread polymorphisms. Second, even in inbred labo- 
ratory animals, typical mutagenesis protocols produce up 
to several hundred new mutations in each mutagenized 
gamete, introducing the need to backcross new mutant 
lines with their wild-type parental strain, as described ear- 
lier in this chapter, in order to isolate the causative muta- 
tion from the background of other mutations induced 
during the mutagenesis. 

Figure 16.11 Locating the Huntington disease gene. (1) The HD gene was mapped to a 500-kb region of chromosome 4, in which four transcribed genes were detected.

These obstacles can be overcome by simultaneously
examining the genomes of many mutant organisms after 
backcrossing. The details of how genome sequencing is ac- 
complished are described in Chapter 18, but a conceptual 
outline of its application to identify a gene originally de- 
fined by a mutant phenotype is presented in Figure 16.12. 
First, the newly identified mutant line is backcrossed
with the wild-type strain from which it was derived. The
resulting F₁ individuals are interbred to produce an F₂ generation
from which homozygous mutants can be selected.
DNA is isolated from a number of homozygous mutants in
the F₂ and is then pooled and sequenced in amounts sufficient
to ensure that, on average, every nucleotide in the genome
of each individual will be sequenced. The idea is that
the causative mutation will be homozygous in all F₂ individuals
selected, while other mutations will not. Mutations
that are not linked to the causative mutation will segregate
in a Mendelian fashion in the F₂, and this situation will
be reflected in the genome sequences. Mutations that are
linked will segregate according to how closely they are
linked to the causative mutation.

Figure 16.12 Genomics approach to gene identification following mutagenesis.
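
The expectation for pooled sequence data can be sketched in Python. Because every chromosome in the pool was selected for the causative mutation, an induced mutation at recombination fraction r from it is expected at a frequency of about 1 − r in the pool: 1.0 for the causative mutation itself and 0.5 for unlinked mutations. The `Variant` class, the thresholds, and the read counts below are hypothetical and only illustrate the filtering idea.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    name: str
    mutant_reads: int  # pooled reads carrying the induced mutation
    total_reads: int   # all pooled reads covering the site

    @property
    def mutant_frequency(self):
        return self.mutant_reads / self.total_reads


def expected_mutant_frequency(recombination_fraction):
    """Expected pool frequency of an induced mutation located at the given
    recombination fraction (0 to 0.5) from the causative mutation."""
    if not 0.0 <= recombination_fraction <= 0.5:
        raise ValueError("recombination fraction must be between 0 and 0.5")
    return 1.0 - recombination_fraction


def candidate_variants(variants, min_frequency=0.95, min_depth=20):
    """Keep well-covered variants that are nearly fixed in the mutant pool."""
    return [v for v in variants
            if v.total_reads >= min_depth
            and v.mutant_frequency >= min_frequency]


# Hypothetical pooled read counts for four induced mutations.
pool = [Variant("m1", 100, 100),  # fixed: candidate
        Variant("m2", 90, 100),   # linked, roughly 10 map units away
        Variant("m3", 52, 100),   # behaves as unlinked
        Variant("m4", 10, 10)]    # fixed, but too few reads to trust

print([v.name for v in candidate_variants(pool)])
# prints ['m1']
```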

The concept behind using a large number of F₂ progeny
is that, while in a single F₂ individual the probability
of recombination between the causative mutation and
another, closely linked mutation will be low, in a population
some level of recombination will occur between the
causative mutation and most linked mutations. For example,
if 50 homozygous mutant F₂ individuals are examined,
100 meiotic events are being assayed (since meiosis
will have occurred to produce each of the gametes in
the F₁ parents), providing a resolution of approximately
1 cM. Knowing the genome sizes of the model genetic
organisms and their genetic map lengths (see back endsheets),
a researcher can approximate the likelihood of
identifying only a small number of candidate mutations.
The process of confirming the gene identification then
follows that described earlier for positional cloning. Due
to inexpensive DNA-sequencing technologies, this approach
for going from mutant phenotype to gene identification
is becoming commonplace in Drosophila, C.
elegans, and Arabidopsis.
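
The arithmetic behind the 1 cM figure, and its conversion to a physical interval, can be written out as a short Python sketch. The genome size and genetic map length used below are round, hypothetical numbers chosen for easy arithmetic, not values for any particular organism, and the function names are invented for illustration.

```python
def mapping_resolution_cm(n_f2_individuals):
    """Approximate resolution in cM: one recombinant among all assayed
    meioses, with two meiotic products contributing to each F2 individual."""
    meioses = 2 * n_f2_individuals
    return 100.0 / meioses


def interval_size_bp(resolution_cm, genome_size_bp, map_length_cm):
    """Convert a genetic interval to base pairs, assuming recombination is
    spread evenly along the genome (a simplification)."""
    return resolution_cm * genome_size_bp / map_length_cm


resolution = mapping_resolution_cm(50)
print(resolution)  # prints 1.0

# Hypothetical organism: 100-Mb genome and a 500-cM genetic map.
print(interval_size_bp(resolution, 100_000_000, 500))  # prints 200000.0
```

With the expected number of induced mutations per genome in hand, the same proportion gives a rough estimate of how many candidate mutations should fall inside the mapped interval.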


16.3 Reverse Genetics Investigates 
Gene Action by Progressing from 
Gene Identification to Phenotype 


Forward genetics was for a long time the primary—and 
for much of the last century, the only—approach to 
uncovering gene function. Now, however, the develop- 
ment of molecular methods for gene identification and 
advances in sequencing technologies are making reverse 
genetics approaches increasingly valuable and common. 
The reasons for this shift in emphasis are twofold. 
First, the enormous amount of genomic sequence avail- 
able has increased by orders of magnitude the number of 
known gene sequences, and only a fraction of them have 
been assigned a function by forward genetics. For ex- 
ample, when the E. coli genome was fully sequenced, 4288 
protein-coding genes were identified, only 1853 of which 
had been previously identified through forward genetic 
screens. Second, genomic sequencing and reverse genetic 
screens have uncovered a degree of gene duplication not
previously suspected. Gene duplications often result in 
genetic redundancy. In forward genetic screens, such 
duplicated genes would not be identified, since mutation 
of only one of the genes would not usually result in a con- 
spicuous mutant phenotype. However, reverse genetics 
approaches, where the functions of both duplicates can 
be disrupted in an individual organism, are particularly 
suited in these situations to provide evidence of gene 
function. 

Reverse genetics begins with the creation of a mu- 
tant allele for a gene identified only by its sequence (see 
Figure 16.1). The selection of mutational tools is largely 
dependent on the biology of the experimental organism. 
In organisms in which homologous recombination readily 
occurs, targeted sequence changes, such as deletions, are 
the method of choice. In organisms amenable to transfor- 
mation and in which homologous recombination occurs 
at a reasonable frequency, the ideal approach is to pre- 
cisely delete the gene of interest. This approach works in 
many bacteria and fungi and has also been used in mice. In 
Chapter 17 we discuss the details of how gene deletion by 
homologous recombination is accomplished. In organisms 
amenable to transformation but in which homologous re- 
combination is rare, two approaches are widely used. The 
first is to generate a large collection of random mutations 
and then screen them for mutations in the gene of interest 
using PCR-based techniques (see Chapter 7 for review of 
PCR). The second approach is to harness a gene-silencing 
phenomenon known as RNA interference (RNAi), which 
is induced by double-stranded RNA molecules. In species 
not amenable to large-scale transformation experiments, 
nontransgenic methods of mutagenesis can be used. These 
basic techniques for reverse genetics are described in the 
rest of this section. 
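
The choice of mutational tool described in this paragraph can be summarized as a small decision function in Python. This is a schematic restatement of the text under simplifying assumptions (two yes/no properties per organism); the function and parameter names are hypothetical, and real experimental design involves many more considerations.

```python
def reverse_genetics_approaches(transformable, homologous_recombination_efficient):
    """Return the approaches suggested by the text for an organism with
    the given properties."""
    if not transformable:
        # Species not amenable to large-scale transformation experiments.
        return ["nontransgenic mutagenesis"]
    if homologous_recombination_efficient:
        return ["targeted gene deletion by homologous recombination"]
    # Transformable, but homologous recombination is rare.
    return ["PCR-based screening of a collection of random mutations",
            "RNA interference (RNAi)"]


print(reverse_genetics_approaches(True, True))
# prints ['targeted gene deletion by homologous recombination']
```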


Use of Insertion Mutants in Reverse Genetics 


Conceptually, the simplest way to construct a loss-of- 
function allele would be to delete the gene of interest from 
the genome. The deletion of a specific sequence from the 
genome requires techniques, such as homologous recom- 
bination, that precisely manipulate the genomes of living 
organisms. As we will discuss further in Chapter 17, these 
techniques are very efficient in many microorganisms, 
such as bacteria, archaea, and some simple eukaryotes, 
but they are much less efficient in more complex eukary- 
otes like plants and animals. Thus, the approaches used 
in reverse genetics differ between organisms (Table 16.2). 

Table 16.2 Reverse Genetics Approaches in Model Genetic Organisms

Species                     Reverse Genetics Tools
Escherichia coli            Knockouts by homologous recombination
Saccharomyces cerevisiae    Knockouts by homologous recombination
Arabidopsis thaliana        Random T-DNA and transposon insertions; TILLING; RNAi
Drosophila melanogaster     Random P element insertion lines; RNAi
Caenorhabditis elegans      RNAi loss-of-function alleles
Mus musculus                Knockouts by homologous recombination; RNAi

Reverse genetics approaches for most of the commonly
used model genetic organisms utilize knockout libraries,
collections of mutants in which most or all genes have
been mutated by inactivating (or “knocking out”) their
expression. Most knockout mutants are produced by the
insertion of exogenous pieces of DNA into the genome to
generate loss-of-function alleles; thus, most alleles in
the libraries are null alleles. Saccharomyces cerevisiae
and E. coli geneticists have, for example, systematically
generated loss-of-function alleles of all known
S. cerevisiae and E. coli genes by homologous
recombination. In these knockout library collections, each
strain has a single mutation in a different gene. See
Chapter 17 for details on how this is accomplished.

In many model genetic organisms, it is not technically 
simple or economically feasible to systematically gener- 
ate loss-of-function mutants for all genes. However, if 
an organism is easy to transform, populations of random 
mutants can be generated by transposon insertions or, 
in the case of plants, T-DNA insertions (see Chapter 17 
for details). These populations can then be screened for 
mutations in specific genes, using PCR-based techniques 
with a primer that is specific to the gene of interest and 
a primer that is specific to the insertional mutagen used 
(Figure 16.13). 
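
The logic of this PCR screen can be sketched in a few lines of code. The model below is a simplified assumption, not a laboratory protocol: primers g1 and g2 are taken to flank the gene, primers t1 and t2 are taken to point outward from the two ends of the insertion, and the function name, coordinates, and `t_offset` parameter are hypothetical.

```python
def expected_pcr_products(gene_length, insertion=None, t_offset=100):
    """Predict product sizes (bp) for each primer pair; None means no product.

    gene_length: distance between the gene-specific primers g1 and g2.
    insertion: None for wild type, or (position, insert_length), where
        position is measured from g1.
    t_offset: assumed distance of t1 and t2 from their ends of the insert.
    """
    products = {"g1+g2": gene_length, "g1+t1": None, "g2+t2": None}
    if insertion is not None:
        position, insert_length = insertion
        if not 0 < position < gene_length:
            raise ValueError("insertion must lie between g1 and g2")
        # The gene-specific pair now spans the whole insert (in practice a
        # very large insert may fail to amplify at all).
        products["g1+g2"] = gene_length + insert_length
        # Each gene primer pairs with the insert primer facing it.
        products["g1+t1"] = position + t_offset
        products["g2+t2"] = (gene_length - position) + t_offset
    return products


wild_type = expected_pcr_products(2000)
mutant = expected_pcr_products(2000, insertion=(500, 5000))
print(wild_type)
print(mutant)
```

In this toy model only the insertion line yields products from the mixed gene-plus-insert primer pairs, which is the signal a screen of pooled DNA looks for.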

For model genetic systems such as Drosophila, where 
P elements have been used as an insertional mutagen, large 
populations of mutants generated by insertions have been 
characterized to such an extent that mutations in specific 
genes can be ordered directly from a stock center. Similar 
knockout libraries based on T-DNA and transposon inser- 
tions are available for Arabidopsis. Such knockout libraries 
are an invaluable resource for large-scale reverse genetics 
experiments that aim to elucidate the function of every 
gene in the model genetic organism (see Chapter 18). An 
example of an application of reverse genetics to determine 
the function of closely related genes in Arabidopsis is 
described in the Case Study at the end of this chapter. 


RNA Interference in Gene Activity 


In the late 1980s, researchers introduced a chalcone
synthase transgene into Petunia in an effort to increase the
amount of floral pigment. To their surprise, some transgenic
lines exhibited complete loss of pigment production
(Chapter 15). Not only was the chalcone synthase transgene
not expressed properly in these lines, but the endogenous
chalcone synthase gene also was silenced, a phenomenon
they termed co-suppression. A similar phenomenon was
subsequently observed in both fungal and animal systems.
This method of silencing genes was initially called quelling
in Neurospora and RNA interference (RNAi) in animals.
The phenomenon is now universally known as RNAi. In
the 1990s, Andrew Fire and Craig Mello, who won the 2006
Nobel Prize in Physiology or Medicine for their work on
RNAi, used a genetics approach to dissect and elucidate the
biochemical mechanism for RNAi in C. elegans.

Figure 16.13 Reverse genetics using insertional mutagenesis.

Double-stranded RNA (dsRNA) can act as a trigger 
for the degradation not only of the double-stranded RNA 
itself but also of any RNA molecules that are comple- 
mentary to the double-stranded RNA (see Chapter 15). 
A primary role of this gene-silencing system is to silence 
repetitive DNA. Transcription from several different cop- 
ies of repetitive elements often generates double-stranded 
RNA molecules, since collectively both strands of the 
repetitive DNA are often transcribed. In addition, RNAi 
protects cells against double-stranded RNA viruses. Thus, 
dsRNA-mediated gene silencing acts as a genomic im- 
mune system to silence both repetitive DNA sequences 
and invading nucleic acids. 

To take advantage of endogenous RNAi activity as a
way of silencing genes, scientists utilize double-stranded
RNA that is complementary in sequence to the target
gene (Figure 16.14). The mRNA of the target gene
will then be degraded through the action of Dicer and
Argonaute enzymes (described in Chapter 15), causing
a loss-of-function phenotype of the target gene. The
efficiency of silencing can approach that of a null allele,
although often the phenotypes induced represent a range
of partial loss-of-function phenotypes.

Figure 16.13 (panel text) DNA is isolated from the
mutagenized population (plants) and analyzed by PCR with
primers g1, g2, t1, and t2. If a gene does not have an
insertion (wild type), only the combination of primers
g1 + g2 results in a product. If a gene has an insertion,
specific combinations of g and t primers (in this case
g1 + t1 and g2 + t2) will yield a product. In addition,
the g1 + g2 primers should yield a larger product as
compared to wild type.
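
The targeting principle described above for RNAi, a double-stranded RNA whose sequence matches the target mRNA, can be illustrated with a short sketch. The sequence, the function names, and the choice of region are hypothetical; real dsRNA design also has to consider length, specificity, and off-target matches, none of which are modeled here.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def antisense_rna(sense):
    """Return the reverse complement of an RNA sequence, written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(sense))


def design_dsrna(mrna, start, length):
    """Return (sense, antisense) strands for a chosen region of the mRNA."""
    sense = mrna[start:start + length]
    if len(sense) < length:
        raise ValueError("requested region runs past the end of the mRNA")
    return sense, antisense_rna(sense)


# Hypothetical target mRNA fragment, for illustration only.
target_mrna = "AUGGCUAGCUUAGGCAUCGA"
sense, antisense = design_dsrna(target_mrna, start=3, length=8)
print(sense, antisense)
```

The sense strand is identical to the chosen stretch of the target mRNA, and the antisense strand is the one that can base-pair with that mRNA.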

The double-stranded RNA can be introduced directly 
into cells or organisms by injection of double-stranded 
RNA or indirectly by infection with a double-stranded 
RNA virus. Alternatively, a transgene can be designed 
that results in the production of double-stranded RNA, a 
method that has the added advantage of being heritable. In 
animals, transient introduction of double-stranded RNA 
into cell cultures has been successful. One of the methods 
for introducing double-stranded RNA into C. elegans is 
surprisingly simple. Caenorhabditis elegans normally eats 
E. coli as food, and, remarkably enough, when C. elegans 
is fed E. coli that is producing double-stranded RNA, the 
double-stranded RNA will be taken up into C. elegans and 
will silence genes in many organs of the C. elegans body. 
While in this case the RNAi phenotype is not indefinitely 
heritable, the phenotypic effects can be seen in several 
subsequent generations produced by self-fertilization of 
the worm that was fed the E. coli. 

The advantages of the RNAi approach to reverse genetics
include the ease and rapidity of applying the method.
It allows large-scale reverse genetic screens to be conducted
in cell cultures and whole organisms without the laborious
preparatory task of creating mutagenized populations. In
addition, transient RNAi-mediated gene silencing offers an
alternative means of applying reverse genetics in species for
which stable transformation protocols do not exist.

Figure 16.14 Reverse genetics using RNAi.

In a related approach, synthetic microRNAs have
been created to target the degradation of specific mRNAs.
Like RNAi-mediated gene silencing, synthetic
microRNA-mediated gene silencing takes advantage of
endogenous gene-silencing machinery (see Chapter 15). The
synthetic microRNAs are designed according to principles
derived from known microRNAs but are customized
to direct the translational repression or mRNA cleavage
of the gene of interest.


Reverse Genetics by TILLING 


Reverse genetics can also be performed on species that can- 
not be transformed easily, as long as the species is amenable 
to standard genetic analyses. One approach to reverse genet- 
ics that can be applied to any genome is targeted induced 
local lesions in genomes (TILLING). In a TILLING pro- 
tocol, a population of organisms of an inbred strain is ran- 
domly mutagenized throughout the genome (Figure 16.15). 
Enough independent lines are produced to bring the level of 
mutagenesis to near saturation, at which, ideally, each gene 
is represented by multiple mutant alleles in the mutagenized 
population. Often, the mutagen employed in the develop- 
ment of the mutagenized lines is a chemical such as EMS 
(Table 16.1). DNA from the mutagenized lines is screened 


systematically using PCR-based methods to search for mu- 
tations in a particular gene of interest. 
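
The idea of approaching saturation can be made concrete with a rough calculation. The sketch below assumes, purely for illustration, that induced mutations fall independently and uniformly across the genome (a Poisson model that the text does not state), and all of the numbers are hypothetical. It also counts any mutation in the gene, not only those that alter gene function.

```python
import math


def prob_gene_hit(n_lines, mutations_per_line, gene_fraction):
    """Probability that at least one line carries a mutation in the gene.

    n_lines: number of independent mutagenized lines screened.
    mutations_per_line: assumed average number of induced mutations per line.
    gene_fraction: assumed fraction of the genome occupied by the gene.
    """
    expected_hits = n_lines * mutations_per_line * gene_fraction
    # Poisson probability of one or more events given the expected count.
    return 1.0 - math.exp(-expected_hits)


# Hypothetical values: 500 mutations per line, gene = 2 kb of a 120-Mb genome.
for n in (100, 1000, 3000):
    print(n, round(prob_gene_hit(n, 500, 2000 / 120_000_000), 3))
```

Under these assumptions the probability rises toward 1 as more lines are added, which is why several thousand lines are typically screened.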

For each individual of the mutagenized population,
both progeny and DNA are collected. The generation
derived from the mutagenized population is often referred to
as the M1 generation (Figure 16.15a). DNA is isolated from
M1 individuals or from M2 families of organisms. Any
mutation induced in the mutagenesis will be either
heterozygous (if the DNA was derived from an M1 individual) or
segregating (if the DNA was derived from an M2 family).
A region of the target gene is chosen for PCR-based
amplification. PCR products from a sample that carries a
mutation are expected to contain both the wild-type sequence
and the mutant sequence. Products that consist solely of the
wild-type allele can be distinguished from those consisting
of a mixture of the wild-type allele and a mutant allele.

The PCR products are first denatured and allowed 
to reanneal, creating some homoduplex DNA, in which 
the strands are fully complementary if derived from the 
same allele, and some heteroduplex DNA (Figure 16.15b). 
Heteroduplex DNA is composed of strands that are largely 
complementary but contain one or more mismatched base 
pairs, indicating that the strands are derived from DNA 
containing different alleles. Heteroduplex DNA can be dis- 
tinguished from homoduplex DNA by either a difference in 
migration of the products during electrophoresis or by dif- 
ferential susceptibility to an endonuclease that cleaves het- 
eroduplex DNA at mismatched base pairs. Heteroduplex 
DNA forms only in DNA samples in which a mutation in 
the target gene is present. Screening progeny from sev- 
eral thousand mutagenized individuals often allows iden- 
tification of multiple mutant alleles of the target gene. 
Individuals homozygous or heterozygous for the mutant
allele can then be identified in the appropriate M2 family.
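
The denature-and-reanneal step can be sketched as follows. This is a toy model under stated assumptions: each allele is represented only by its top-strand sequence, alleles differ by point mutations (so they have equal length), and every strand is assumed to be free to pair with the complement of any allele in the sample. The function names and sequences are hypothetical.

```python
from itertools import product


def reannealed_duplexes(alleles):
    """List every top-strand/bottom-strand pairing after reannealing.

    alleles: top-strand sequences of the alleles amplified from one sample.
    Returns tuples (top_allele, bottom_allele, kind, mismatch_positions).
    """
    duplexes = []
    for top, bottom_source in product(alleles, repeat=2):
        # A position mismatches when the two strands come from alleles
        # that differ at that position.
        mismatches = [
            i for i, (a, b) in enumerate(zip(top, bottom_source)) if a != b
        ]
        kind = "heteroduplex" if mismatches else "homoduplex"
        duplexes.append((top, bottom_source, kind, mismatches))
    return duplexes


def has_heteroduplex(alleles):
    """True if any reannealed duplex contains a mismatch."""
    return any(kind == "heteroduplex"
               for _, _, kind, _ in reannealed_duplexes(alleles))


wild_type = "ACGTAC"
mutant = "ACATAC"  # hypothetical G-to-A change at position 2
print(has_heteroduplex([wild_type]))
print(has_heteroduplex([wild_type, mutant]))
```

In this model a sample containing only the wild-type allele gives homoduplexes alone, whereas a sample containing both alleles gives heteroduplexes whose mismatch position marks the mutation.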

When chemical mutagenesis is used to produce 
TILLING alleles, it results in both null alleles and partial 
loss-of-function alleles. The spectrum of phenotypes pro- 
duced by alleles obtained through TILLING approaches is 
often useful for dissecting gene function, even in organisms 
where gene knockouts are available. While TILLING was 
developed for studies in model genetic species, it is suitable 
for any organism that can be mutagenized and genetically 
analyzed. It is currently being applied to several crop plants. 

Genetic Analysis 16.2 tests your understanding of the 
reverse genetics analytical techniques discussed in this 
section. 


Figure 16.15 Reverse genetics by TILLING.


GENETIC ANALYSIS 16.2

PROBLEM In searching the mouse genome, you identify the sequences of three genes similar to
the single hedgehog gene of Drosophila: Sonic hedgehog, Indian hedgehog, and Desert hedgehog.
Describe the research design you would use to learn the function of each of the genes and whether
that gene function is unique or redundant in the mouse.

BREAK IT DOWN: When genes in different species are highly similar, they are likely to have
originated from a single ancestral gene in a common ancestor.

BREAK IT DOWN: You are starting with gene sequences and wish to know gene functions. Which
genetics approach, forward or reverse, is most appropriate?

Solution Strategies and Solution Steps

Evaluate

1. Strategy: Identify the topic this problem addresses and the nature of the required answer.
   Step: This problem is about designing research to identify the functions of genes
   known only by sequence and to discover whether those functions are unique or redundant.

2. Strategy: Identify the critical information given in the problem.
   Step: While only one hedgehog gene exists in Drosophila, three ‘hedgehog’ gene
   sequences exist in mouse, raising the question of whether the three mouse
   genes have different functions or whether there is any sharing of function.

Deduce

3. Strategy: Consider possible approaches to discovering the functions of genes known
   only by sequence.
   TIP: Reverse genetics approaches can be used for functional analysis.
   Step: Functions of genes known only by sequence can be determined by reverse
   genetics approaches.

4. Strategy: Consider possible approaches to reverse genetics available for use with mice.
   TIP: Consider the methods discussed to create mutations in mice.
   Step: Homologous recombination approaches can be used to produce loss-of-function
   mutations in mice. Other reverse genetics approaches, such as RNAi, could also be
   used, but homologous recombination is the preferred method, as it results in null alleles.

Solve

5. Strategy: Describe a genetics approach to determine whether the genes have unique or
   redundant functions.
   Step: First, create loss-of-function knockout alleles of each of the three genes
   by homologous recombination. Homozygous mutant lines can then be
   bred and the phenotypes of each of the three single knockouts examined.
   Interbreeding the single-mutant lines will lead to the creation of strains
   in which combinations of two or more genes are inactive. Comparison of
   phenotypes of single mutants with those of multiple mutants allows an
   assessment of whether the genes exhibit unique or redundant functions.

For more practice, see Problems 9, 10, and 22.


16.4 Transgenes Provide a Means
of Dissecting Gene Function


Transgenes have other uses in the study of gene function,
in addition to the creation of loss-of-function alleles.
Chimeric genes, transgenes composed of regulatory
sequences from one gene and coding sequences from a
second gene or coding sequences from two different genes,
provide a means to create gain-of-function alleles, as well as
to monitor gene expression patterns. This section describes
in greater detail the ways transgenes can reveal genetic
function.

While an almost limitless array of transgenes can be 
constructed for genetic analysis, many fall into two cat- 
egories. One category consists of reporter genes, used to 
investigate gene regulation because they produce a visual 
output of gene expression patterns. Fusion of the regula- 
tory sequences of a gene of interest to coding sequences of 
a reporter gene provides information about where, when, 
and how much a gene is expressed. Some reporter genes 
facilitate live imaging and monitoring of gene expression 
in real time. 

The second category of transgenes useful for genetic 
analysis consists of gain-of-function alleles generated by 
placing coding regions from one gene under control of 
regulatory sequences derived from another gene. An allele
constructed in this way often results in ectopic expres-
sion, expression occurring at times or in places where the
gene is not normally expressed. The use of either or both 
of these types of transgenes can complement analyses of 
loss-of-function alleles by providing information on how 
genes are normally expressed and the phenotypic conse- 
quences of changing their normal expression pattern. 


Monitoring Gene Expression with Reporter 
Genes 


A gene can act as a reporter if its product can be detected 
directly or is an enzyme that produces a detectable product. 
The regulatory sequences of the gene of interest are used 
to drive the expression of the reporter gene. Two types of 
reporter gene fusions can be constructed: transcriptional 
and translational (Figure 16.16). 

In a transcriptional fusion, regulatory sequences
directing transcription of the gene of interest are fused
directly with the coding sequences of the reporter gene.
In this case, the reporter gene will be transcribed in the
pattern directed by the regulatory sequences to which
it is fused. Note that the transcriptional fusion shown
in Figure 16.16 is idealized and that regulatory sequences
may reside in other regions in addition to the 5' upstream
sequences. In a translational fusion, not only the regulatory
sequences but also the coding sequence of the gene of in- 
terest are fused to the reporter gene in such a way that the 
reading frame for translation is maintained for both the 
gene of interest and the reporter gene. As a result, the re- 
porter protein is translationally fused with the protein of 
interest, and the location of the reporter protein provides 
information not only on the spatial and temporal tran- 
scriptional expression pattern but also on the subcellular 
location of the fusion protein. In translational fusions, 
care must be taken to find out if the fusion protein is 
still functional, since the addition of the reporter protein 
could interfere with the proper folding or activity of the 
protein of interest. 
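
The requirement that the reading frame be maintained across a translational fusion can be checked mechanically. The sketch below is a minimal illustration under several assumptions: sequences are uppercase DNA coding sequences starting at the first codon, the gene of interest is placed upstream of the reporter, and the function name and toy sequences are hypothetical. It says nothing about whether the resulting fusion protein folds or functions.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def build_translational_fusion(gene_cds, reporter_cds, linker=""):
    """Join a gene's coding sequence to a reporter's, keeping the frame.

    The gene's own stop codon (if present) is removed so that translation
    continues into the reporter.
    """
    if len(gene_cds) % 3 or len(linker) % 3 or len(reporter_cds) % 3:
        raise ValueError("each part must be a whole number of codons")
    if gene_cds[-3:] in STOP_CODONS:
        gene_cds = gene_cds[:-3]
    fusion = gene_cds + linker + reporter_cds
    codons = [fusion[i:i + 3] for i in range(0, len(fusion), 3)]
    # Any stop codon before the final codon would truncate the fusion protein.
    if any(codon in STOP_CODONS for codon in codons[:-1]):
        raise ValueError("premature stop codon in the fusion")
    return fusion


# Toy sequences for illustration only, not real gene sequences.
gene = "ATGGCTTAA"
reporter = "ATGAGTAAAGGATAA"
print(build_translational_fusion(gene, reporter))
```

A part whose length is not a multiple of three would shift the reporter out of frame, so the function rejects it rather than building the construct.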

Some frequently used reporter genes are represented 
in Figure 16.17. The choice of reporter gene depends 
on the biological question being addressed. With some 
reporter genes, the assay to monitor gene expression re- 
quires sacrificing the organism, whereas the expression 
of other reporter genes can be traced in a living organism. 
Reporter gene products sometimes require substrates that 
must penetrate into the tissues or cells where the reporter 
genes are expressed. In addition, reporter genes vary in 
their sensitivity. 

One of the first reporter genes to be developed
emerged from research on the lac operon in E. coli
(see Section 14.3). To purify and study the activity of
β-galactosidase, encoded by the lacZ gene, a number of
β-galactosides were synthesized and tested as substrates.
Two β-galactosides, abbreviated X-gal and ONPG, were
found to be useful. β-galactosidase cleaves the colorless
substrate, ONPG, into a yellow product. This assay is
typically used for in vitro measurement of β-galactosidase
activity. In contrast, X-gal, also colorless, is cleaved by
β-galactosidase into a blue product. This assay can be
used in bacteria in vivo, since bacterial cells can take up
the X-gal substrate without a reduction in viability.

Figure 16.16 Transcriptional versus translational gene fusions.

The lacZ gene can be used in conjunction with the
substrate X-gal as a reporter gene in animal systems
(Figure 16.17a). However, since plants have an
endogenous β-galactosidase activity, lacZ is not suitable for
studying plant systems. An alternative option is the E. coli
uidA gene encoding β-glucuronidase, which enzymatically
cleaves a colorless precursor, X-gluc, into a blue
product (Figure 16.17b). Conversely, since animals have
endogenous β-glucuronidase activity, the uidA gene cannot
be used as a reporter in animals. A limitation of both
of these reporter genes in organisms other than bacteria
is that in order for the substrate to be taken up effectively
into internal tissues, the tissue to be stained must be
bathed in a solution that kills the cells.

Research into reactions that cause the natural emis- 
sion of light in some animals has led to the development 
of reporter genes that cause light to be produced in liv- 
ing cells. For example, luciferase, the enzyme responsible 
for the glow of fireflies, catalyzes a reaction between the 
substrate luciferin and ATP that results in the emission 
of light. Transgenic plants expressing the luciferase gene 
will emit a yellow-green glow if supplied with the substrate 
(Figure 16.17c). However, luciferin is not delivered to all 
cells of the plant in equal measure, which in many cases lim- 
its the usefulness of the luciferase gene as a reporter. 

The development of green fluorescent protein
(GFP) led to great strides both in genetics and cell
biology by providing a noninvasive means of visualizing
gene and protein expression patterns in living organisms
(Figure 16.17d). The GFP gene, derived from the jellyfish
Aequorea victoria, underlies the green color of the natural
bioluminescence of this species. Its wild-type protein
product, consisting of 238 amino acids, fluoresces green
(a 509-nm wavelength) when illuminated with UV light
(a 395-nm wavelength), which in this case is the
“substrate,” delivered by laser.

Because UV light, with its short wavelength, can be
harmful to organisms (e.g., causing thymine dimers to
form in DNA, as described in Section 12.3), the wild-type
GFP gene was mutated to produce variants that respond
to lower-energy wavelengths. A major improvement was a
mutation that shifted the excitation wavelength to 488 nm,
corresponding to blue light and minimizing the potential
damage to cells being illuminated. Subsequent modification
of the GFP protein sequence has led to the production
of variants that emit other colors (e.g., yellow, cyan,
blue). Genes encoding fluorescent reporter proteins have
also been isolated from marine corals and other jellyfish.

The availability of multiple fluorescent reporters makes
it possible to visualize the expression of several genes
simultaneously in a single organism (Figure 16.17e). Osamu
Shimomura, Martin Chalfie, and Roger Y. Tsien received
the 2008 Nobel Prize in Chemistry for their discovery and
development of GFP.

Figure 16.17 Reporter genes. (a) Lin-3 regulatory sequences
driving lacZ reporter gene in C. elegans. (b) PHABULOSA
regulatory sequences driving uidA reporter gene in
Arabidopsis. (c) CaMV 35S regulatory sequences driving
luciferase reporter gene in tobacco. (d) RHODOPSIN
regulatory sequences driving GFP reporter gene in Mus
musculus (individual rod cells). (e) Mus musculus neurons
expressing three different fluorescent reporter genes,
derived from modifying GFP.

Reporter genes can be used to dissect regulatory
DNA sequences and identify specific sequences required
for particular aspects of gene regulation. The general
approach is to start with a clone in which all the
regulatory sequences required for proper gene expression are
present and then to assay the effects of deleting or
changing specific portions of the clone. An example of such an
analysis of the Drosophila even-skipped (eve) gene, which
is expressed in seven stripes in the segmentation pattern
of the embryo, is shown in Figure 16.18. Overlapping
deletions spanning large regions are assayed first. Then
regions identified as important for gene regulation are
dissected with smaller deletions. The concept is similar
to that described earlier for deletion mapping (see
Sections 6.6 and 13.3). When specific sequences required
for proper gene expression are deleted, expression of the
reporter gene will be correspondingly altered.

Figure 16.18 Use of reporter gene in promoter analysis of the even-skipped (eve) gene.
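
The reasoning behind such a deletion series can be sketched in code. The example assumes, as a simplification not made in the text, that a single contiguous regulatory element controls the expression feature being scored. The coordinates, function name, and deletion results are all hypothetical.

```python
def map_required_region(clone_length, deletions):
    """Narrow down where a required regulatory element can lie.

    deletions: list of (start, end, expression_retained) tuples, with
        half-open coordinates [start, end) on the cloned regulatory region.
    Returns (start, end) of the remaining candidate interval, or None.
    """
    candidates = set(range(clone_length))
    for start, end, expression_retained in deletions:
        deleted = set(range(start, end))
        if expression_retained:
            # Expression survived, so the deleted sequence was dispensable.
            candidates -= deleted
        else:
            # Expression was lost, so the element lies within the deletion.
            candidates &= deleted
    if not candidates:
        return None
    return min(candidates), max(candidates) + 1


# Hypothetical results: large deletions first, then smaller ones.
results = [
    (0, 500, False),
    (500, 1000, True),
    (200, 400, False),
    (0, 300, True),
]
print(map_required_region(1000, results))
```

Each additional deletion either removes dispensable sequence from consideration or confines the element to a smaller window, mirroring the move from large to small deletions described above.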

If genomic sequence is available from two or more 
related species, regulatory elements may be predicted 
by searching for sequences that are conserved between 
the related species, using a method known as phyloge- 
netic footprinting (discussed in Chapter 18). Such initial 
genomic sequence analyses can direct subsequent experi- 
mental tests that use reporter genes to analyze expression 
in transgenic organisms. 
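
A very simple version of the conservation search behind phylogenetic footprinting is sketched below. It assumes the two sequences have already been aligned to equal length (with "-" marking gaps), and the window size, identity threshold, function name, and sequences are arbitrary choices made for illustration. Real analyses use dedicated alignment tools and more than two species where possible.

```python
def conserved_windows(seq_a, seq_b, window=10, min_identity=0.9):
    """Return start positions of windows that are highly conserved.

    seq_a, seq_b: pre-aligned sequences of equal length; gap characters
        ("-") are counted as mismatches.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    hits = []
    for start in range(len(seq_a) - window + 1):
        pairs = zip(seq_a[start:start + window], seq_b[start:start + window])
        matches = sum(1 for a, b in pairs if a == b and a != "-")
        if matches / window >= min_identity:
            hits.append(start)
    return hits


# Hypothetical aligned upstream regions from two related species.
species_1 = "TTTTACGTACGTAAAA"
species_2 = "GGGGACGTACGTCCCC"
print(conserved_windows(species_1, species_2, window=8, min_identity=1.0))
```

Windows that pass the threshold are only candidates for regulatory elements. As the text notes, they direct the reporter gene experiments that test them.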


Enhancer Trapping 


Enhancer trapping uses a variation of an insertional
library to identify genes based on expression patterns.
This approach combines the generation of a large number
of random insertion mutants with the expression of
a reporter gene (Figure 16.19). In its simplest application,
a population of transgenic organisms is generated by
random insertion of a transposon (or T-DNA) containing
the coding sequence of a reporter gene fused with a
minimal promoter for RNA polymerase II transcription. If
the insertion occurs near enhancer or silencer regulatory
sequences that can act in conjunction with the minimal
promoter of the reporter gene, the reporter can be
expressed in a pattern that reflects the regulatory capability
of the nearby genomic DNA sequences. The enhancers
(or silencers) of the adjacent genomic DNA are co-opted,
or “trapped,” by the insertion to drive expression of the
reporter gene. Thus, from the expression patterns of the
inserted reporter gene, researchers can infer the existence
of regulatory sequences, presumably from adjacent genes,
that drive gene expression in the observed patterns. While
reporter gene expression may not precisely reflect the
expression of the adjacent gene, the expression of the
reporter often at least partially reflects the normal gene
expression pattern of the adjacent gene. Enhancer trapping
techniques were pioneered in Drosophila and have
now been adapted to other systems. Because they identify
genes by gene expression patterns, enhancer trapping
techniques complement forward genetic screens.

Figure 16.19 Enhancer trapping to reveal expression
patterns of endogenous genes. (a) Strategy for generation
of enhancer trap lines. If the enhancer trap disrupts the
coding region of a gene, a loss-of-function allele is created.
However, insertion of the vector may occur 5’ or 3’ to a gene
and still “trap” enhancers without causing a loss-of-function
mutation. (b) Examples of enhancer trap lines in
Drosophila: three patterns of gene expression in Drosophila
embryos seen in enhancer trap lines using β-galactosidase as
a reporter gene.


Investigating Gene Function 
with Chimeric Genes 


A chimeric gene is one in which regulatory and coding 
sequences derived from two or more different genes are 
recombined in a novel manner. For example, combining 
the regulatory sequences from one gene with the coding 
sequences from another gene often results in a gain-of- 
function allele due to ectopic expression of the gene rep- 
resented by the coding sequences. 

Figure 16.20 shows one way experimenters can take 
advantage of this potential to obtain information on 
gene function. Recessive loss-of-function mutations in 
the eyeless gene of Drosophila result in a failure of eyes to 
develop. The eyeless gene is normally expressed only in 
the eye imaginal discs during Drosophila development. 
Imaginal discs are groups of precursor cells that are set 
aside during embryonic development. These grow by mi- 
totic proliferation during larval life and later differentiate 
into adult body tissues during metamorphosis. A gain- 
of-function eyeless allele can be created by constructing 
a chimeric gene in which expression of the eyeless coding 
sequences is driven by regulatory sequences active in all 
imaginal discs. If the eyeless gene is ectopically expressed 
in non-eye imaginal discs, such as those that would nor- 
mally give rise to the antennae or legs, the imaginal discs 
will differentiate as eye tissue instead. This outcome 
indicates that cells in any imaginal disc are capable of dif- 
ferentiating into eyes and that the eyeless gene product 
can promote the development of eyes from any imaginal 
disc. Thus, when the eyeless allele is ectopically expressed 
as a gain-of-function mutation in inappropriate imaginal 
discs, the resulting phenotype is the converse of the phe- 
notype of the loss-of-function eyeless allele—ectopic eyes 
as opposed to an absence of eyes. 
Figure 16.20 Comparison of loss- and gain-of-function alleles.


In cases where the gain-of-function and loss-of-function
phenotypes are complementary, interpretation
of the effects of ectopic expression is straightforward.
Thus, in the preceding example, eyeless is revealed to be
a master control gene for the differentiation of eyes in
Drosophila. However, ectopic expression of genes can
also lead to enigmatic phenotypes that are more difficult
to interpret. For example, ectopic expression of eyeless
during embryogenesis leads to embryonic lethality,
a phenotype that is not easily reconciled with the
loss-of-function phenotype. Therefore, when considering
gain-of-function alleles generated by ectopic expression,
we must remember that the phenotypes represent what
the gene is capable of doing when expressed in particular
contexts and may not reflect the normal function of
the gene.


CASE STUDY

In this case study, we see an example of how forward genetics
and reverse genetics work together to provide a broader
view of both gene function and evolution. The story begins
with forward genetics: the isolation of a mutant that alters
flower development and the subsequent identification of
the mutant gene sequence using recombinant DNA technology.
The gene is then cloned and used as a probe for
cloning genes of similar sequence. Finally, reverse genetics
approaches are applied to identify mutant alleles of related
genes, and their biological function is inferred based on the
mutant phenotypes.


FORWARD GENETICS In flowering plants, the types of floral organs that develop
are decided by the expression of a set of transcription factors.
(For further description of this activity, see Chapter 20.)


The identity of Arabidopsis reproductive organs (stamens and
carpels) is determined in part by the activity of the AGAMOUS
gene. Recessive null loss-of-function agamous alleles lead to
the development of petals in the positions usually occupied
by stamens and of an additional flower in the position usually
occupied by carpels. Homozygotes are sterile and do not
produce gametes (hence the name AGAMOUS). In forward
genetic screens aimed at identifying genes involved in Arabidopsis
flower development, agamous mutant alleles induced by
either EMS or T-DNA have been isolated (Figure 16.21, step 1).
Figure 16.21 Use of forward and reverse genetics to determine gene function.
Steps recovered from the figure: (6) Identify mutations in the related genes
SEP1, SEP2, and SEP3 using reverse genetic approaches (e.g., screening knockout
libraries of T-DNA and transposon mutant lines). (7) Combine null mutations in
each of the three genes by crossing mutants and breeding lines homozygous for
mutations in all three genes. Analyze the phenotype of the triple null mutant.
The T-DNA-induced allele proved a useful tool for cloning
the AGAMOUS gene because the T-DNA “tagged” the gene (step
2). The approach is similar to that described for transposon
tagging in Section 16.2: First, a genomic library is constructed
from DNA isolated from agamous mutants (see Section 16.2
for construction and screening of genomic libraries). Then the
genomic library is screened with a probe consisting of T-DNA
sequence. The probe identifies genomic clones in the library
that have T-DNA sequence. Since the T-DNA was inserted into
the AGAMOUS gene, the Arabidopsis DNA adjacent to the T-DNA
sequences in these clones is AGAMOUS gene sequence.

Subsequently, the genomic clone encoding AGAMOUS
can be used to identify an AGAMOUS cDNA clone from a library
constructed with mRNA from wild-type flowers (step 3). Sequencing
of the AGAMOUS cDNA clones reveals that the encoded protein
is similar to known eukaryotic transcription factors. This
conclusion is based on the similarity between a 60-amino acid
domain of the AGAMOUS protein and DNA-binding domains in
yeast and mammalian transcription factors.


IDENTIFICATION OF HOMOLOGOUS GENES When the
AGAMOUS cDNA is used to probe a Southern blot of
restriction-enzyme-digested Arabidopsis genomic DNA, sequences
related to the AGAMOUS gene sequence can be identified (step 4;
see Section 10.2 to review Southern blotting). The same
AGAMOUS cDNA can be used as a probe on the flower cDNA library
to identify clones of related genes. Genes related to AGAMOUS
were called AGAMOUS-LIKE, or AGL, genes. These related genes
possess the same highly conserved DNA-binding domain but
differ in the rest of their protein sequences. To determine how
the AGL genes are related to AGAMOUS and to each other, a
phylogenetic tree can be constructed (step 5; see Section 1.4 to
review phylogenetic trees).


SUMMARY 


16.1 Forward Genetic Screens Identify Genes 
by Their Mutant Phenotypes 


Forward genetic screens are designed to identify genes by 
creation of a mutant phenotype, often allowing researchers 
to infer the biological function of a gene. 

Complementation tests are used to discover the number of 
alleles and the number of genes affected in a forward genetic 
screen. 

Mutations resulting in lethality can be identified in genetic 
screens for conditional alleles. 

Enhancer and suppressor genetic screens identify genes that 
act in related or redundant pathways. 


16.2 Genes Identified by Mutant Phenotype 
Are Cloned Using Recombinant DNA Technology 


Some genes can be cloned by complementation of a mutant 
phenotype. 

Transposons and other integrating elements can be used to 
tag genes, facilitating their subsequent cloning. 


REVERSE GENETICS REVEALS FUNCTIONS OF HOMOLOGOUS
GENES Since the related genes are known by gene
sequence only, a reverse genetics approach can be undertaken
to determine gene function. Transposon- or T-DNA-induced
mutant alleles of many of the AGL genes in Arabidopsis can be
identified in available knockout libraries (step 6; see Section 16.3).
Researchers were initially surprised that plants homozygous
for loss-of-function alleles of many single genes did not
display an aberrant phenotype. Hypothesizing that the more
closely related the genes, the more similar their functions
would be, researchers crossed mutants to obtain organisms
containing multiple loss-of-function alleles of closely related
genes (step 7). For example, sep1 mutants (having mutations of
the SEPALLATA1 gene) were crossed with sep2 mutants, after
which sep1 sep2 double mutants were identified in the F2
generation. Disappointingly, the sep1 sep2 double mutants did not
differ significantly from wild-type plants. However, sep1 sep2
sep3 triple mutant plants proved to have flowers consisting
solely of sepals, which indicates that these genes have a
function related to floral organ specification but distinct from the
role of AGAMOUS.

Genetic redundancy due to gene duplications is extensive 
in most eukaryotic genomes (see Chapter 18). Immediately fol- 
lowing an occurrence of gene duplication, the duplicate genes 
often have identical DNA sequences and expression patterns, 
and they are therefore genetically redundant. Over time, how- 
ever, the functions of the two genes may diverge due to the 
accumulation of mutations that lead to changes in protein 
sequence and expression pattern. Yet, since the genes are 
evolutionarily related, they often function in similar biologi- 
cal processes. Reverse genetics approaches can facilitate the 
analysis of closely related genetically redundant genes. 




Positional cloning, or chromosome walking, provides
a means of identifying cloned genes known only from
a mutant phenotype.

Positional cloning approaches proceed by first mapping 
mutations and then constructing contigs of DNA that span 
the target gene. The target gene can be identified by expres- 
sion analyses, DNA sequence analyses, or complementation 
experiments. 

Advances in sequencing technologies facilitate direct 
identification of mutant genes. 


16.3 Reverse Genetics Investigates Gene Action by 
Progressing from Gene Identification to Phenotype 


Reverse genetics approaches, in which determination of
biological function proceeds from gene sequence to mutant
phenotype, make use of collections consisting of mutants
that are each defective in a different defined gene.

Collections of insertion alleles, the TILLING process, and
RNAi-mediated gene silencing all contribute to the reverse
genetics analysis of model organisms.
16.4 Transgenes Provide a Means of Dissecting
Gene Function


Reporter genes are used to monitor gene-expression
patterns in transgenic organisms and for the dissection of
regulatory sequences. Some reporter genes, such as the
green fluorescent protein, can be visualized in real time in
living organisms.

Chimeric genes represent novel alleles that provide clues to
gene function.
1. What are the advantages and disadvantages of using GFP
versus lacZ as a reporter gene in mice, C. elegans, and
Drosophila?


2. A transcriptional fusion of regulatory sequences of a
particular gene with a reporter gene results in relatively
uniform expression of the reporter gene in all cells of an
organism, whereas a translational fusion with the same
gene shows reporter gene expression only in the nucleus of
a specific cell type. Discuss some biological causes for the
difference in expression patterns of the two transgenes.


3. Genetic maps and physical maps are both representations
of a genome.

a. What are the similarities and differences between how
genetic and physical maps are created?
b. If genetic maps of a particular organism are independently
constructed in two different laboratories, will
they be identical? What about two independently
constructed physical maps?
c. How can the information in genetic and physical maps
be combined?


4. Using the data inside the back cover of the book, calculate
the average number of kilobase (kb) pairs per centimorgan
in the six multicellular eukaryotic organisms. How would
this information influence strategies to positionally clone
genes in these organisms?


5. What are the advantages and disadvantages of using
insertion alleles versus alleles generated by chemicals (via
TILLING) in reverse genetic studies?


6. You have cloned the mouse ortholog of the gene associated
with human Huntington Disease (HD) and wish to
examine its expression in mice. Outline the approaches you
might take to examine the temporal and spatial expression
pattern at the cellular level.


Application and Integration


7. The CBF genes of Arabidopsis are induced by exposure of
the plants to low temperature.

a. How would you examine the temporal and spatial
patterns of expression after induction by low temperature?
b. Can you design a method that would indicate these
changes in gene expression in a way that a farmer
could recognize them by observing plants growing in
the field?


For answers to selected even-numbered problems, see Appendix: Answers. 


8. When the S. cerevisiae genome was sequenced, only about
40% of its predicted genes had been previously identified in
forward genetic screens. This left about 60% of predicted
genes with no known function, leading some to dub the
genes fun (function unknown) genes.

a. As an approach to understanding the function of a
certain fun gene, you wish to create a loss-of-function
allele. How will you accomplish this?
b. You wish to know the physical location of the
encoded protein product. How will you ascertain such
information?


9. Translational fusions between a protein of interest and
a reporter protein are used to determine the subcellular
location of proteins in vivo. However, fusion to a reporter
protein sometimes renders the protein of interest
nonfunctional because the addition of the reporter protein
interferes with proper protein folding, enzymatic activity,
or protein-protein interactions. You have constructed
a fusion between your protein of interest and a reporter
gene. How will you show that the fusion protein retains its
normal biological function?


10. In enhancer trapping experiments, a minimal promoter
and a reporter gene are placed adjacent to the end of a
transposon so that genomic enhancers adjacent to the
insertion site can act to drive expression of the reporter gene.
In a modification of this approach, a series of enhancers
and a promoter can be placed at the end of a transposon
so that transcription is activated from the transposon into
adjacent genomic DNA. What types of mutations do you
expect to be induced by such a transposon in a
mutagenesis experiment?


11. In Genetic Analysis 16.1, we designed a screen to identify
conditional mutants of S. cerevisiae in which the secretory
system was defective. Suppose we were successful in
identifying 12 mutants.

a. Describe the crosses you would perform to determine
the number of different genes represented by the
12 mutations.
b. Based on your knowledge of the genetic tools for
studying baker’s yeast, how would you clone the genes that
are mutated in your respective yeast strains? What are
two approaches to cloning the human orthologs of the
yeast genes?


12. How would you design a genetic screen to find genes
involved in meiosis?


13. The eyes of Drosophila develop from imaginal discs,
groups of cells set aside in the fly embryo that differentiate
into the adult structures during the pupal stage. Despite
their importance in nature, eyes are dispensable for
fruit-fly life in the laboratory.

a. Devise a genetic screen to identify genes directing
development of the fly eye.
b. What complications might arise from genetic
screens targeting an organ that differentiates late in
development?


14. Given your knowledge of the genetic tools for studying
Drosophila, outline two methods by which you could
clone the dunce and rutabaga genes identified by Seymour
Benzer’s laboratory in the genetic screen described at the
beginning of this chapter.


15. Mutations in the CFTR gene result in cystic fibrosis in
humans, a condition in which abnormal secretions are
present in the lungs, pancreas, and sweat glands. In the
effort to positionally clone the CFTR gene, the gene was
mapped to a region of 500 kb on chromosome 7 containing
three candidate genes.

a. Using your knowledge of the disease symptoms, how
would you distinguish between the candidate genes to
decide which is most likely to be the CFTR gene?
b. How would you prove that your chosen candidate is the
CFTR gene?


16. You have cloned the cDNA for the CFTR gene (see
Problem 15). You have used the cDNA, which is 4.5 kb
in length, to identify a 250-kb BAC clone from a genomic
library that fully contains the CFTR gene.

a. Describe the strategies that you will use to sequence
each of these clones.
b. You assume that the vast majority of the
disease-causing mutations in this gene are within exons or at
intron-exon boundaries. If you are correct, how might
you identify mutations in patients while using a
minimum amount of sequencing?


17. How would you devise a screen to identify recessive
mutations in Drosophila that result in embryo
lethality? How would you propagate the recessive mutant
alleles?


18. In land plants, there is an alternation of generations
between a haploid gametophyte generation and a diploid
sporophyte generation. Both generations are typically
multicellular and may be free-living. The male (pollen) and
female (embryo sac) gametophytes are the haploid
generation of flowering plants.

a. How would you devise a screen to identify genes
required for female gametophyte development in
Arabidopsis?
b. How would you devise a screen to identify genes
required for male gametophyte development?


19. The Drosophila even-skipped (eve) gene is expressed in
seven stripes in the segmentation pattern of the embryo.
A sequence segment of 8 kb 5' to the transcription start
site (shown as +1 in the figure on page 559) is required to
drive expression of a reporter gene (lacZ) in the same
pattern as the endogenous eve gene. Remarkably, expression
of each of the seven stripes appears to be specified
independently, with stripe 2 expression directed by regulatory
sequences in the region 1.7 kb 5' to the transcription start
site. To further examine stripe 2 regulatory sequences,
you create a series of constructs, each containing different
fragments of the 1.7-kb region of 5' sequence. In the lower
part of the figure, the bars at left represent the sequences
of DNA included in your reporter gene constructs, and
the + and - signs at right indicate whether the
corresponding eve:lacZ reporter gene directs stripe 2 expression
in Drosophila embryos transformed through P element
mediation. How would you interpret the results? That is,
where do the regulatory sequences responsible for stripe 2
expression reside?
CHAPTER 16 Analysis of Gene Function by Forward Genetics and Reverse Genetics


20. Most organisms display a circadian rhythm, in which
biological processes are synchronized with day length
(e.g., in humans, rapid movement between time zones
results in jet lag, in which established circadian rhythms
are out of sync with daylight hours). In Drosophila, pupae
eclose (emerge as adults after metamorphosis) at dawn.

a. Using this knowledge, how would you screen for
Drosophila mutants that have an impaired circadian
rhythm?
b. In most plants, such as Arabidopsis, genes whose
encoded products have roles related to photosynthesis
have expression patterns that vary in a circadian
manner. Using this knowledge, how would you screen
for Arabidopsis mutants that have an impaired
circadian rhythm?
c. In each case, how would you clone the genes you
identified by mutation?


21. As shown in Figure 16.1, mutations in the Drosophila
Ultrabithorax (Ubx) gene result in wings developing from
two thoracic segments rather than just one as in wild-type
flies. In the mouse genome there are three Ubx orthologs.
How would you determine whether the three mouse genes
have distinct or redundant functions?
Produces Genetically Identical Individuals


Transgenic E. coli expressing the genes for the carotenoid biosynthetic
pathway, derived from plants. Carotenoid pigments, responsible for the
red and orange colors of tomatoes, peppers, and oranges, act as a
buffer system to absorb excess electrons and radicals produced during
photosynthesis.


ESSENTIAL IDEAS


DNA can be amplified by either molecular cloning
or the polymerase chain reaction.

In molecular cloning, DNA fragments are ligated
into a cloning vector, which in turn is replicated in
a live host.

Libraries are collections of clones of DNA
fragments, derived from the DNA or mRNA
isolated from cells or an organism.

Transgenic organisms are created by harnessing
biological vectors to introduce genes into
organisms.

Recombinant DNA technology in humans is a
pathway to the development of gene therapy.

Cloning of plants and animals produces
genetically identical individuals.


The advent of recombinant DNA technology for
recombining, copying, and analyzing genetic sequences
opened the way to studying gene function at the molecular
level. This aspect of genetic exploration began with a set
of basic strategies for the in vitro manipulation of DNA and
for identifying the sequence of any given gene. The next
step after that achievement was to invent methods for the
precise manipulation of gene action in living organisms.
One of the central technical developments propelling
the latter advance was development of the ability to
create transgenic organisms: organisms that have had genes
from other organisms inserted into their genomes.
The methodology, now routine in genetic analy- 
sis, can be adapted to an almost limitless number 
of experimental approaches. It is a powerful tool 
for manipulating the activity of specific genes, ob- 
serving the resultant phenotypes, and in this way 
acquiring new insight into biological processes. In 
addition, transgenic organisms can be fashioned for 
specific medical, agricultural, or industrial purposes. 

Collectively, the techniques of recombinant DNA 
technology have permitted the sequencing of the 
entire genomes of many species, including our own, 
providing an unprecedented view of life. Increasingly 
sophisticated techniques have enabled both in vitro 
and in vivo manipulation of DNA sequences, shed- 
ding light on the molecular basis for development 
and physiology and for genetic variation both within 
and between species. If used wisely, this knowledge 
can be applied to better the human condition as well 
as that of the planet. 

In this chapter, we discuss these applications 
of recombinant DNA technology, focusing on the 
methods used to create transgenic organisms and 
manipulate gene activity. The discussions in the 
present chapter furnish the nuts-and-bolts details 
of how reverse genetics is accomplished in different 
model organisms. 


17.1 Specific DNA Sequences Are 
Identified and Manipulated Using 
Recombinant DNA Technology 


Recombinant DNA technology is the set of techniques 
developed for amplifying, maintaining, and manipulating 
specific DNA sequences in vitro and also in vivo. This 
technology, which is based on advances in microbiology— 
particularly in understanding the life cycles of bacteria 
and their viruses, the bacteriophages—has revolutionized 
the study of genetics. With the ultimate goal of studying 
specific genes and their functions, biologists use recom- 
binant DNA techniques to (1) fragment DNA into easily 
managed pieces and then separate and purify these frag- 
ments; (2) create many copies of DNA molecules of iden- 
tical sequence; (3) combine DNA fragments to construct 
chimeric, or recombinant, DNA molecules; (4) determine 


the exact sequence of specific DNA molecules; (5) identify 
fragments of DNA containing complementary sequences; 
(6) introduce specific DNA molecules into living organ- 
isms; and (7) assay the phenotypic effects of the intro- 
duced DNA. 

The major challenges of recombinant DNA technol- 
ogy are the identification of specific DNA sequences and 
their manipulation in vitro. To see these challenges in 
perspective, consider that each of your cells contains two 
copies each of 22 autosomes and 2 sex chromosomes. 
Collectively, a haploid set of 23 chromosomes contains about
3 billion base pairs and carries roughly 22,800 genes.
A typical gene encodes an mRNA transcript consisting of 
a few thousand bases, although the mRNA may be tran- 
scribed from a region that spans millions of base pairs. 
Molecular analysis of genes and of allelic variation is pos- 
sible only by distinguishing a gene of interest from others 
in the genome. 

Recombinant DNA technology allows researchers to 
divide the genome into smaller segments that can then 
be analyzed and reassembled to provide a molecular view 
of genes and the genome. In the following sections we 
describe the development of recombinant DNA technol- 
ogy tools and their application to identify specific DNA 
sequences. 


Restriction Enzymes 


Restriction enzymes, which cut DNA at specific se- 
quences, have become a basic tool of recombinant DNA 
technology (see Section 10.2). Each type of restriction 
enzyme recognizes a particular sequence at which it 
cuts both strands of the sugar-phosphate backbone of 
the DNA, cleaving the restriction sequence in the same 
way each time it is encountered. Restriction enzymes 
were originally discovered in bacterial cells, where they 
protect the bacteria from invasions of nucleic acids, 
such as the injected genomes of bacteriophages, by 
digesting foreign DNA. They were given the name 
restriction enzymes because they restrict the growth 
of the bacteriophages. Bacterial cells also contain 
restriction-modification systems, which modify the 
restriction sequences in the bacterial DNA by the ad- 
dition of methyl groups and thus protect the bacteria’s 
own DNA from being digested by endogenous restric- 
tion enzymes. Experimental Insight 17.1 explains how re- 
striction enzymes and restriction-modification systems 
were identified and how they became an indispensable 
part of molecular biology. 

Experimental Insight 17.1


From Bacteriophage to Restriction Enzymes:
Basic Research Spawned a Biological Revolution


Basic biological research aims to discover and understand
phenomena from every part of the spectrum of life. Thousands
of biologists engage in this research every day, and most have
specialties that may seem obscure or trivial to nonscientists.
Nevertheless, their discoveries can not only revolutionize
research but affect how we view the world.

In the mid-1960s, Werner Arber was studying a bacterial
phenomenon called host-controlled restriction and modification,
which acts as a simple immune system for bacteria invaded by
bacteriophages. He showed that E. coli produces two enzymes
that affect the same short palindromic DNA sequences (see
Section 10.2 for discussion of palindromic sequences). One
enzyme, called a restriction endonuclease, cleaves DNA at that
sequence, like a pair of molecular scissors. The second enzyme,
called a modification enzyme, adds methyl groups (CH3) to DNA,
thereby preventing restriction endonucleases from binding to
and cleaving the DNA.

In 1970, Hamilton Smith extended Arber’s work by studying
a restriction endonuclease from Haemophilus influenzae.
Smith isolated the restriction endonuclease, now called HindII,
and determined that it cleaves at the sequence


5'-GTPyPuAC-3'
3'-CAPuPyTG-5'


HindII cleaves both strands of its target sequence between
the central purine (Pu = A or G) and pyrimidine (Py = T or C),
leaving blunt ends on either side of the cut (blunt ends are
discussed on page 574).

Smith's work on HindII identified some important
characteristics of restriction enzymes. First, HindII cleaves foreign DNA
into large fragments, but it does not affect H. influenzae DNA.
This confirmed Arber’s idea that bacterial DNA is protected
from the action of the bacteria’s own restriction enzymes.
Second, each resulting DNA fragment has the same three base
pairs at its ends, indicating that cleavage occurs only at the
target sequence. Smith also discovered that restriction enzymes
cleave every copy they encounter of their target sequence.

In 1971, Daniel Nathans pioneered the use of restriction
endonucleases to address genetic and genomic questions.
Nathans used HindII to digest the small genome of the simian
virus SV40 and found that 11 DNA fragments were formed. In
1973, Nathans digested SV40 with two newly discovered
restriction endonucleases. He then used the three sets of restriction
fragments to create the first restriction map of the SV40 genome,
by determining the number of restriction sites for each enzyme
and their order in the genome and assembling the information
into a map (as demonstrated elsewhere in this chapter).

By the time Nathans completed his SV40 genome map,
biologists were already looking for other restriction
enzymes. Within 5 years, over 100 more restriction enzymes
were discovered. Many formed “sticky” ends on digested DNA
(described on this page), and Paul Berg realized that DNA
fragments from different organisms could be joined together
if they had complementary sticky ends. This finding led to his
creating the first recombinant DNA molecule, in 1975.

Arber, Smith, and Nathans shared the Nobel Prize in
Physiology or Medicine in 1978 for their work on restriction
enzymes, and Berg won the prize in 1980 for the development
of recombinant DNA. Since then, restriction enzymes have
become a ubiquitous tool in genetic and genomic research.
Arber’s initial study of an obscure event in bacteria had
spawned a revolution as momentous as Watson and Crick’s
description of DNA structure or Mendel’s description of the
laws of heredity.


Restriction enzymes are common in bacteria. The
names given these enzymes are generally derived from
the first letter of the bacterial genus and first two letters
of the species moniker, followed by a Roman numeral.
For example, EcoRI is derived from Escherichia coli; the
letter R denotes the strain from which the enzyme was
obtained (RY13), and the numeral (I) indicates it was
the first enzyme identified. EcoRI recognizes the
palindromic sequence


5'-GAATTC-3'
3'-CTTAAG-5'


Recall that a palindrome has the same 5'-to-3' base
sequence in both of its antiparallel DNA strands. Most
restriction enzymes recognize palindromic sequences. For
example, EcoRI cuts the sugar-phosphate bond between
the G and the adjacent A residues in both strands, and the
staggered cut results in two products, each ending with a
four-base, single-stranded sequence:


5'-G          AATTC-3'
3'-CTTAA          G-5'


The single-stranded segments at the ends of each
EcoRI fragment are referred to as sticky ends because they
can “stick” to a complementary base-pair sequence by
hydrogen bonding. Production of sticky ends facilitates the
combining of DNA fragments generated with restriction
enzymes, and complementary base pairing plays a role in
almost all recombinant DNA techniques. The principle
is that if two DNA molecules produced by restriction
enzyme digestion have complementary sticky ends, they can
be combined by complementary base pairing.

Another enzyme, EcoRI methylase, protects the
E. coli genome from being itself digested by the EcoRI
endonuclease. EcoRI methylase does this by adding a
methyl group to the A adjacent to the T in both strands
of the DNA. This is the “modification” performed by the
EcoRI restriction-modification system.
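
The palindrome and staggered-cut ideas above can be checked with a few
lines of Python. This is only an illustrative sketch: the function names
(reverse_complement, is_palindromic, digest_top_strand) and the toy
sequence are assumptions made for the example, and only the top strand
of a linear molecule is modeled, with EcoRI cutting between the G and
the adjacent A of GAATTC.

```python
# Complement lookup for the four DNA bases.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the 5'-to-3' sequence of the opposite (antiparallel) strand."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def is_palindromic(site: str) -> bool:
    """A site is palindromic if both strands read the same 5' to 3'."""
    return site.upper() == reverse_complement(site)


def digest_top_strand(seq: str, site: str = "GAATTC", cut_offset: int = 1) -> list:
    """Cut the top strand of a linear molecule at every occurrence of `site`.

    `cut_offset` is the number of bases of the site left on the upstream
    fragment (1 for EcoRI, which cuts between G and A).
    """
    seq = seq.upper()
    fragments = []
    start = 0
    pos = seq.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(seq[start:cut])
        start = cut
        pos = seq.find(site, pos + 1)
    fragments.append(seq[start:])
    return fragments


if __name__ == "__main__":
    print(is_palindromic("GAATTC"))
    print(digest_top_strand("ATGAATTCGGGAATTCTA"))
```

In the toy digest, each downstream fragment begins with AATTC, whose
first four bases (AATT) correspond to the single-stranded 5' overhang
left by the staggered cut.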

Hundreds of restriction enzymes have been iso- 
lated from bacteria and are commercially available (see 
Table 10.1). While many restriction enzymes produce 
sticky ends, either with 5’ overhangs (as produced by 
EcoRI) or with 3’ overhangs, some restriction enzymes 
leave blunt ends that lack a single-stranded segment. 
Blunt-ended DNA molecules can also be recombined, by 
techniques discussed later in this chapter (see page 574). 
Some restriction enzymes recognize 4-bp sequences;
others recognize sequences of 5, 6, or 8 bp. The length
of the recognition sequence influences how frequently a
given enzyme will cut DNA. If the DNA of an organism
were to consist of 25% A, 25% T, 25% G, and 25% C and
the bases were randomly distributed, then a restriction
enzyme that had a 4-bp recognition sequence would
be expected to cut the DNA once every 256 bp on average
(1/4 × 1/4 × 1/4 × 1/4 = 1/256). Likewise, a restriction enzyme
that recognized a 6-bp sequence would cut the DNA
once every 4096 bp ((1/4)^6 = 1/4096) on average, and a
restriction enzyme that recognized an 8-bp sequence would cut
the DNA once every 65,536 bp ((1/4)^8 = 1/65,536) on average. In
reality, genomes of most organisms do not consist of 
equal amounts of each of the four bases. For example, 
most genomes of multicellular eukaryotes are AT-rich 
(that is, their genomes have a higher content of A and 
T than of G and C), and so restriction enzymes that rec- 
ognize a GC-rich sequence would cut less frequently on 
average than would enzymes that recognize an AT-rich 
sequence. 
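
The arithmetic above can be sketched in Python. The function name
expected_spacing and the AT-rich base frequencies (30% A, 30% T, 20% G,
20% C) are assumptions chosen for illustration, not values from the
text; the model also assumes that bases occur independently and at
random, which real genomes only approximate.

```python
def expected_spacing(site: str, base_freq=None) -> float:
    """Expected number of base pairs between occurrences of `site`.

    Assumes independent, randomly distributed bases; the probability of
    the site at any position is the product of its base frequencies.
    """
    if base_freq is None:
        # Default: equal amounts of the four bases.
        base_freq = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
    probability = 1.0
    for base in site.upper():
        probability *= base_freq[base]
    return 1.0 / probability


if __name__ == "__main__":
    # Equal base composition: 4-, 6-, and 8-bp recognition sequences.
    for site in ("GATC", "GAATTC", "GCGGCCGC"):
        print(site, expected_spacing(site))

    # Hypothetical AT-rich genome: GC-rich sites become rarer.
    at_rich = {"A": 0.30, "T": 0.30, "G": 0.20, "C": 0.20}
    for site in ("GAATTC", "GGGCCC"):
        print(site, round(expected_spacing(site, at_rich)))
```

Under the assumed AT-rich composition, the all-G/C site is expected
far less often than the AT-containing site of the same length, which is
the effect the paragraph describes.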

Scientists use data from restriction experiments, in- 
cluding the number of restriction sites and the number 
of base pairs between the sites, to create maps of specific 
DNA sequences. These restriction maps provide a foun- 
dation for further manipulation of the DNA fragments— 
for example, by suggesting where to further subdivide 
cloned fragments in order to clone still smaller fragments, 
in a process known as subcloning. 

Let’s use the genome of E. coli lambda phage in an 
example of the restriction mapping process. The DNA 
of the phage genome can be isolated by purifying the 
phage and removing its protein coat. If this is done gen- 
tly, the isolated nucleic acid will be the entire lambda 
chromosome, which is a linear molecule 48,502 bp in 
length. Electrophoresis of the chromosome in an aga- 
rose gel including a fluorescent stain for DNA (see 
Chapter 10) would reveal a single fluorescent 48.5-kb 
band (first lane in Figure 17.1). If the purified lambda 
chromosome is first digested with ApaI, two fragments,
one measuring 10.1 kb and the other 38.4 kb, are generated,
indicating that ApaI must cut the genome once.
This allows us to begin drawing the restriction map as
shown below.


          ApaI
           |
|----------|--------------------------------------|
   10.1 kb                  38.4 kb


If we digest the purified lambda chromosome with XhoI,
two fragments, one 33.5 kb and one 15 kb, are generated,
indicating that XhoI must also cut the genome once:


                                  XhoI
                                   |
|----------------------------------|--------------|
              33.5 kb                    15 kb


[Map positions shown in Figure 17.1 (bp): ApaI (10090); HindIII (23130,
25157, 27479, 36895, 37459, 44141); XbaI (24508); XhoI (33498)]

Figure 17.1 Restriction mapping of lambda phage.


However, two orientations are possible for the XhoI
restriction map relative to the ApaI restriction map drawn
above. It could also be drawn as shown below.


               XhoI
                |
|---------------|---------------------------------|
      15 kb                   33.5 kb


To determine which order is correct, we need to perform a
double digest, in which both enzymes are used simultaneously
to cut the lambda genome. This experiment generates
three pieces: 10.1 kb, 15 kb, and 23.4 kb. Since the
15-kb XhoI fragment remained intact but the 33.5-kb XhoI
fragment was cut into two fragments (10.1 kb and 23.4 kb)
by ApaI, we conclude that the map must be:


          ApaI                    XhoI
           |                       |
|----------|-----------------------|--------------|
   10.1 kb          23.4 kb              15 kb


The other possible map can be eliminated as incorrect
since it would generate fragments of 4.9 kb, 10.1 kb, and
33.5 kb:


          ApaI XhoI
           |    |
|----------|----|---------------------------------|
   10.1 kb 4.9 kb             33.5 kb


Genetic Analysis 17.1 provides additional practice at con- 
structing a restriction map. 
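

The double-digest reasoning for lambda can also be written as a short
Python sketch. The helper name linear_fragments and the variable names
are hypothetical; the fragment sizes are the rounded kilobase values
quoted in the text, and the molecule is treated as linear, as the
lambda chromosome is.

```python
def linear_fragments(length_kb: float, cut_sites_kb: list) -> list:
    """Sorted fragment sizes (kb) from cutting a linear molecule at the given sites."""
    bounds = [0.0] + sorted(set(cut_sites_kb)) + [length_kb]
    return sorted(round(right - left, 1) for left, right in zip(bounds, bounds[1:]))


LAMBDA_KB = 48.5                      # lambda chromosome, rounded
APAI_SITE = 10.1                      # from the ApaI single digest
OBSERVED_DOUBLE = [10.1, 15.0, 23.4]  # ApaI + XhoI double digest

# The XhoI single digest (33.5 kb + 15 kb) allows two placements of the
# XhoI site relative to the ApaI site.
candidate_xhoi_sites = {
    "XhoI 33.5 kb from the left end": 33.5,
    "XhoI 15 kb from the left end": 15.0,
}

predictions = {
    name: linear_fragments(LAMBDA_KB, [APAI_SITE, site])
    for name, site in candidate_xhoi_sites.items()
}
consistent = [
    name for name, frags in predictions.items() if frags == sorted(OBSERVED_DOUBLE)
]

if __name__ == "__main__":
    for name, frags in predictions.items():
        print(name, frags)
    print("Consistent with the double digest:", consistent)
```

Only the placement that predicts the observed 10.1-, 15-, and 23.4-kb
fragments is retained; the alternative predicts a 4.9-kb fragment that
the double digest does not show.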


GENETIC ANALYSIS 


nes. — = = 


Hi 


PROBLEM You have isolated a plasmid from E. coli and wish to
begin your analysis of it by making a restriction map. Using
three restriction enzymes, BamHI, EcoRI, and NotI, you perform
six different digestions: single digests using each enzyme
alone and double digests using each combination of two
enzymes. Agarose gel electrophoresis of the resulting
fragments produces the results shown here. Draw a restriction
map of the plasmid.

[Gel image: lanes for the six digests.]

BREAK IT DOWN: A plasmid is a circular DNA
molecule (Chapter 6, p. 188). Cut once, it becomes
linear; cut twice, it forms two fragments; and so on.

BREAK IT DOWN: Gel electrophoresis
separates linear DNA fragments by their length,
with the smallest fragments moving farthest
from the origin of migration (Chapter 10, p. 343).


Evaluate

Solution Strategies

1. Identify the topic this problem addresses and the nature
   of the required answer.

2. Identify the critical information given in the problem.

Solution Steps

1. This problem is about restriction mapping and asks you to
   construct a restriction map of a plasmid.

   BREAK IT DOWN: A restriction map (p. 570) is a depiction
   of the relative positions of restriction-enzyme sites
   (p. 568).

2. Electrophoresis results are given for three single digests
   and the three possible double-digest combinations.


Deduce

Solution Strategies

3. Identify the sizes of each of the fragments of the single
   digests, and determine how many times each enzyme cuts
   the plasmid.

   TIP: Compare the sizes of fragments in the sample lanes
   with the sizes of the standards.

4. Identify the sizes of each of the fragments of the double
   digests.

5. Compare single- and double-digest results for similarities
   and differences.

   TIP: In analyzing double digests, the relative position of
   restriction sites can be determined by observing which
   fragments remain intact and which are cut into smaller
   fragments.

   PITFALL: If two sites are very close to one another, there
   will be fewer fragments than expected in the double digest.

Solution Steps

3. BamHI: A single 7-kb fragment. Since plasmids are circular,
   BamHI must cut the plasmid only once.
   EcoRI: A single 7-kb fragment. One site in the plasmid.
   NotI: Two fragments: 3 kb and 4 kb. NotI must cut the
   plasmid at two sites.

4. NotI + BamHI: Three fragments: 3 kb, 2.3 kb, 1.7 kb.
   NotI + EcoRI: Two fragments: 4 kb, 3 kb.
   BamHI + EcoRI: Two fragments: 5.3 kb, 1.7 kb.

5. NotI + BamHI: Three fragments, with the 3-kb NotI fragment
   intact, suggesting the BamHI site is within the 4-kb NotI
   fragment.
   NotI + EcoRI: Two fragments, with both the 4-kb and 3-kb
   NotI fragments intact, suggesting the EcoRI site is
   adjacent to one of the NotI sites.
   BamHI + EcoRI: Two fragments, indicating the two sites are
   separated by 1.7 kb (or 5.3 kb the long way around the
   plasmid).


Solve

Solution Strategies

6. (a) Draw a restriction map with NotI sites. (b) Add in the
   BamHI site. (c) Add in the EcoRI site.

   TIP: Drawing of the restriction map does not require the
   three enzymes to be examined in any particular order.

Solution Steps

6. [Map drawings (a) to (c): a 7-kb circular plasmid with two
   NotI sites separating 3-kb and 4-kb segments.]
   The EcoRI site must be adjacent to one of the NotI sites
   and is 1.7 kb from the BamHI site. The relative order of
   the EcoRI and adjacent NotI sites cannot be determined,
   since the resolution of gel electrophoresis is not
   sufficient.


For more practice, see Problems 16, 18, 19, 20, and 21.
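One way to confirm a proposed plasmid map is to predict every digest from it and compare the predictions with the gel. The Python sketch below does this for Genetic Analysis 17.1. The coordinates are hypothetical, chosen only to satisfy the deductions above (NotI at 0 and 4000 bp, BamHI at 1700 bp), and the EcoRI site is treated as coincident with the NotI site at 0 bp because the gel cannot resolve the two.

```python
def circular_fragments(length, sites):
    """Fragment lengths (bp) after cutting a circular molecule at each site."""
    cuts = sorted(set(s % length for s in sites))
    if not cuts:
        return []                                 # an uncut circle yields no linear fragment
    frags = [b - a for a, b in zip(cuts, cuts[1:])]
    frags.append(length - cuts[-1] + cuts[0])     # fragment spanning the origin
    return sorted(frags)

PLASMID = 7_000   # bp
SITES = {         # hypothetical coordinates consistent with the gel
    "NotI": [0, 4_000],
    "BamHI": [1_700],
    "EcoRI": [0],  # assumption: indistinguishable from the adjacent NotI site
}

def digest(*enzymes):
    """Predicted fragments for a digest with one or more enzymes."""
    positions = [p for e in enzymes for p in SITES[e]]
    return circular_fragments(PLASMID, positions)

for combo in [("BamHI",), ("EcoRI",), ("NotI",),
              ("NotI", "BamHI"), ("NotI", "EcoRI"), ("BamHI", "EcoRI")]:
    print(" + ".join(combo), digest(*combo))
```

Each predicted fragment list agrees with the sizes read from the gel in Solution Steps 3 and 4.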
Figure 17.2 Restriction-enzyme digestion of genomic DNA.


To analyze DNA from organisms with large genomes, 
researchers must fragment the genomes into more man- 
ageable pieces. For example, the Physcomitrella patens 
genome consists of 400 million base pairs, and when 
digested with a restriction enzyme like EcoRI that cuts 
on average every 4096 bp, approximately 100,000 differ- 
ent DNA fragments are produced. When this digested 
DNA is electrophoresed through an agarose gel, the frag- 
ments making up the resulting “smear” range from over 
20 kb down to smaller than 100 bp (Figure 17.2). The 
smeared appearance results because, although the en- 
zyme cuts every 4096 bp on average, the distances be- 
tween EcoRI sites will vary due to variation in the genome 
sequence, and the resolving power of agarose gel elec- 
trophoresis is not sufficient to separate all of the dif- 
ferent-sized fragments into discrete bands. This lack of 
resolution is compounded in larger genomes, such as 
ours, where digestion with EcoRI produces approximately 
730,000 pieces (3,000,000,000/4096). 
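The fragment counts quoted above follow from simple arithmetic. The sketch below assumes, as the text implicitly does, that the four bases are equally frequent and independent, so that a 6-bp site occurs once every 4^6 = 4096 bp on average; the function name is hypothetical.

```python
def expected_fragments(genome_bp, site_length):
    """Expected number of fragments from complete digestion of a genome."""
    mean_spacing = 4 ** site_length   # average distance between sites, in bp
    return genome_bp / mean_spacing

for name, size in [("Physcomitrella patens", 400_000_000),
                   ("human", 3_000_000_000)]:
    count = expected_fragments(size, 6)   # EcoRI has a 6-bp recognition site
    print(f"{name}: about {count:,.0f} EcoRI fragments")
```

Real genomes depart from this estimate because base composition is neither uniform nor random.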


Molecular Cloning 


After a genome under study has been reduced to smaller 
pieces by restriction enzymes, the individual pieces must be 
reproduced in large amounts—generally, either by molecu- 
lar cloning or by the polymerase chain reaction (PCR)— 
so that each of them can be analyzed in greater detail. 
Molecular cloning arose from discoveries in bacterial enzy- 
mology and utilizes bacteria and their plasmids or phages to 
amplify and propagate specific fragments of DNA. 

In molecular cloning, isolated DNA fragments are 
inserted into a vector, a carrier fragment of DNA with
attributes that will allow amplification (replication) in a
biological system. Then the recombinant DNA molecule 
is introduced into a biological system that amplifies the 
DNA, making many identical copies called DNA clones. 
Molecular cloning produces a large quantity of identical 
DNA molecules that can be analyzed by a variety of tech- 
niques, including restriction enzyme analysis and DNA 
sequencing. 
Molecular cloning has three general steps: 


1. The joining together of the cloning vector and a donor 
DNA fragment to produce a recombinant clone 


2. Selection of vectors containing copies of the DNA 
segment of interest 


3. Amplification of the recombinant clone in a biological 
system 


In this section, we describe how DNA fragments are com- 
bined in vitro, the attributes of some common cloning 
vectors, and the means of their amplification. We then 
describe how DNA libraries—collections of cloned DNA 
fragments, usually derived from a single DNA source—are 
constructed. 


Creating Recombinant DNA Molecules One common 
method of producing recombinant DNA is to digest DNA 
from the donor source and DNA of the cloning vector 
with the same restriction enzyme. The resulting linear 
fragments from the two DNA sources can then be annealed 
at their complementary sticky ends. Figure 17.3 illustrates 
restriction digestion by EcoRI of both the vector DNA—a 
plasmid, in this case—and DNA from the human genome. 
Mixing the two DNAs in a test tube allows the sticky 
ends to hybridize to one another by complementary base 
pairing, after which the remaining single-stranded nicks 
are sealed with DNA ligase (see Chapter 7), resulting in a 
recombinant DNA molecule. In this case, a recombinant 
plasmid containing human DNA is formed. 

Figure 17.3 Making recombinant DNA molecules. [Figure
annotation: DNA ligase catalyzes phosphodiester bond formation
between 5' phosphate and 3' hydroxyl groups.]

While it is common to cut both source and vector
DNA with the same enzyme, variations on this theme are
frequently employed. For example, two different restriction
enzymes that create complementary sticky ends are
sometimes used. When different restriction enzymes are
used to digest vector and donor DNA, complementary
sticky ends are called cohesive compatible ends. For
example, BamHI recognizes the 6-bp sequence


5'-GGATCC-3'
3'-CCTAGG-5'


and leaves sticky ends


5'-G        GATCC-3'
3'-CCTAG        G-5'


Sau3A recognizes the 4-bp sequence


5'-GATC-3'
3'-CTAG-5'


and leaves sticky ends


5'-N        GATCN-3'
3'-NCTAG        N-5'


(where N represents any nucleotide). Since the sticky ends
created by the two enzymes are the same (5'-GATC-3'),
the ends of a BamHI- and a Sau3A-digested fragment
can combine to create recombinant DNA molecules.
However, in this case, the resulting ligated products will
often lack an intact BamHI site, since the 5' Ns from the
Sau3A site may not be Gs.
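The compatibility of BamHI and Sau3A ends, and the frequent loss of the BamHI site after ligation, can be illustrated with a few lines of Python. This is a sketch under stated assumptions: the function names are hypothetical, each site is palindromic, and the cut position is given as the number of bases from the 5' end of each strand.

```python
def five_prime_overhang(site, cut):
    """Single-stranded 5' overhang left by a staggered cut in a palindromic site."""
    return site[cut:len(site) - cut]

BAMHI = ("GGATCC", 1)   # cuts G^GATCC
SAU3A = ("GATC", 0)     # cuts ^GATC

def regenerates_bamhi(next_base):
    """Top-strand junction when a BamHI-cut end (5'-G) joins a Sau3A-cut end (GATCN-3')."""
    junction = "G" + "GATC" + next_base
    return junction == BAMHI[0]

print(five_prime_overhang(*BAMHI), five_prime_overhang(*SAU3A))

# Only a C on the top strand (a G as the 5' N of the bottom strand)
# restores the full GGATCC site.
print([n for n in "ACGT" if regenerates_bamhi(n)])
```

Both enzymes leave a GATC overhang, so the ends anneal, but only one of the four possible flanking bases rebuilds a site that BamHI can cut again.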

Usually the goal of this process is to create recom- 
binant DNA molecules in which a single piece of source 
DNA is combined with a single cloning vector molecule. 
However, because digested DNA from both sources is 
mixed together in a test tube, a variety of recombinant 
molecules may arise. For example, some recombinants 
may have a single donor-DNA insert, whereas others may 
have two or more donor fragments that join together and
then insert into the vector. In addition, the sticky ends of
vectors can rejoin each other rather than incorporating 
a donor insert, producing a nonrecombinant vector. 
Because neither nonrecombinant vectors nor clones with 
multiple inserts are desired results, techniques to favor 
the production of single-insert clones have been devel- 
oped. For example, the occurrence of nonrecombinant 
vectors can be reduced by removal of the 5’ phosphates 
on the vector DNA, so that the vector DNA cannot ligate 
to itself to produce nonrecombinant clones. 

A feature of experiments using a single restriction 
enzyme or using two enzymes with cohesive compatible 
ends is that the insert DNA can be ligated into the vector 
in either orientation. One way to ensure that insert DNA is 
cloned into a vector in a specific orientation is to use two 
restriction enzymes with different compatible ends, a pro- 
cess called directional cloning (Figure 17.4). Directional 
cloning has three desirable features. First, only insert-DNA 
fragments possessing the two different compatible ends 
will be efficiently inserted into the vector. Second, the
inserted fragments are ligated in a particular orientation
dictated by the cohesive compatible ends. And third, due
to the incompatibility of the two ends of the digested vector
DNA, the vector cannot re-ligate to itself, thus minimizing
the creation of nonrecombinant vectors.

Figure 17.4 Directional cloning of DNA molecules.

While hundreds of restriction enzymes are commer- 
cially available, cohesive compatible ends are not always 
possible to produce at the positions necessary for con- 
structing the desired recombinant DNA molecules. One 
approach to creating compatible ends in such a case is to 
generate blunt ends—ends without any overhang—that 
can then be ligated to form a recombinant molecule. 

Some restriction enzymes naturally create blunt ends, 
but any restriction enzyme site can be converted into a 
blunt end. There are two general strategies (Figure 17.5). 
For example, DNA polymerase (see Chapter 7) can use a 
5’ overhang as a template and add dNTPs to the recessed 3’ 
end until a blunt end has been produced. Alternatively, 
3’ overhangs can be made blunt by a DNA exonuclease 
(see Chapter 7) that degrades only single-stranded DNA 
and “chews back” the 3’ overhang. Some procedures use 
shearing force rather than restriction enzymes (as when 
DNA is passed through a fine needle), producing random 
DNA fragments whose ends can then be blunted by treat- 
ment with a DNA polymerase and exonuclease. Conversely, 
blunt ends can be converted into sticky ends by ligation of 
short oligonucleotides onto the blunt-ended DNA mol- 
ecules. The oligonucleotides can be synthesized to have 
sequences for any restriction enzyme desired, thus adding 
any specific restriction site to the end of any DNA molecule. 
Oligonucleotides of this type are called linkers. 


Plasmids as Cloning Vectors Plasmids are circular DNA 
molecules that replicate autonomously in bacteria and 
usually carry nonessential genes. The F-factor involved in 
E. coli conjugation (see Chapter 6) is a plasmid. Plasmids 
used as cloning vectors replicate independently of the 
bacterial chromosome and, unlike the F-factor, which can 
recombine into the E. coli chromosome, always remain 
separate from it. Most plasmids used as cloning vectors 
have been modified in the laboratory to possess several 
features that facilitate the production of recombinant 
DNA molecules (Figure 17.6a). For example, plasmids 
are equipped with an origin of replication (ori) that drives 
efficient replication of the plasmid within the bacterial 
host. They also contain a gene conferring a trait that 
permits bacteria harboring the plasmid to be selectively 
grown. Genes conferring resistance to antibiotics are 
commonly used as selectable markers. 

Two types of plasmids, identified as pUC-based plas- 
mids and pBR-based plasmids, are most frequently used in 
constructing recombinant plasmids capable of transform- 
ing competent bacteria. Both types have many different 
forms, developed through extensive genetic engineering 
in the laboratory. In these vectors, the β-lactamase gene,
conferring resistance to ampicillin, is often used as the
selectable marker. The origin of replication was derived
from a naturally occurring E. coli plasmid called the ColE1
plasmid. The ColE1 ori allows these plasmids to be maintained
at a high copy number of 100-200 plasmids per cell.

Figure 17.5 Connecting blunt ends to create recombinant
DNA molecules.

Both pUC and pBR plasmids also contain a multiple
cloning site (MCS) that has several different restriction
enzyme sites into which DNA can be inserted. These
restriction enzyme sites occur only within the MCS
and nowhere else in the plasmid. In pUC-based plasmid
cloning vectors, the MCS is embedded in the lacZ
gene, which encodes β-galactosidase, an arrangement
that provides a colorimetric assay for determining which
bacteria harbor vectors with an insertion of DNA into
the MCS (Figure 17.6b). Although the normal substrate
for β-galactosidase is lactose, the enzyme can also cleave
lactose analogs, such as X-gal. When the colorless substrate
X-gal is added to the growth medium, bacteria
with a functional lacZ gene producing β-galactosidase
will convert X-gal to a blue product. When a fragment of
DNA is inserted into the MCS, the lacZ gene is disrupted
and rendered nonfunctional. Bacteria harboring such a
recombinant vector therefore appear as white colonies,
whereas bacteria harboring a cloning vector that does not
contain a fragment of DNA inserted in the MCS are blue.
This difference allows rapid identification of colonies
harboring vectors with inserts in the MCS. Thus, selection
based on antibiotic resistance allows identification of
bacteria that have been transformed, and blue versus white
screening allows identification of bacteria harboring
plasmid vectors with an insertion of recombinant DNA.
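The two-stage logic of selection followed by screening can be summarized as a small decision function. The sketch below is a simplification with a hypothetical function name; it assumes a pUC-style vector and growth medium containing both ampicillin and X-gal.

```python
def colony_phenotype(has_vector, has_insert):
    """Expected outcome for one cell plated on ampicillin plus X-gal."""
    if not has_vector:
        return "no colony"       # no beta-lactamase gene, so ampicillin prevents growth
    if has_insert:
        return "white colony"    # insert in the MCS disrupts lacZ
    return "blue colony"         # intact lacZ converts X-gal to a blue product

for has_vector, has_insert in [(False, False), (True, False), (True, True)]:
    print(has_vector, has_insert, "->", colony_phenotype(has_vector, has_insert))
```

Antibiotic selection removes the first class entirely, and color distinguishes the remaining two.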


Amplifying Recombinant DNA Molecules For
amplification (that is, replication of the recombinant
DNA molecules in large numbers), the recombinant
molecules are introduced into E. coli by transformation,
the same process described by Griffith and by Avery,
MacLeod, and McCarty in their early investigations
of the hereditary function of DNA (Figure 17.7; see
Chapter 6). In modern laboratories, DNA is mixed with
E. coli in a test tube. The bacteria are treated either
chemically with divalent cations (such as Ca²⁺) or with an
electrical shock to open pores in their membranes, thus
making the bacteria “competent” to take up exogenous DNA by
transformation. For safety purposes, the bacterial strains
used in recombinant DNA experiments are chosen for
characteristics that do not allow them to survive well
outside of the laboratory.

The concentrations of DNA used to transform 
competent bacteria are those determined empirically 
to be concentrations at which individual bacterial cells 
are likely to take up no more than one DNA molecule. 
After transformation, the bacteria are allowed to re- 
cover for a short period of time and are then plated 
on growth medium that selects for cells containing 
the selectable marker gene, conferring resistance to
an antibiotic, encoded on the DNA vector. When the
transformed bacteria are plated on media containing
the antibiotic, only those bacteria harboring vector
DNA will survive.

Figure 17.7 Amplification of recombinant DNA molecules
in bacteria.


Recombinant DNA molecules introduced into mi- 
crobial cells are amplified by repeated cycles of DNA 
replication within the bacteria. Since the recombinant 
vector has an origin of replication, it will amplify by 
autonomous replication using bacterial enzymes. After 
that, the next time the bacterium divides, each of its 
progeny will receive copies of the recombinant DNA 
molecule. Because a single bacterium with a recombinant 
DNA molecule can grow into a colony consisting of some 
10° bacteria, each with multiple copies of the recombi- 
nant DNA molecule, billions of identical copies of DNA 
molecules are made. 

The use of plasmid vectors for cloning large DNA 
fragments is limited, mainly because large plasmids (over 
20 kb) are not efficiently maintained in a high copy num- 
ber. This limitation restricts the usefulness of plasmids in 
cloning eukaryotic genomic DNA. Eukaryotic genomes 
can be large (the human genome is 3 × 10⁹ bp), with individual
genes that are often much longer than 20 kb and
therefore cannot be cloned in a single plasmid. To over- 
come these limitations of plasmids, vectors capable of 
handling larger clones have been developed (Table 17.1). 
Two general approaches have been employed to propa- 
gate larger DNA fragments. In one approach, vectors 
based on the life cycle of bacteriophages—in particular, 
bacteriophage lambda—accommodate larger fragments 
of DNA. The second approach harnesses single-copy 
origins of replication to efficiently propagate even larger 
recombinant DNA molecules in both bacteria and yeast. 


Bacteriophage Vectors Bacteriophage lambda is capable 
of both a lytic life cycle and a lysogenic life cycle (see 
Section 14.6). Phage propagation through the lysogenic life 
cycle requires the presence of all the genes of the lambda 
genome, but genes that are specifically involved in the 
lysogenic life cycle are dispensable for the lytic life cycle. If 
the genes required for lysogeny are removed, they can be 
replaced by up to 23 kb of DNA from another source, and 
the recombinant phage can then be propagated through 
a lytic life cycle (Figure 17.8a). In bacteriophage-based 
vectors, it is the replication of the phage within the 
bacterium that amplifies the recombinant DNA molecule. 

The size of inserted DNA that can be accom- 
modated is further increased by taking advantage of 
another feature of the lambda bacteriophage system: 
rolling circle replication (see Chapter 6). During the 
lytic life cycle, replication of lambda DNA by rolling 
circle replication results in successive concatenation of 
50-kb genomes into long DNA molecules. A lambda- 
encoded nuclease then recognizes a specific sequence 
within the lambda genome and cleaves the concatenated 
genomes into single-genome units. Subsequently, spe- 
cific sequences called cohesive end sequence, or cos, 
sites in the lambda genome will interact with lambda 
phage coat proteins to “package” the individual lambda 
genomes into discrete phage particles in vitro. The cos
sites are the only lambda sequences required for packaging
of DNA, so when DNA from another source is
concatenated with cos sequences derived from lambda,
the ligated DNA can be packaged into phage particles.
In this case, neither the genes for lysis nor the genes for
lysogeny are in the phage particles; thus, after infection
of a host bacterium, the injected DNA does not enter
the lambda life cycle. If an origin of replication and a
selectable marker are included in the vector, however,
the DNA can be replicated as a plasmid in the bacterium
(Figure 17.8b). Vectors with these features are known as
cosmid vectors. Since the lambda phage can hold up to
50 kb, cosmid vectors can carry up to 45 kb of insert
sequence along with 5 kb of cos, origin of replication, and
selectable marker sequence.


Table 17.1 Cloning Vectors

Vector    Form of DNA                Host
Plasmid   Circular                   E. coli
Lambda    Linear phage chromosome    E. coli
Cosmid    Circular                   E. coli
BAC       Bacterial chromosome       E. coli
YAC       Yeast chromosome           S. cerevisiae


Artificial Chromosomes While both lambda and cosmid 
vectors have been historically important, vectors called 
artificial chromosomes, which have the capacity to carry 
even larger DNA fragments, are now more frequently 
used. These were developed through accumulated 
knowledge of how chromosomes propagate in bacteria 
and eukaryotes, and of the functions of different 
chromosome regions in replication. 

Yeast artificial chromosomes (YACs) were the first 
artificial chromosomes developed and are used as cloning 
vectors in S. cerevisiae. A YAC vector contains sequences 
corresponding to a centromere (see Section 11.3), telo- 
meres, a selectable marker, and a cloning site, and it can 
accept an insert size of 200 kb to 2 megabases (Mb). YACs 
carrying an insert smaller than 200 kb are often unstable 
and do not properly segregate at mitosis. 

Bacterial artificial chromosomes (BACs) were 
developed shortly after YACs. Although BACs have a 
smaller insert-size capacity (100-200 kb) than YACs, they 
are the preferred artificial chromosome cloning vector, 
largely due to the ease of using E. coli rather than yeast 
as a host. Like plasmids, BAC vectors contain an origin 
of replication, a selectable marker gene, and an MCS. 
However, the origin of replication in BAC vectors is de- 
rived from the F-factor plasmid. Unlike replication via the 
ColE1 origin, replication via the F-factor origin is strictly 
controlled, producing only one or two copies of the 
F-factor per cell. This difference allows large plasmids to
be maintained, circumventing the problem encountered
with plasmids that have high copy numbers.


Table 17.1 Cloning Vectors (continued)

Vector    Insert Size    Uses
Plasmid   <15 kb         Subcloning and cDNA libraries
Lambda    <23 kb         cDNA and genomic libraries
Cosmid    30-45 kb       Genomic libraries
BAC       100-200 kb     Genomic libraries
YAC       200-2000 kb    Genomic libraries


The utility of BAC cloning vectors becomes apparent 
when we consider the typical sizes of eukaryotic genes. 
For example, while individual globin genes in the β-globin
locus are about 1.4 kb in length, the regulatory sequences
controlling the cluster of globin genes span about 70 kb of
genomic DNA. The entire β-globin locus can be contained
in a single BAC, but would not be contained in a single 
plasmid or cosmid clone. However, some eukaryotic genes, 
such as the gene for Duchenne’s muscular dystrophy in 
humans, span more than a megabase and are unlikely to be 
contained within a single BAC or YAC clone. 
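Choosing a vector by insert size amounts to a lookup against the capacities in Table 17.1. The Python sketch below uses the upper limits from that table; the dictionary and function names are hypothetical, and the check deliberately ignores lower limits such as the instability of YACs carrying inserts smaller than 200 kb.

```python
# Approximate maximum insert sizes (kb), taken from Table 17.1.
MAX_INSERT_KB = {
    "plasmid": 15,
    "lambda": 23,
    "cosmid": 45,
    "BAC": 200,
    "YAC": 2000,
}

def vectors_that_fit(insert_kb):
    """Vectors whose maximum capacity can accommodate the insert."""
    return [name for name, cap in MAX_INSERT_KB.items() if insert_kb <= cap]

print(vectors_that_fit(1.4))   # a single globin gene
print(vectors_that_fit(70))    # globin regulatory region plus gene cluster
```

For a 70-kb region, the plasmid, lambda, and cosmid entries drop out, leaving only the artificial chromosomes.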


DNA Libraries 


A DNA library is a collection of cloned fragments of 
DNA, usually derived from the nucleic acids of a single 
source (recall our use of library in Chapter 16). DNA 
libraries come in two varieties: those derived from the ge- 
nomic DNA of an organism are called genomic libraries, 
and those derived from mRNA are called complementary 
DNA (cDNA) libraries. Since the source of nucleic acids 
for each type of library differs, the kinds of sequences rep- 
resented in each type also differ. 

In theory, genomic libraries should contain all the 
sequences found in the genome of the source organism. 
For example, a human genomic library would contain 
all 3 × 10⁹ bp in the haploid genome sequence. This
would include the exons and introns of genes, the regulatory
sequences controlling gene expression, the intergenic
sequences (noncoding sequences between genes), and
repetitive sequences (centromeres, telomeres, ribosomal
DNA, transposons, retroelements, etc.). By contrast, cDNA
libraries are derived from mRNA and thus represent the 
DNA sequences that are transcribed in the tissue from 
which the mRNA is derived. Since only a fraction of the 
genes present in the genome are likely to be expressed 
in any particular tissue, and even those are expressed at 
different levels, only a fraction of the genes are represented, 
and in different amounts, in any cDNA library. Thus, the 
number of times a specific sequence is represented in a 
library differs significantly between genomic and cDNA 
libraries.

Figure 17.8 Cloning in bacteriophage vectors.


Constructing Genomic Libraries Genomic libraries are
collections of individual clones derived from the genomic 
DNA of an organism. To construct a genomic library, 
genomic DNA, usually from a single individual, is isolated 
and fragmented into smaller pieces that are then ligated 
into cloning vectors (Figure 17.9). The recombinant 
vectors are transformed into bacteria (in the case of 
plasmid and BAC vectors) or used to infect bacteria (in 
the case of phage vectors) that grow into colonies or 
plaques that collectively contain clones representing the 
entire genome. 

A genomic library contains each sequence in the 
genome at approximately the same frequency. Thus, se- 
quences representing the exons and introns of genes, the 
regulatory sequences controlling their expression, and 
repetitive and intergenic sequences are all approximately
equally represented in the genomic library. However, in
practice, some sequences are not efficiently maintained in 
the host cells and will be underrepresented, so the entire 
genome is not fully represented in any typical genomic 
library. For example, repetitive DNA tends to be under- 
represented due to its propensity to undergo intragenic 
recombination that results in deletion of DNA sequences 
within clones. 

Three desirable attributes for a genomic library are 
that (1) the genomic clones are broadly representative of 
DNA of the entire genome, (2) the genomic clones are 
large enough to be useful for sequencing and subcloning, 
and (3) the genomic clones are roughly similar in size. 
Let’s look at how these attributes are achieved. 

Figure 17.9 Construction of genomic libraries.

To ensure that a genomic library is broadly representative,
care must be taken to fragment the genomic DNA into random
pieces of an appropriate and relatively uniform size for
cloning into a vector. Random fragmentation can be
accomplished by either of two methods. In one technique, the
DNA is partially digested with an enzyme that cuts very
frequently (e.g., a restriction enzyme that has a 4-bp
recognition sequence). Partial digestion refers to the use of
less restriction enzyme than would be needed to cut the
DNA at every restriction sequence the enzyme recognizes,
resulting in cuts at some of the restriction sequences but
not all of them. Since a 4-bp recognition sequence should
occur every 256 bp on average, partial digestion of DNA in
which, on average, only one in 400 recognition sequences
is cut should result in DNA fragments of approximately
100 kb. Thus, partial digestion with an enzyme that otherwise
cuts frequently will generate random, large genomic
DNA fragments with sticky ends, as desired. The second
technique for obtaining random fragmentation of DNA
is random shearing of genomic DNA with subsequent
enzymatic treatment to create blunt ends. In theory, either
technique should provide random representation of
genomic DNA from the entire genome.
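The 100-kb figure for partial digestion is the product of the mean site spacing and the number of sites per cut. A minimal sketch, with a hypothetical function name and the same equal-base-frequency assumption used for the spacing estimate:

```python
def mean_partial_digest_bp(site_length, sites_per_cut):
    """Average fragment length when only one in `sites_per_cut` sites is cut."""
    mean_spacing = 4 ** site_length   # 256 bp for a 4-bp recognition sequence
    return mean_spacing * sites_per_cut

print(mean_partial_digest_bp(4, 400))   # bp, roughly 100 kb
```

Adjusting the enzyme amount changes `sites_per_cut` and therefore the average fragment size, which is how the digestion is tuned to the capacity of the chosen vector.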

The size of DNA clones in genomic libraries results 
from technical choices that seek a balance between, on 
the one hand, the difficulty of isolating, cloning, and prop- 
agating large molecules of DNA and, on the other hand, 
the greater number of smaller fragments that would have 
to be cloned in order to span the entire genome. As we 
discuss in Chapter 18, however, a set of genomic libraries 
that each have a different-sized insertion can be useful for 
determining the sequence of an entire genome. 


Constructing cDNA Libraries The starting material for 
a cDNA library is mRNA, often derived from a specific 
tissue or cell type. Messenger RNA cannot be cloned 
directly because it is single stranded and is of course 
RNA, not DNA. Cloning of mRNA sequences can be 
accomplished by synthesizing a double-stranded cDNA 
copy of the mRNA and then ligating the cDNA into a 
vector. cDNA libraries are especially useful for working 
with eukaryotic organisms whose gene sequences are 
interrupted by many long introns. 
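At the sequence level, making a double-stranded cDNA copy means building a DNA strand complementary to the mRNA and then a second strand complementary to the first. The Python sketch below illustrates this on a short made-up mRNA with a poly-A tail; the sequence and function names are hypothetical, and priming with oligo dT is assumed.

```python
COMPLEMENT = {"A": "T", "U": "A", "T": "A", "G": "C", "C": "G"}

def first_strand_cdna(mrna):
    """DNA strand complementary to the mRNA, written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(mrna))

def second_strand_cdna(first_strand):
    """DNA strand complementary to the first cDNA strand, written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(first_strand))

mrna = "AUGGCCUAAAAAA"             # made-up transcript ending in a poly-A tail
first = first_strand_cdna(mrna)    # begins with the oligo dT primer sequence
second = second_strand_cdna(first) # same as the mRNA, with T in place of U

print(first)
print(second)
```

The second strand reproduces the mRNA sequence in DNA form, which is why a cDNA clone carries the exon sequence without introns.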

The concept and development of cDNA libraries
required advances in understanding the life cycle of
retroviruses and the movement of retrotransposons
(see Section 13.7). The availability of the enzyme reverse
transcriptase, which is found in RNA-containing
retroviruses and in retrotransposons and which uses
single-stranded RNA as a template to produce a complementary
strand of DNA, makes cloning from mRNA possible.
Reverse transcriptase creates cDNA by first synthesizing
a single-stranded DNA molecule complementary to the
mRNA, which acts as a template. The poly-A tail added to
RNA polymerase II transcripts of eukaryotes facilitates
the construction of cDNA libraries from such mRNA,
since the first strand of cDNA can be synthesized using
an oligo dT primer (Figure 17.10). The mRNA template
is then enzymatically removed, and the second strand of
DNA is synthesized by DNA polymerase, using the first
cDNA strand as a template.

Figure 17.10 Construction of cDNA libraries.

The composition of a cDNA library reflects the level
of expression of different genes active in the tissue from
which the mRNA was extracted. Genes that are highly
expressed are represented in the mRNA at a higher
frequency than genes expressed at a lower level, and genes
not expressed in the tissue of origin are not represented.
In contrast to genomic libraries, which represent all genes
at approximately equal frequency, the frequency with
which any particular gene will be represented in a cDNA
library is difficult to estimate, since it depends on the
expression level of the gene in the original mRNA
population (Figure 17.11).
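The dependence of cDNA library composition on expression level can be made concrete with a proportional estimate. In the sketch below, the gene names and transcript counts are invented for illustration, and the function name is hypothetical; the only assumption carried over from the text is that clones are recovered in proportion to mRNA abundance.

```python
def expected_clone_counts(mrna_abundance, library_size):
    """Expected number of clones per gene, proportional to mRNA abundance."""
    total = sum(mrna_abundance.values())
    return {gene: library_size * n / total for gene, n in mrna_abundance.items()}

# Invented transcript counts for four genes in one tissue.
abundance = {"geneA": 900, "geneB": 99, "geneC": 1, "geneD": 0}

for gene, clones in expected_clone_counts(abundance, 10_000).items():
    print(gene, clones)
```

A rarely expressed gene such as the hypothetical `geneC` appears in only a handful of clones, and an unexpressed gene does not appear at all, whereas a genomic library would carry all four at about the same frequency.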

Since cDNA libraries are usually made from mature 
cytoplasmic mRNA, the only sequences included in the 
cDNA clones are the 5’ untranslated region (5’-UTR), the 
exons, and the 3’-UTR (see Section 9.1 for discussion of 
UTRs); the clones will lack any intronic and intergenic se- 
quences. Since the genetic code is universal, cDNA clones 
derived from one organism can be expressed in any other 
organism as long as appropriate transcriptional (e.g., pro- 
moter) and translational signals are inserted to promote 
efficient gene expression in the host organism. A cDNA 
library constructed with such features is called an expres- 
sion library. An example of a use for an expression library 
is described in Section 16.2. 


Figure 17.11 Content of genomic versus cDNA libraries.


The Uses of Libraries DNA libraries have many uses,
especially as a resource from which genomic or cDNA
clones of specific genes can be identified and then
employed in subsequent experiments. For example, clones
from a library can be manipulated to create reporter genes
or to produce novel alleles (e.g., chimeric genes) that
can then be used in the creation of transgenic organisms
(see Section 16.4). In addition, library construction is
the starting point for most protocols performing
next-generation sequencing of the genomes or mRNA content
of organisms, which we will explore in greater detail in
Chapter 18.

Once a library or other collection of clones is pro- 
duced, biologists use techniques described in previous 
chapters—PCR and the analysis of nucleic acids by hy- 
bridization—to identify clones containing specific DNA 
sequences (see Research Technique 10.2 on page 354). 
All techniques to identify fragments containing specific 
nucleic acid sequences take advantage of the exquisite 
specificity of complementary base pairing between single- 
stranded molecular probes and single-stranded target- 
sequence regions of DNA or RNA. 

Recall that in hybridization-based techniques, DNA 
fragments are fixed onto a membrane that is then exposed 
to a labeled probe. The probe hybridizes to any fragments 
containing a complementary sequence. When the excess 
probe is washed from the membrane, the DNA fragments 
that have hybridized with the probe can be detected. 

The same concept applies when a labeled probe 
is applied to cloned DNA fragments from a library 
(Figure 17.12). A membrane is laid on top of the bac- 
terial colonies growing on a petri dish. Each colony 
contains clones of a different fragment from the li- 
brary, and some of the bacteria in each colony stick to 
the membrane. The bacteria remaining on the petri 
dish serve as a resource for a later step in the proce- 
dure. The membrane-bound bacteria are lysed, and their 
DNA is denatured. The membrane can then be probed 
with a labeled single-stranded nucleic acid and treated 
as described in Research Technique 10.2 for Southern 
blots. DNA that hybridizes with the probe is detected 
(e.g., by autoradiography), and the colonies it came from 
are identified by their position on the original petri dish. 
The same protocol is followed for phage, which form 
plaques in lawns of bacteria spread on petri dishes, and 
for yeast, which forms colonies similar to those of bacte- 
ria. Alternatively, PCR-based techniques can be used to 
identify and amplify clones within a library that contain 
specific sequences, namely, those of the primers that are 
used in the PCR reaction. 
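
The logic of probe-based screening can be sketched in a few lines of Python. This is a minimal illustration, not a laboratory protocol: the clone names, insert sequences, and function names are hypothetical, and hybridization is simplified to a perfect match between the probe and one strand of the denatured clone DNA (real hybridization tolerates some mismatch, depending on stringency).

```python
# Toy model of screening a clone library with a labeled single-stranded probe.
# Assumption: a clone is "positive" only if it contains a perfect complement
# of the probe on either strand.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def screen_library(library: dict[str, str], probe: str) -> list[str]:
    """Return the names of clones that carry a strand complementary to the probe."""
    target = reverse_complement(probe)  # the sequence the probe base-pairs with
    hits = []
    for name, insert in library.items():
        # Denatured clone DNA exposes both strands, so test each one.
        if target in insert or target in reverse_complement(insert):
            hits.append(name)
    return hits


# Hypothetical three-clone library; clone_3 carries the target in the
# opposite orientation relative to clone_1.
library = {
    "clone_1": "GGATCCTTCGTCAATCAGCACCTTAAGCTT",
    "clone_2": "GAATTCACGTACGTACGTTTGACCGGATCC",
    "clone_3": "AAGCTTAAGGTGCTGATTGACGAAGGATCC",
}
probe = reverse_complement("TTCGTCAATCAGCACCTT")
positive_clones = screen_library(library, probe)
print(positive_clones)
```

Because both strands of each clone are tested, the orientation of the insert within the vector does not affect whether the clone is detected.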


Sequencing Long DNA Molecules 


The ultimate description of any DNA molecule is its pre- 
cise sequence of bases. The process of Sanger sequencing, 
also known as dideoxy sequencing, was developed for this 
purpose in the 1970s (see Chapter 7). In dideoxy sequenc- 
ing, approximately 800 to 1000 consecutive bases are 
sequenced in each reaction, called a sequencing “read.”
But most DNA regions of interest are larger than this.
How are larger fragments of DNA sequenced?

Figure 17.12 Screening libraries for specific sequences.

Figure 17.13 Primer walking versus shotgun sequencing
approaches. (a) Sequencing by primer walking.

There are two basic strategies for sequencing large
DNA molecules. The first technique, primer walking
(Figure 17.13a), relies on the successive synthesis
of primers based on the progressive attainment of new
sequence information. The DNA sequence information
obtained in the first dideoxy sequencing reaction provides
a foundation for the design of a second primer. If the
second primer is 600 to 800 bases from the first primer, 
the second dideoxy sequencing reaction can extend the 
known sequence up to 1800 bases from the first primer. 
Reiterations of this process allow technicians to “walk” 
along a long DNA molecule, designing new primers every 
600 to 800 bases. The speed with which a molecule is se- 
quenced by this method is limited by its reiterative nature. 
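
The arithmetic behind primer walking can be made explicit with a short Python sketch. It assumes the most favorable figures quoted above (1000-base reads and a new primer every 800 bases); the function names and default values are illustrative choices, not part of any standard tool.

```python
import math


def coverage_after(n_reads: int, read_len: int = 1000, step: int = 800) -> int:
    """Bases of known sequence after n_reads successive walking reactions."""
    return read_len + (n_reads - 1) * step


def reads_needed(target_len: int, read_len: int = 1000, step: int = 800) -> int:
    """Number of successive reactions needed to walk across target_len bases."""
    if target_len <= read_len:
        return 1
    return 1 + math.ceil((target_len - read_len) / step)


# Two reads reach 1800 bases from the first primer, as stated in the text.
print(coverage_after(2))
# Walking across a 100-kb BAC clone would take 125 reactions, one after another.
print(reads_needed(100_000))
```

Each reaction must be finished before the next primer can be designed, so the number of reactions translates directly into elapsed time.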

A second method for sequencing large molecules
of DNA is shotgun sequencing, an approach that relies 
on redundant sequencing of fragmented target DNA in 
the hope that all regions will be sequenced at least a few 
times. In this technique, a large DNA molecule (e.g., a 
BAC clone of 100 kb) is fragmented into smaller pieces, 
and the fragments are ligated into cloning vectors (Figure 
17.13b). The fragments may be generated by partial re- 
striction enzyme digestion or by shearing the DNA. The 
key here is that fragmentation is done in such a way as to 
produce random and hence overlapping pieces. The ends 
of these clones can then be sequenced using a primer 
based on the vector sequence. The clones of these frag- 
ments can be considered a library of sequences from 
the larger DNA molecule. The strategy is to sequence 
enough clones to be able to assemble a complete contigu- 
ous sequence on the basis of overlaps in the sequences. 
Computer algorithms are available to perform much of 
this task, allowing data from millions of sequencing reac- 
tions to be assembled quickly (see Section 18.1). Thus, 
in shotgun sequencing, the sequencing of the many dif- 
ferent fragments proceeds simultaneously (“in parallel”), 
allowing long DNA molecules to be sequenced rapidly. 
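
The idea of assembling a contiguous sequence from overlapping reads can be illustrated with a small greedy merger in Python. This is a teaching sketch under simplifying assumptions (error-free reads, all from the same strand, no repeated sequences); production assemblers use far more elaborate algorithms, and the function names and the `min_len` threshold are hypothetical.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a that equals a prefix of b (0 if < min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 5) -> list[str]:
    """Repeatedly merge the pair of sequences with the longest overlap."""
    contigs = list(dict.fromkeys(reads))  # redundant reads collapse to one copy
    # Discard reads that lie entirely within another read.
    contigs = [r for r in contigs if not any(r != o and r in o for o in contigs)]
    while True:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs  # no remaining overlaps: these are the final contigs
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for n, c in enumerate(contigs) if n not in (best_i, best_j)]
        contigs.append(merged)


target = "TTCGTCAATCAGCACCTTTGTGGTTCTCAC"
# Overlapping 14-base reads in arbitrary order, one of them sequenced twice.
reads = [target[16:30], target[0:14], target[8:22], target[8:22]]
contigs = greedy_assemble(reads, min_len=5)
print(contigs)
```

If the reads failed to overlap somewhere along the molecule, the function would return more than one contig, which corresponds to a gap in the assembled sequence.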


17.2 Introducing Foreign Genes 
into Genomes Creates Transgenic 
Organisms 


The introduction of a gene from one organism into 
the genome of another organism creates a transgenic 
organism. The introduced gene is known as a transgene; 
if the introduced gene comes from a different species, 
it is a heterologous transgene. The two principal chal- 
lenges to creating a transgenic organism are (1) the need 
to introduce DNA into a cell in such a way that the DNA 
integrates into the genome and (2) the need to provide 
appropriate regulatory sequences so that the transgene 
will be properly expressed. 

Because cells of different organisms differ in the 
ability to import DNA from their environment and in 
their propensity to recombine exogenous DNA into 
their genomes, protocols for introducing transgenes vary 
according to the organism. Nevertheless, the production 
of transgenic organisms is surprisingly straightforward, 
perhaps because naturally occurring mechanisms have 
evolved in most lineages of life for the uptake or delivery 
of DNA. Many organisms or cells will absorb DNA from 
their environment, and once inside the cell, one potential 
fate of the DNA is to recombine into the genome. Recall 
our discussion of certain naturally occurring versions of 
this process, including gene transfer by Hfr donors into 
recipient bacteria, transduction of genes from a bacte- 
rial donor to a recipient, and gene transfer between and 
within species by transformation (see Chapter 6). 


Although the designing of transgenes utilizes tech- 
niques of recombinant DNA technology, the expression 
of transgenes is like the expression of any gene: The gene 
sequence must first be transcribed into mRNA and then 
translated into a polypeptide. The universality of the ge- 
netic code permits the expression of coding sequences even 
when transferred between the most distantly related organ- 
isms—even when one of them is bacterial or archaeal and 
the other a eukaryote. However, regulatory sequences and 
their molecular interactions with transcriptional and trans- 
lational machinery vary significantly among organisms, and 
they are not interchangeable between distantly related or- 
ganisms. Thus, for transgenes to be efficiently expressed, 
they must be combined with host regulatory sequences. 


Expression of Heterologous Genes in Bacterial 
and Fungal Hosts 


Bacterial transformation by a recombinant plasmid is the 
primary method for generating transgenic bacteria. As 
seen in Section 17.1, foreign DNA can be introduced into 
bacteria, such as E. coli, using a plasmid vector possessing 
sequences required for DNA replication and also possess- 
ing a selectable marker, such as antibiotic resistance, to 
facilitate the identification of transformants. 

Expression vectors are vectors that have been fur- 
nished with sequences capable of directing efficient tran- 
scription and translation of transgenes (Figure 17.14). For 
transgenes to be properly expressed in E. coli, regulatory 
sequences compatible with the transcription and transla- 
tion machinery in E. coli need to be present in the vec- 
tor. Expression vectors for use in E. coli are constructed 
from plasmids that have been equipped with promoter 
sequences that bind RNA polymerase upstream of the 
multi-cloning site (MCS) of the plasmid. Recall that the 
MCS is a cluster of unique restriction sites into which 
the gene to be expressed is inserted in recombinant 
clones. Efficient translation of mRNA in E. coli also re- 
quires the presence of a Shine-Dalgarno sequence in the 
5' untranslated region of the mRNA, another feature that 
is built into E. coli expression vectors. In addition, since 
mRNA-splicing machinery does not exist in bacteria, 
eukaryotic transgenes must be free of introns if they are 
to be properly translated in bacteria. This requirement 
necessitates the use of cDNAs as eukaryotic transgenes in 
E. coli expression systems. 

Expression of the heterologous gene carried by an 
expression vector can be either constitutive (“on” all the 
time) or regulated by the addition or removal of inducer 
compounds. An example of the latter approach is the use 
of the regulatory apparatus of the lac operon of E. coli to 
induce expression of transgenes: Fusion of the lac operator 
and CAP binding sites of the lac operon to the RNA poly- 
merase binding site allows the transgene to be controlled in 
the same inducible manner as the genes of the lac operon 
(the lac operon is described in Chapter 14).

Figure 17.14 Expression vectors for E. coli and eukaryotes.


Two kinds of variation in the genetic mechanisms of 
living organisms can hamper the efficient production of 
functional transgenic products. The first complication 
affects the efficiency of translation. While the universal 
genetic code does indeed allow the expression of heter- 
ologous transgenes, organisms vary in the degree to which 
they use specific codons when the genetic code contains 
more than one for a given amino acid or signal. In most 
species, synonymous codons are not used with equal fre- 
quency. For example, glycine is encoded by GGN, with N 
representing any nucleotide, but GGA and GGG are rarely 
used in E. coli, whereas these codons are commonly used 
in the other organisms listed in Table 17.2. The tRNAs 
corresponding to frequently used codons are expressed at 
higher levels than are the tRNAs for rarely used codons. 
This preferential use of codons is called codon bias. Thus, 
for efficient production of heterologous proteins in E. coli, 
the codon usage within the heterologous gene sequences 
may have to be altered to approximate the codon bias in 
E. coli. Note that such changes do not alter the amino
acid sequence of the encoded protein; they only alter the
efficiency with which translation occurs in E. coli. Codon
bias can affect the expression of heterologous transgenes
in any case where genes are being transferred between
distantly related species.

Table 17.2 Preference in Different Organisms for Specific Glycine Codons

Codon   E. coli   S. cerevisiae   H. sapiens   A. thaliana
GGA     0%        23%             23%          37%
GGG     2%        12%             26%          15%
GGC     38%       20%             33%          14%
GGT     59%       45%             18%          34%
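
Adjusting a coding sequence to a host's codon bias can be sketched as follows, using only the glycine frequencies in Table 17.2. The sketch makes a deliberately crude assumption, that every glycine codon is simply replaced by the host's single most frequent one; real codon-optimization tools weigh many additional factors. The function names and the example sequence are hypothetical.

```python
# Relative usage (%) of the four glycine codons, as listed in Table 17.2.
GLYCINE_USAGE = {
    "E. coli":       {"GGA": 0,  "GGG": 2,  "GGC": 38, "GGT": 59},
    "S. cerevisiae": {"GGA": 23, "GGG": 12, "GGC": 20, "GGT": 45},
    "H. sapiens":    {"GGA": 23, "GGG": 26, "GGC": 33, "GGT": 18},
    "A. thaliana":   {"GGA": 37, "GGG": 15, "GGC": 14, "GGT": 34},
}


def preferred_codon(host: str) -> str:
    """Return the glycine codon used most often in the given host."""
    usage = GLYCINE_USAGE[host]
    return max(usage, key=usage.get)


def recode_glycine(cds: str, host: str) -> str:
    """Replace every glycine codon in a coding sequence with the host's favorite."""
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    best = preferred_codon(host)
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    return "".join(best if c in GLYCINE_USAGE[host] else c for c in codons)


# Hypothetical coding sequence: Met-Gly-Gly-Phe-Gly.
original = "ATGGGAGGGTTCGGA"
recoded = recode_glycine(original, "E. coli")
print(recoded)  # ATGGGTGGTTTCGGT
```

The recoded sequence still specifies Met-Gly-Gly-Phe-Gly; only the synonymous glycine codons have changed.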

A second possible obstruction to the production of 
functional heterologous proteins in E. coli is presented by 
the post-translational modifications many proteins must 
undergo in order to function. Post-translational modifi- 
cations of proteins differ between species, in particular 
between eukaryotes and bacteria. For example, carbohy- 
drate and lipid groups are added to many kinds of eukary- 
otic proteins. In addition, the functions of proteins may 
be modified by phosphorylation, acetylation, or methyla- 
tion of amino acid residues; other post-translational poly- 
peptide processing; and specific protein-folding activities. 
Most of these processes either do not occur in bacterial 
cells or they occur but with significant differences. In 
such cases, eukaryotic cells, such as yeast or cells in tis- 
sue culture, and eukaryotic expression vectors must be 
used. Eukaryotic expression vectors have the eukaryotic 
features analogous to the features found in bacterial ex- 
pression vectors, including sequences for the regulation 
of transcription (such as a TATA box for binding of RNA 
polymerase II), enhancer sequences for qualitative and 
quantitative control of transcription, and polyadenylation 
and transcription-termination signals (see Figure 17.14). 


Production of Human Insulin in E. coli A gene encoding 
insulin was among the first human genes to be expressed 
in E. coli, and human insulin was the first protein
manufactured from recombinant DNA technology for
therapeutic use in humans. Insulin, a protein hormone, 
regulates sugar metabolism in animals by stimulating liver 
and muscle cells to take in glucose, and fat cells to take 
in lipids, from the blood. Individuals who are unable to 
produce insulin, or whose cells cannot respond to it, have 
diabetes, an often debilitating disease that affects millions 
of people worldwide. 

Insulin is cyclically produced in the pancreas by 
specialized cells in the islets of Langerhans and is re- 
leased into circulating blood in response to the ingestion 
of sugar-containing carbohydrates. The pancreatic cells 
initially synthesize a 110-amino-acid precursor protein
called preproinsulin that is not secreted and does not have 
hormonal function until it is proteolytically processed. 
Twenty-four N-terminal amino acids—the “pre” amino 
acids of preproinsulin—are cleaved from the precursor to 
produce proinsulin, an event followed by the cleavage of 
an additional 35 amino acids—called the “pro” segment— 
from the middle of the protein. Further cleavage gener- 
ates two amino acid chains, called the A chain and the 
B chain, that are 21 and 30 amino acids, respectively, in 
length. The A chain is joined to the B chain by disulfide 
bonds between cysteine residues to produce insulin. 
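
As a quick consistency check on the processing steps just described, the residue counts can be tallied in Python. The variable names are illustrative; the numbers are those given in the paragraph above.

```python
# Residue bookkeeping for the processing of preproinsulin into insulin.
preproinsulin = 110   # initial precursor
pre_segment = 24      # N-terminal "pre" amino acids removed first
pro_segment = 35      # internal "pro" segment removed next
a_chain, b_chain = 21, 30

proinsulin = preproinsulin - pre_segment      # 86 residues
mature_insulin = proinsulin - pro_segment     # 51 residues

# The residues that remain are exactly those of the A and B chains.
assert mature_insulin == a_chain + b_chain
print(proinsulin, mature_insulin)  # 86 51
```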

The amino acid sequence of insulin was determined 
by Fred Sanger in the early 1950s (Figure 17.15, step 1), but
the human gene encoding insulin was not identified un- 
til the late 1970s. Even before the human insulin gene 
was cloned, however, molecular biologists began experi- 
ments designed to produce human insulin in E. coli by 
constructing recombinant plasmids containing chemi- 
cally synthesized DNA encoding human insulin. An ex- 
perimental strategy called the two-chain method utilized 
two synthetic genes, one encoding the A chain and the 
other encoding the B chain. Each synthetic gene was con- 
structed from oligonucleotides whose sequence was based 
on the reverse translation of the amino acid sequences of
the human insulin chains (Figure 17.15, step 2).
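
Reverse translation of a peptide into a synthetic gene can be sketched in Python. The example below uses the 30-residue human insulin B chain. Because most amino acids have several synonymous codons, many different DNA sequences encode the same chain; the codon chosen here for each amino acid, the start codon placement, and the pair of stop codons (TAA, TAG) are assumptions made for illustration and are not claimed to match the sequence synthesized historically.

```python
# Human insulin B chain (30 amino acids, one-letter code).
B_CHAIN = "FVNQHLCGSHLVEALYLVCGERGFFYTPKT"

# One codon per amino acid: an arbitrary, hypothetical choice among synonyms.
CODON_CHOICE = {
    "A": "GCT", "C": "TGT", "E": "GAA", "F": "TTC", "G": "GGT", "H": "CAC",
    "K": "AAG", "L": "CTG", "M": "ATG", "N": "AAC", "P": "CCG", "Q": "CAG",
    "R": "CGT", "S": "TCT", "T": "ACT", "V": "GTT", "Y": "TAC",
}
DECODE = {codon: aa for aa, codon in CODON_CHOICE.items()}
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_translate(peptide: str) -> str:
    """Return one DNA coding sequence (5' to 3') for the peptide."""
    return "".join(CODON_CHOICE[aa] for aa in peptide)


def build_synthetic_gene(peptide: str, stops: tuple[str, str] = ("TAA", "TAG")) -> str:
    """Add a methionine codon in front and two stop codons behind the ORF."""
    return "ATG" + reverse_translate(peptide) + "".join(stops)


def translate(dna: str) -> str:
    """Translate codon by codon, stopping at the first stop codon."""
    peptide = []
    for i in range(0, len(dna) - 2, 3):
        codon = dna[i:i + 3]
        if codon in STOP_CODONS:
            break
        peptide.append(DECODE[codon])
    return "".join(peptide)


gene = build_synthetic_gene(B_CHAIN)
print(len(gene))          # 99 nucleotides: 3 + 90 + 6
print(translate(gene))    # methionine followed by the B chain
```

Translating the synthetic gene back into protein is a simple way to confirm that the designed DNA encodes the intended chain.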

The synthetic genes were cloned into separate plas- 
mid vectors. In each case the chain was fused, in the 
same reading frame, to the 3’ terminus of the lacZ
gene encoding β-galactosidase. Genetic constructs like
this, consisting of two or more genes or gene segments
joined together to form a new, artificial gene, are called
fusion genes. Transcription and translation of a fusion
gene produce a fusion protein, which in each of these
cases contained the polypeptide of one insulin chain
fused to the carboxyl terminus of β-galactosidase (the
protein product of the lacZ gene). To separate the insulin
peptides from β-galactosidase peptides and to form
functional insulin molecules, a methionine residue was
engineered into the fusion protein at the junction between
the N-terminal end of the insulin peptides and the
C-terminal end of the β-galactosidase peptides to serve as
a peptide cleavage site (Figure 17.15, step 3).

In the recombinant plasmid, transcription is under
control of the lac operator regulatory sequences. Gene
transcription is induced by lactose in the absence of
glucose (Figure 17.15; see also Chapter 14). Under appropriate
growth conditions, up to 20% of the total protein
produced by the recombinant E. coli strains is the fusion
protein. Treatment of proteins with cyanogen bromide
(CNBr) cleaves peptide bonds at the carboxyl end of
methionine residues (Figure 17.15, step 7). Apart from the
methionine that was inserted at the junction of the two
peptides, there are no other methionine residues in the
fusion protein, so CNBr treatment releases the insulin chains from
the β-galactosidase peptides without causing any other
breaks. When the A and B chains are purified from their
recombinant host strains and mixed together under oxidizing
conditions, disulfide bonds form to link the A and
B chains and produce active insulin molecules (Figure 17.15, step 8).
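
The release of an insulin chain by CNBr can be modeled as cutting a protein string after every methionine. In the Python sketch below, the carrier sequence is a short, made-up, methionine-free stand-in for the β-galactosidase portion of the fusion protein, and the function name is hypothetical; only the B chain sequence is real.

```python
B_CHAIN = "FVNQHLCGSHLVEALYLVCGERGFFYTPKT"


def cnbr_cleave(protein: str) -> list[str]:
    """Cut a protein on the carboxyl side of every methionine (M)."""
    fragments = []
    start = 0
    for i, residue in enumerate(protein):
        if residue == "M":
            fragments.append(protein[start:i + 1])  # fragment ends with the Met
            start = i + 1
    if start < len(protein):
        fragments.append(protein[start:])
    return fragments


# Hypothetical methionine-free carrier standing in for the beta-galactosidase part.
carrier = "TLITDSLAVVLQRRDWENPGVTQ"
fusion_protein = carrier + "M" + B_CHAIN

fragments = cnbr_cleave(fusion_protein)
print(fragments)
```

Because the only methionine sits at the junction, cleavage yields exactly two fragments, and the second is the intact B chain.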

The recombinant human insulin molecules originally 
produced by this method were identical to naturally oc- 
curring human insulin. Since the implementation of this 
synthetic process in the 1980s, however, more-efficient 
methods for producing recombinant human insulin have 
been developed. Some of these methods have introduced 
amino acid changes in the recombinant human insulin, 
in order to create proteins that have different desired 
effects on the uptake of glucose by targeted cells. These 
various forms of recombinant human insulin are used by 
millions of insulin-dependent diabetics around the world 
every day. 

The ease and economy of working with bacteria as 
compared to eukaryotes have made it practical to produce 
many eukaryotic proteins in bacteria for both medical and 
industrial applications. In addition to human insulin, proteins
such as human growth hormone (HGH) and erythropoietin
(which induces red blood cell formation) are
produced in bacterial systems. The recombinant systems 
used to produce these and many other pharmaceutical 
and industrial agents are safe and effective sources of oth- 
erwise scarce material. For example, before the produc- 
tion of human insulin by recombinant DNA technology, 
insulin was extracted from pig and cow pancreases col- 
lected as a by-product of the meat industry. Pig and cow 
insulin are very similar to human insulin, but not identical 
to it; as a result, allergic reactions compromised their use 
by diabetics. Insulin extractions from animals also carry a 
risk of contamination from the source tissues. Likewise, 
HGH extracted from the pituitary glands of human ca- 
davers carries a risk of transmitting neurological disease 
(e.g., Creutzfeldt-Jakob disease) due to the possible presence
of contaminating proteins. Both recombinant human
insulin and recombinant HGH have proven safe and
effective over decades of use. 

Many proteins used in industrial processes as well as 
in everyday household products are produced in bacteria. 
For example, proteases are protein-degrading enzymes 
added to laundry detergents to aid in removing stains 
from clothing. Isolation of genes encoding proteases from 
psychrophilic, or cold-loving, bacteria has allowed the 
industrial production of proteases that act in cold water,
leading to substantial savings in energy costs stemming
from household hot water usage. The genetic engineering
of E. coli and other microbes to produce proteins or
compounds used in industry, agriculture, and health care
is an active field that will flourish in the coming years as
more microbial systems are investigated at the genomic
and physiological levels. An example of the transfer of
an entire biochemical pathway into E. coli in order to
produce a medically important compound is described in
Experimental Insight 17.2.


Figure 17.15 Producing human insulin in E. coli. This strategy was used in the late 1970s by the
City of Hope National Medical Center and the biotechnology company Genentech to produce human
insulin in E. coli.

(1) Amino acid sequence of human insulin B chain was determined by peptide sequencing.

(2) A nucleotide sequence was created by reverse translation of the amino acid sequence. Two successive stop codons were
added following the open reading frame.

Coding   5’ TTCGTCAATCAGCACCTTTGTGGTTCTCACCTCGTTGAAGCTTTGTACCTTGTTTGCGGTGAACGTGGTTTCTTCTACACTCCTAAGACT [two stop codons] 3’
Template 3’ AAGCAGTTAGTCGTGGAAACACCAAGAGTGGAGCAACTTCGAAACATGGAACAAACGCCACTTGCACCAAAGAAGATGTGAGGATTCTGA [two stop codons] 5’

(3) A methionine codon was inserted at the beginning of the insulin B coding sequence to facilitate subsequent isolation of
the insulin B protein.

(4) EcoRI and BamHI sites were added to the ends of the DNA to facilitate cloning into a vector.

(5) The entire DNA fragment was chemically synthesized.

(6) The insulin B chain (blue) was cloned ...

(7) The protein produced in E. coli was purified and ...

(8) The insulin A chain was produced using a similar
strategy. Active insulin was produced after mixing
the two purified chains together in an oxidizing
atmosphere to induce disulfide bonds between
the cysteine residues of the two chains.


Experimental Insight 17.2
Plant-Derived Antimalarial Drugs Produced in E. coli


The production of amorphadiene in E. coli exemplifies the use 
of genetic engineering to produce a high-value pharmaceuti- 
cal product. Amorphadiene is the immediate precursor to 
artemisinin, a potent antimalarial drug. Artemisinin has been 
touted as the next-generation antimalarial drug because it is 
effective at treating multiple stages of malarial infection and 
exhibits no cross-resistance with existing antimalarial drugs, 
such as chloroquine and quinine. Chloroquine and quinine 
have been used to fight malarial infection for several decades, 
but their effectiveness is decreasing due to the evolution of 
resistant strains of Plasmodium, the malaria parasite. 


OBSTACLES TO ARTEMISININ PRODUCTION 

Like many modern drugs, artemisinin was originally discov- 
ered in plant extracts. Currently the drug is extracted and pu- 
rified from the sweet wormwood plant, Artemisia annua. The 
logistics of growing Artemisia are limiting factors, however, 
and the cost of producing large amounts of artemisinin from 
its natural source is also prohibitive. Production of artemisinin 
in a fermentable biological system such as E. coli could
increase drug supply, conserve natural resources, and
dramatically lower production costs.

Artemisinin is a complex terpene molecule produced in 
several biosynthetic steps. All plants produce the precursors 
of the terpene pathway, isopentenyl pyrophosphate (IPP) 
and dimethylallyl pyrophosphate (DMAPP), but the specific 
terpenes produced from them by each plant species vary. 
The final two steps in artemisinin biosynthesis, from farnesyl 
pyrophosphate (FPP) to artemisinin, are catalyzed by enzymes 
encoded by genes specific to Artemisia. While E. coli naturally
produces IPP and DMAPP, the pathway is subject to feedback
inhibition, preventing large quantities of these molecules
from accumulating.


SUCCESS THROUGH GENETIC ENGINEERING

This obstacle to producing large quantities of amorphadiene
in E. coli is circumvented by use of a combination of eight
genes from Saccharomyces cerevisiae and E. coli to recreate the
biosynthetic pathway leading to FPP production. (1) A mutant
E. coli strain is used in which the normal feedback inhibition
of the FPP biosynthetic pathway is lacking. (2) Expression of
the eight genes is coordinated by distribution
of the genes into two operons, one containing three genes
and one containing five, controlled by lac operon regulatory
sequences (see Chapter 14 for a review of the lac operon
system). In this way, gene expression is induced in the
presence of either lactose or the synthetic inducer isopropyl-
β-D-thiogalactopyranoside (IPTG). (3) The amorphadiene
synthetase (ADS) gene is cloned from Artemisia and placed under
the control of lac operon regulatory sequences.

In initial experiments with this system, the levels of ADS
protein produced in E. coli were disappointingly low. The reason
was discovered to be differences in codon bias between
Artemisia and E. coli. When codons preferred by Artemisia
were replaced with synonymous codons preferred by E. coli,
the production of ADS protein in E. coli became much more
efficient. (4) Now the bacteria produced a large quantity of
amorphadiene, which could be converted into artemisinin
either by chemical synthesis or in vivo by the introduction of
the artemisinin synthetase gene from Artemisia.


(1) The endogenous E. coli FPP biosynthetic
pathway, subject to feedback regulation, was
inactivated by a mutation in the ispC gene.

(2) An FPP biosynthetic pathway composed of a mixture of S. cerevisiae genes (orange) and
E. coli genes (gray) was introduced on two operons controlled by the lac operon ...

(3) The A. annua gene (green)
encoding ADS, which converts
FPP to amorphadiene, was
placed in another expression
vector, also controlled by the
lac operon regulatory
sequences (dark blue). The
gene was modified to match
the E. coli codon bias.

(4) Fermentation of the resulting ...
amorphadiene, which is
secreted into the media
and can be converted to
artemisinin via an in vitro
chemical process.


Transgenes can be introduced into fungal cells in
a manner similar to the techniques described for bac- 
teria, using a plasmid system developed for the fun- 
gus Saccharomyces cerevisiae (baker’s yeast). In addition, 
DNA can be readily integrated into the genome of many 
fungi by homologous recombination, making direct ma- 
nipulation of the fungal genome feasible. 


Yeast Plasmids Some strains of S. cerevisiae harbor a 
circular 6.3-kb plasmid that, because of its approximately
2-μm contour length, is known as the 2-micron plasmid. This
plasmid can be modified into a recombinant plasmid by 
the insertion of transgenes. An E. coli origin of replication 
and appropriate selectable markers are also introduced 
into the 2-micron plasmid, which already contains the 
S. cerevisiae origin of replication (Figure 17.16). With these 
additions, the plasmid becomes a shuttle vector, a vector
that can replicate in two species—in this case, both E. coli 
and S. cerevisiae—and thus can be used to shuttle DNA 
sequences between them. With this shuttle vector, DNA 
sequences can be manipulated in E. coli, where manipulation 
is easier, after which the modified plasmids can be shuttled 
into yeast for heterologous protein expression. 


Integrating DNA into the Genome of S. cerevisiae If 
DNA that is introduced into an organism has no origin 
of replication, it undergoes one of two fates: enzymatic 
degradation or integration into the host genome. 
Enzymatic degradation, accomplished by nucleases that 
are common in cells, will eliminate the introduced DNA. 
Integration of DNA into the host genome, in contrast, 
allows the introduced nucleic acid to persist in the host 
cell. Integration is accomplished by either of two distinct 
mechanisms of recombination: illegitimate recombination 
or homologous recombination. 

Figure 17.16 Shuttle vector for E. coli and Saccharomyces
cerevisiae.

Illegitimate recombination integrates introduced
DNA at a random, nonhomologous location. This form
of recombination does not require any homology between
the introduced DNA and the genomic DNA into which 
the former is integrated. In contrast, the second mecha- 
nism for integration of introduced DNA, homologous 
recombination between the introduced DNA and the 
host genomic sequence, requires a significant length of 
DNA sequence in common between the two recombin- 
ing molecules. The relative frequencies with which these 
mechanisms occur depend on the species into which the 
DNA is introduced. In most plant and animal species, 
illegitimate recombination is the most common fate, al- 
though techniques exist to select for individuals in which 
homologous recombination has occurred (as described 
later in this chapter). In bacterial and fungal species, in- 
troduced DNA is often recombined in the genome in a 
homologous manner. 

Segments of DNA introduced into S. cerevisiae 
have a propensity to undergo homologous recombi- 
nation. An introduced circular molecule of DNA can 
recombine by either a single crossover or a double 
crossover (Figure 17.17a). In a single crossover, the en- 
tire molecule of introduced circular DNA is integrated 
into the yeast genome with no loss of any genomic 
DNA. If recombination of a circular molecule occurs 
by double crossover, however, only DNA between the 
homologous flanking sequences is integrated into the 
recipient genome, and the integration is accompanied 
by a concomitant loss from the genome of the DNA 
between the homologous sequences. Thus, recombina- 
tion with two crossovers results in replacement of the 
genomic DNA with the introduced DNA flanked by the 
homologous sequences. 
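
The different outcomes of single and double crossovers with a circular molecule can be traced with a simple segment model in Python. Each DNA molecule is represented as a list of named segments; the segment names ("L" and "R" for the two homologous flanking sequences, "marker", "backbone", and so on) and the function names are hypothetical labels chosen for this sketch.

```python
def single_crossover(genome: list[str], plasmid: list[str], arm: str) -> list[str]:
    """Integrate an entire circular plasmid by one crossover within `arm`."""
    g = genome.index(arm)
    p = plasmid.index(arm)
    # Travel once around the circle, starting just after the shared arm.
    around = plasmid[p + 1:] + plasmid[:p]
    # The arm ends up duplicated, flanking the integrated plasmid DNA.
    return genome[:g + 1] + around + [arm] + genome[g + 1:]


def double_crossover(genome: list[str], plasmid: list[str],
                     left: str, right: str) -> list[str]:
    """Replace the genomic DNA between two arms with the plasmid DNA between them."""
    g_left, g_right = genome.index(left), genome.index(right)
    p_left, p_right = plasmid.index(left), plasmid.index(right)
    return genome[:g_left + 1] + plasmid[p_left + 1:p_right] + genome[g_right:]


genome = ["upstream", "L", "target", "R", "downstream"]
plasmid = ["L", "marker", "R", "backbone"]   # circular: backbone joins back to L

single = single_crossover(genome, plasmid, "L")
double = double_crossover(genome, plasmid, "L", "R")
print(single)
print(double)
```

In the single-crossover product every genomic segment is still present and the whole plasmid, backbone included, has been inserted. In the double-crossover product the target segment has been replaced by the marker and no backbone DNA is integrated.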

Introducing a linear rather than circular molecule of 
DNA favors retrieval of recombinants produced by dou- 
ble crossover, since a single crossover will cause a deletion 
event resulting in recombinant molecules lacking a large 
portion of the original chromosome and therefore likely 
to be lethal (Figure 17.17b). Linearized DNA molecules 
recombine at a higher frequency than circular ones, mak- 
ing the introduction of linear molecules the method of 
choice for homologous recombination experiments. 

Taking advantage of this tendency for homologous 
recombination to occur in yeast, yeast geneticists create 
recombinant yeast both through gene insertion and gene 
replacement. Loss-of-function alleles are created by re- 
placing the target gene with heterologous DNA, often a 
selectable marker gene, thus eliminating the production 
of functional wild-type protein by the target gene. Gene 
insertions that result in a deletion of the entire coding 
region of the gene create null alleles that produce no pro- 
tein product. Such insertion alleles are often called gene 
knockouts because the insertion “knocks out” the func- 
tion of the gene, creating a recessive loss-of-function allele 
(recall the knockout libraries referred to in Section 16.3). 
Conversely, inserting a functional gene, often creating a 
gain-of-function allele, is called a knock-in.

Figure 17.17 Homologous recombination in yeast: Single versus double crossovers.


The ease with which homologous recombinants are 
generated in S. cerevisiae has allowed the production of a 
large number of yeast strains for genetic analysis of bio- 
logical processes in this organism. Loss-of-function alleles 
of every gene in the S. cerevisiae genome have been gener- 
ated and can be ordered from a stock center. Such stocks 
have greatly facilitated genetic research by relieving sci- 
entists of the need to produce mutations in the genes of 
interest at the start of every new genetic experiment. 


Transformation of Plant Genomes 
by Agrobacterium 


Our food is mainly derived from plants, and humans have 
been genetically modifying plants since the beginning of 
agriculture, nearly 10,000 years ago. For most of this his- 
tory, genetic improvement was limited to interbreeding 
wild and domesticated species to select for traits already 
present in nature. The recently developed techniques for 
introducing DNA from many sources into plants have 


added a new dimension to the genetic modification of 
plants for agricultural purposes. By these new means, the 
genetic variation available in plants has been extended to 
include not only genes from other plant species but also 
genes derived from animals, fungi, and bacteria. 

The most widely used method of generating trans- 
genic plants takes advantage of a natural plant trans- 
formation system that has evolved in the soil bacterium 
Agrobacterium tumefaciens. In nature, this bacterium is 
the cause of crown gall disease, an uncontrolled cell 
division in plant cells. This disease results in tumors 
(galls), typically at the crown (the base near the soil) of 
the plant. Wild strains of A. tumefaciens harbor a large 
plasmid (200 kb) called the tumor-inducing plasmid, or 
Ti plasmid (Figure 17.18a). A portion of the Ti plasmid, 
a region referred to as the transfer DNA (T-DNA), is
transferred from the bacterium into the nucleus of a plant
cell. Mary-Dell Chilton and colleagues established the
nature of this remarkable cross-kingdom transfer of DNA
in the late 1970s by showing that Agrobacterium Ti
plasmid DNA can be detected inside plant cells. Once
inside the plant cell, the T-DNA can recombine
illegitimately with the plant nuclear genome, resulting in
an insertion of the T-DNA at a random location in the
plant genome (Figure 17.18b).

Figure 17.18 Crown gall disease caused by Agrobacterium
via plant transformation.

From the bacterial perspective, the outcome of this 
natural transformation event is the expression of genes 
in the T-DNA that encode proteins causing plant cells 
to (1) divide in an uncontrolled manner and (2) produce 
amino acids only the bacterium can utilize as an energy 
source. Agrobacterium essentially reprograms the plant 
cells into food factories for the bacteria. Bacterial genes 
encoding plant-hormone-biosynthesizing enzymes cause 
transformed plant cells to produce high levels of two plant 
hormones, auxin and cytokinin, which in turn cause un- 
controlled division of plant cells, resulting in tumor forma- 
tion (Figure 17.18c). The other genes on the T-DNA encode 
opine-biosynthesizing enzymes. Opines, such as nopaline 
and octopine, are amino acids that do not naturally occur 
in plants; therefore, plants do not produce any enzymes 
capable of metabolizing opines. Agrobacterium does have 
such enzymes, however; consequently, the opines pro- 
duced by the plant cells can be used as carbon and nitrogen 
sources by the bacteria. Other genes on the Ti plasmid, but 
not located within the T-DNA region, encode enzymes re- 
quired for the transfer of the T-DNA to the plant cell. 

Sequence analysis has revealed that the genes in- 
volved in the transfer of T-DNA are evolutionarily re- 
lated to those involved in the transfer of the F-factor in 
E. coli (see Chapter 6). Thus, Agrobacterium has evolved 
a mechanism to transfer DNA into plant cells by adapting 
genes originally involved in bacterial conjugation. A strik- 
ing aspect of this cross-kingdom gene transfer is that the 
genes on the T-DNA have evolved to be transcribed and 
translated efficiently in plant cells instead of in bacte- 
rial cells. In nature, Agrobacterium normally transforms 
plants only; but in the laboratory, the bacterium has the 
ability to transfer DNA into almost any eukaryotic cell, 
including human cells. 


Creating Transgenic Plants Scientists can use Agro- 
bacterium to transfer any gene of interest into plants. 
To do so, they remove the opine- and tumor-producing 
genes normally found in the T-DNA and replace them 
with DNA encoding the gene of interest. The T-DNA then 
transfers the gene of interest into the plant cell, where it 
becomes integrated into the genomic DNA of the plant. 
Figure 17.19a depicts the manner in which the Ti 
plasmid is modified for transformation procedures. First, 
the tumor-inducing and opine genes are deleted from 
the Ti plasmid, producing what is called a “disarmed” Ti 
plasmid. Then the gene of interest is inserted between the 
two ends of the T-DNA region, referred to as the left and 
right borders. These border regions contain sequences 
required for efficient transfer. Proteins encoded by genes
of the Ti plasmid outside of the T-DNA recognize specific
sequences in the left and right border and catalyze the 
transfer of a single strand of T-DNA from the bacterium 
to the plant cell; when this occurs, the gene of interest 
that has been inserted between the two border sequences 
will be transferred as well. As with any other protocol for 
constructing transgenic organisms, a selectable marker is 
included (between the left and right borders) in addition 
to the gene of interest to allow efficient selection of trans- 
formed plants. For experiments with plants, genes confer- 
ring resistance to either antibiotics (inhibiting translation 
in the chloroplast) or herbicides may be employed as 
selectable markers. The selectable marker genes are usu- 
ally expressed using a promoter that confers constitutive 
expression, so that transgenic plants can be selected at 
any stage of their development. 
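
The role of the border sequences can be pictured with a short sketch. It is a deliberately simplified model, not a description of the molecular mechanism: the plasmid is treated as an ordered list of named elements, and everything lying between the left border and the right border is taken to be the segment delivered to the plant cell. The element names (`ori`, `bacterial_resistance`, and so on) and the function name are hypothetical labels chosen for illustration.

```python
def t_dna_payload(elements, left="LB", right="RB"):
    """Return the elements located between the left and right borders.

    Simplified model: whatever sits between the two borders is treated
    as the transferred segment; elements outside the borders stay behind.
    """
    i = elements.index(left)
    j = elements.index(right)
    if i >= j:
        raise ValueError("left border must precede right border")
    return elements[i + 1:j]


# Hypothetical layout of an engineered plasmid (labels are illustrative).
engineered_plasmid = [
    "ori",                   # replication origin for the bacterium
    "LB",                    # left border
    "selectable_marker",     # e.g., herbicide or antibiotic resistance
    "gene_of_interest",
    "RB",                    # right border
    "bacterial_resistance",  # used only to maintain the plasmid in bacteria
]

payload = t_dna_payload(engineered_plasmid)
print(payload)
```

In this model both the selectable marker and the gene of interest travel together because both lie between the borders, which is why the marker can be used to identify plant cells that received the gene of interest.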

Because the Ti plasmid is too large to be easily 
manipulated, most experimental protocols that use 
Agrobacterium construct a strain harboring two plasmids: 
One is a disarmed Ti plasmid, and the second is a plasmid 
that contains left and right border sequences flanking the 
DNA of interest (Figure 17.19a). This strategy, separating 
the functional elements of the Ti plasmid into two plas- 
mids, is referred to as the binary approach. It results in 
the efficient transfer of the DNA of interest into the plant 
cell and its subsequent integration into the plant genome 
(Figure 17.19b). 

Unlike bacteria and yeast, which are single-celled 
organisms, transformed plant cells must be regenerated 
into an entire plant in order to reveal the effects of trans- 
genes on the plant phenotype. Traditionally, scientists 
have taken advantage of a unique feature of plant devel- 
opment, the totipotency of most plant cells: Under the 
appropriate environmental and hormonal conditions, an 
entire normal plant can be regenerated from a single 
isolated plant cell. Thus, after infection of plant cells with 
the modified Agrobacterium strain and selection of trans- 
formed cells on the basis of the selectable marker gene, 
progeny plants can be regenerated from the individual 
transformed cells (Figure 17.19c). This technique has been 
successfully applied to a wide variety of flowering plant 
species, including crop species such as rice, maize, and 
tomatoes. 

Plant researchers using Arabidopsis as a model sys- 
tem for studying basic biological processes sought an 
easier method of transformation that would not require 
regeneration from a single transformed cell. After sev- 
eral different techniques were attempted, they discovered 
that the simple technique of dipping Arabidopsis flow- 
ers into a culture of Agrobacterium works surprisingly 
well. It allows the T-DNA to be transferred directly from 
Agrobacterium to the egg cell of the female gametophyte. 
In this protocol, transgenic plants are selected from seed 
produced by the plant exposed to Agrobacterium. 

Many plant species are susceptible to Agrobacterium-
mediated transformation. If they are not, DNA can be
directly introduced into their cells. The cell walls of
isolated plant cells are first removed enzymatically, after 
which the cells are mixed with heterologous DNA and 
given a heat or electrical shock to depolarize the mem- 
brane and facilitate the entry of DNA. Once in the cell, 
the DNA has the same fate as described above for DNA 
transferred into fungi. In plants, homologous recombi- 
nation is rare relative to illegitimate recombination, so 
the most common outcome is the insertion of the het- 
erologous DNA into a random location in the genome. 
In another technique, DNA is introduced into plant cells 
by particle gun bombardment, the use of high pressure 
to fire microscopic particles coated with DNA into plant 
cells. The particles are propelled with enough force to 
penetrate the cell wall and plasma membrane. Both of 
these techniques can be applied to any plant species. 


Transgenic Plants in Agriculture The two most 
common traits engineered into transgenic crops grown 
today are herbicide resistance and insect resistance. With 
herbicide-resistant crops—for example, the varieties sold 
as Roundup Ready—farmers can apply herbicide to a 
field to clear the ground of weeds and other non-crop 
plants without damaging the crop itself. This reduces 
the amount of tilling done to plow weeds under at the 
beginning of the season. Less tilling results in less soil loss 
and also saves on the use of fossil fuels. 

Cotton and maize crops resistant to insect herbiv- 
ory are two of the most widely grown transgenic crops. 
Insect resistance is usually conferred by the expression of 
genes derived from the bacterium Bacillus thuringiensis. 
Genes encoding approximately 100 insect toxins, known 
as Bt toxins, have been identified in different strains of 
B. thuringiensis. The toxins work by perforating the guts 
of different insect species, and different toxins have differ- 
ent “host” specificity. Transgenic plants expressing genes 
encoding Bt toxins are less palatable to insects and exhibit 
reduced insect herbivory. As a consequence, transgenic 
plants expressing Bt toxin genes require significantly less 
application of insecticides than do non-transgenic plants, 
thus reducing the insecticide load in the environment. 

While Bt toxins are clearly toxic to insects, other 
herbivores, such as humans, are impervious to the com- 
pounds. The properties of Bt toxins have been appreci- 
ated for some time. Organic farmers routinely spray 
B. thuringiensis directly on their crops to act as a “natural” 
insecticide. Millions of acres of transgenic maize, cot- 
ton, and potatoes expressing Bt genes and of herbicide- 
resistant soybeans are presently cultivated in the United 
States and several other countries. 


Golden Rice While many transgenic crops thus far 
used in agriculture have primarily benefited farmers 
in the developed world, the humanitarian potential 
for crop modification in aid of subsistence farmers in 
developing countries is exemplified by Golden Rice. Rice
(Oryza sativa) is the major staple food for much of the
world. Because oil tends to become rancid, especially 
in tropical climates, rice is often milled until its oil- 
rich outer layer has been removed. Unfortunately, the 
remaining edible grain, the endosperm, lacks several 
micronutrients, including provitamin A, a vitamin A 
precursor. (Vitamin A can be obtained directly through 
consumption of animal products or indirectly from 
plants that produce carotenoids, which are converted 
to vitamin A after ingestion and are therefore termed 
provitamin A.) 

Vitamin A deficiency results in blindness and increased 
disease susceptibility, thus contributing to childhood mor- 
tality in many developing countries. It is estimated that 
vitamin A deficiency affects between 140 million and 250 
million preschool children worldwide, leading to 250,000 
to 500,000 cases of blindness per year. Because no wild or 
domesticated cultivars of rice produce provitamin A in the 
endosperm, recombinant technologies, rather than a con- 
ventional breeding program, are required to produce rice 
that has an endosperm containing provitamin A. 

Scientists knew that rice endosperm synthesizes 
geranylgeranyl diphosphate (GGPP), a precursor in the 
synthesis of carotenoids. Study of the carotenoid biosyn- 
thetic pathway in plants suggested that five plant-derived 
enzymes are needed to convert GGPP to β-carotene.
However, the discovery that a single bacterial enzyme 
(CRTI) could replace three of the plant enzymes (PDS, 
ZDS, CRTISO) simplified the genetic engineering strategy 
(Figure 17.20a). Then, in 2000, Ingo Potrykus, Peter Beyer, 
and colleagues reported that the addition of only two 
genes, a daffodil-derived gene called PSY and the bacterial 
gene called CRTI, resulted in the production of β-carotene
in rice endosperm (Figure 17.20b). This outcome was sur- 
prising because a gene called LCY was expected to be 
necessary as well, but apparently the endogenous rice LCY 
gene is already expressed in endosperm. 

Subsequently, work has focused on tailoring the pro- 
cess so that (1) the transgenes would be expressed only 
during endosperm formation and only in endosperm, (2) 
the β-carotene synthesis could be increased using differ-
ent versions of the genes, (3) the selectable marker could
be removed from the transgenic lines, and (4) the trans- 
genes could be introduced into rice cultivars that are typi- 
cally used by subsistence farmers in southeast and south 
central Asia and Africa. These improvements have led to 
transgenic lines that should provide a significant fraction 
of the required daily intake of provitamin A from a serv- 
ing of Golden Rice (Figure 17.20c). 

The funding for the research to produce Golden 
Rice was public, in part from the Rockefeller Foundation, 
but patents on many of the techniques and tools used to 
generate the transgenic rice are held by biotech compa- 
nies. Fortunately, these companies agreed to license the 
inventors of Golden Rice to provide the technology free 
of charge for humanitarian use in developing countries.
Golden Rice is an example of how customized crops can
be developed to address specific nutritional needs and
public health problems caused by dietary deficiencies.

Figure 17.20 The generation of Golden Rice.

Transgenic plants have been largely accepted in some 
parts of the world, but many concerns have been raised 
about their introduction. Some critics fear that transgenes 
could be adverse to human health—for example, that 
people may have allergic reactions to the protein product 
of a transgene. Another concern is that the transgenes 
may “escape” into the environment if transgenic crop 
plants interbreed with related species growing nearby. 
The likelihood of this occurrence can be reduced by not 
growing transgenic crops in environments harboring re- 
lated species that have potential to interbreed. Transgenic 
crops must be tested to allay these concerns, but we must 
also recognize that, while the concerns about transgenic 
agricultural crops are valid, they are equally applicable to 
the cultivation of crops developed by traditional breeding 
methods. 


Transgenic Animals 


Protocols for the generation of transgenic animals are 
similar to those described for fungi, but as with plants, 
homologous recombination occurs much less frequently 
than illegitimate recombination (i.e., recombination not 
based on sequence homology). Caenorhabditis elegans, 
Drosophila, and Mus musculus (mice) are three of the 
most widely used genetic model animals and provide ex- 
amples of the variety of methods available for creation of 
transgenic animals. Totipotency is not characteristic of 
most of their cells; thus, methods to produce transgenic 
animals rely on the injection of DNA into eggs, embryos, 
or cells that will give rise to gametes, with the hope that 
the injected DNA will be integrated into the genome 
either by homologous or illegitimate recombination. 

Where injection directly into gametes is not feasible, 
DNA can be injected into isolated cells that are subse- 
quently transplanted into an embryo. The embryo then 
develops as a genetic chimera, an organism in which 
some cells have a different genotype than others, and will 
transmit transgenes to progeny only if the embryonic germ 
cells carry a copy of the transgene. As with the protocols 
utilized in fungi and plants, methods for the production of 
transgenic animals vary depending on the biological char- 
acteristics specific to each type of organism. 


C. elegans In the nematode worm C. elegans, one 
protocol for creating transgenic animals is to inject DNA 
directly into the gonads of hermaphrodites during oocyte 
development (Figure 17.21). The gonads are syncytial, 
meaning that gonadal cells each contain many nuclei and a
large amount of cytoplasm. Eventually, each nucleus gives 
rise to a germ cell. If the injected DNA is integrated into 
the genome of a germ cell, the mechanism of integration 
is almost always illegitimate recombination. 


Figure 17.21 Transgenic C. elegans. DNA injected into the
syncytium of the gonad can be incorporated into oocytes
following cellularization and may be integrated, often as
a concatemer.


The DNA is often, but not always, inserted as a con- 
catemer, that is, as multiple tandem copies of the inserted 
DNA. Concatemers are undesirable because they result 
in abnormal levels of gene expression, either because of 
the additional copies producing too much gene product 
or because of RNA-mediated gene-silencing effects trig- 
gered by the repetitions in the concatemer (discussed in 
Chapter 15). Alternatively, the injected DNA may exist as 
extrachromosomal arrays that are not integrated into the 
genome and which therefore may not segregate properly 
during mitosis. 

As with other systems for gene transfer, a select- 
able marker is built into the injected DNA to facilitate 
identification of cells that have been transformed. In 
C. elegans, a dominant mutant allele of the roller-6 gene 
[specifically, rol-6(su1006)] can be used. Animals with 
this dominant mutant allele exhibit a behavioral defect: 
Rather than moving in the normal serpentine pattern, 
they tend to roll in tight circles. Because animals with 
several copies of the mutant allele do not survive, it 
serves as a “marker” that also selects against concate- 
mers of transgenes. 


Drosophila In the 1980s, Gerald Rubin and Allan 
Spradling demonstrated that P transposable elements, a 
class of transposons, offered an efficient means of creating 
transgenic Drosophila, in most cases inserting only one 
copy of the DNA being transferred (see Chapter 15 for 
a description of P elements). Their idea was to use the 
endogenous activity of P elements to transpose transgenes 
into the genome (Figure 17.22). 

Based on their knowledge of P element transposi- 
tion, Rubin and Spradling reasoned that they could 
replace much of the P element DNA with exogenous
DNA as long as (1) transposase, the enzyme that con-
trols P element movement, was provided; and (2) the
P element ends were retained, since these are required 
for recognition by the transposase. In their method, two 
DNA molecules, one a modified P element and the other 
a DNA molecule encoding the transposase but lacking 
the sequences required for transposition, are co-injected 
into a Drosophila embryo. The modified P elements 
are induced to insert into the genome at random posi- 
tions by the action of the transposase. Typically, only 
a single P element is inserted, precluding the problems 
associated with the concatemeric arrays seen in trans- 
genic C. elegans. This strategy resembles the use of 
Agrobacterium to transform plants in that it too utilizes
a biological system that has evolved to recombine DNA
into a host genome. 

Since P elements transpose only in the germ-line cells 
of Drosophila, the injection is made into an early-stage 
embryo, targeting those cells that will give rise to the germ 
line. Early-stage Drosophila embryos are syncytial, and 
nuclei at the posterior end of the syncytium are most likely 
to give rise to the germ cells. The fly derived from the in- 
jected embryo is therefore a chimera in which most soma 
and some gametes are wild type, but some soma and gam- 
etes are transgenic. When the injected fly is mated with 
an uninjected fly of the same strain, gametes into whose 
genomes a P element was inserted will produce transgenic 
progeny. 

A commonly used selectable marker in Drosophila 
is the rosy (ry) gene. In the procedure under discussion, 
the embryos to be injected are ry⁻/ry⁻ and have rosy eyes,
rather than the wild-type red eyes. A wild-type, ry⁺, copy
of the gene is included in the modified P element, in addi-
tion to the DNA to be transformed into the fly. While flies
derived from the injected embryos will have rosy eyes,
some of the progeny derived from transgenic gametes of
the injected fly will have red eyes due to the action of the
ry⁺ allele on the inserted P element. As is characteristic
of transposons, P elements insert into the genome at ran- 
dom locations. 
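
The logic of the rosy marker can be summarized in a few lines of code. This is a minimal sketch under one stated assumption: a single functional copy of rosy, whether at the endogenous locus or carried on an inserted P element, is treated as sufficient for red eyes. The function and argument names are hypothetical.

```python
def eye_color(endogenous_alleles, carries_marked_p_element):
    """Predict eye color from the rosy genotype.

    endogenous_alleles: pair of "+" (wild type) or "-" (mutant) alleles
    carries_marked_p_element: True if the genome holds a P element that
        includes a wild-type copy of rosy
    Assumption: one functional copy is enough for red eyes.
    """
    functional = "+" in endogenous_alleles or carries_marked_p_element
    return "red" if functional else "rosy"


# Progeny of the injected fly crossed to an uninjected fly of the same strain:
# every offspring is mutant at the endogenous locus, so eye color reports
# whether the gamete from the injected parent carried the P element.
from_transgenic_gamete = eye_color(("-", "-"), True)
from_ordinary_gamete = eye_color(("-", "-"), False)
print(from_transgenic_gamete, from_ordinary_gamete)
```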


Vertebrates A general approach to creating transgenic 
vertebrates is to inject DNA directly into the nucleus of 
a fertilized egg cell, in a manner similar to that described 
above for C. elegans and Drosophila. The injected DNA 
can become integrated into the genome at random 
positions by illegitimate recombination. Because the 
DNA integrates randomly into the genome, the transgene 
becomes inserted at different locations in the genomes of 
different individual animals. In organisms such as salmon, 
each injected egg has the potential to develop into a 
transgenic individual (Figure 17.23). 

Two features of this method lead to variability in the 
expression of the transgene. First, due to the integration 
of the transgenes as multicopy concatemers, gene expres- 
sion levels can be affected as described for C. elegans. 
Second, the expression of the transgene can be abnormal 
because of the chromosomal environment in which it 
is located. For example, if the transgene is inserted into 
heterochromatin, gene expression may be altered as de- 
scribed for position effect variegation in Drosophila (see 
Chapter 15). Note that the problem of transgene position 
effects is shared by all transgenic organisms in which the 
transgene is integrated into the genome by illegitimate 
recombination. While position effects can pose problems 
in Drosophila, C. elegans, and plants, they are exacerbated 
in mice due to the larger average size of vertebrate genes 
and the larger amount of heterochromatin in vertebrate 
genomes. To overcome this variability, methods to more 
precisely insert transgenes were developed for mice.

Figure 17.23 Creation of transgenic salmon through
injection of DNA into salmon eggs. 


Mus musculus Mice are important genetic models for 
human diseases and human physiology. The ability 
to create transgenic mice enables scientists to dissect 
not only the genetic and molecular basis of mouse 
development and physiology but also, by proxy, many 
aspects of human development and physiology. Two 
methods are available to create transgenic mice, a targeted 
approach and a nontargeted approach. 

In the nontargeted approach, the transgene is ran- 
domly inserted into the genome through illegitimate re- 
combination; in the targeted approach, the transgene is 
inserted into a specific locus in the genome through ho- 
mologous recombination. The latter method transformed 
the study of mouse biology since it allows for the creation 
of mice with specific loss-of-function (or knockout) and 
gain-of-function alleles. In 2007, Mario Capecchi, Martin 
Evans, and Oliver Smithies shared the Nobel Prize in
Physiology or Medicine for their work leading to the de-
velopment of knockout mice.

Problems associated with variable genomic positions 
and expression of transgenes led geneticists to explore 
the possibility of using homologous recombination for 
transgene integration. Homologous recombination would
provide more consistent transgenic mouse strains and
would also facilitate the creation of mutations in specific 
mouse genes, which would be extremely useful for study- 
ing mammalian biology. Thus, methods were developed 
to identify mice in which exogenous DNA had been in- 
serted into the genome by homologous recombination as 
opposed to the much more frequent illegitimate recombi- 
nation (Figure 17.24a). The identification is accomplished 
by selecting for the homologous recombinant and at the 
same time selectively killing the transformants resulting 
from illegitimate recombination. 

The overall strategy is similar to that described for 
homologous recombination in yeast. The transforma- 
tion vector contains two regions of DNA homologous to 
the target locus flanking a positive selectable marker. An 
example of a positive selectable marker is the neomycin
resistance (Neo) gene, whose product inactivates the drug
G418, which blocks translation and is lethal to mammalian cells.
A vector containing these elements is capable of being in- 
tegrated into the genome by homologous recombination, 
but more than 99% of integrations will occur by illegiti- 
mate recombination. To select against nonhomologous 
recombination events, a negative selectable marker is 
added to the vector outside one of the regions of homol- 
ogy to the target gene. 

A commonly used negative selectable marker is a 
thymidine kinase (tk) gene derived from a herpes sim- 
plex virus. Thymidine kinase catalyzes the addition of 
a phosphate to deoxythymidine, forming deoxythymi- 
dine monophosphate, which is eventually converted to 
deoxythymidine triphosphate, one of the substrates for 
DNA synthesis. In contrast to mammalian thymidine
kinase, thymidine kinase from herpes simplex virus can
also catalyze the addition of phosphate to certain
nucleoside analogs that disrupt DNA synthesis when
incorporated into DNA. Because the endogenous mammalian
thymidine kinase does not recognize these analogs as
substrates, only those cells expressing the herpes simplex
virus tk gene are sensitive to them. Thus, cells harboring
the viral tk gene will be selected against when plated on
media containing the nucleoside analog ganciclovir (a
guanosine analog). Such analogs are also used as potent
antiviral medications, since only cells harboring the
virus are sensitive to them.

For transformed mouse cells to survive, they must 
acquire the positive marker and must lose the negative 
marker. The occurrence of a homologous recombina- 
tion event between the negative and positive markers is 
one possible way in which the introduced DNA can be 
integrated to produce a cell that possesses the positive 
and lacks the negative marker. Selection for this type of 
transformation is called positive-negative selection. A
related protocol, negative-positive-negative selection,
where negative selectable markers are positioned at each 
end of the introduced DNA, has been successfully used 
to identify homologous recombination events in plants,
such as rice, and should be generally applicable to any
species.

Figure 17.24 Creating a loss-of-function CFTR (cystic fibrosis transmembrane conductance
regulator) allele in mice through homologous recombination. Mutations in the human ortholog
are the cause of cystic fibrosis.

What types of mammalian cells are typically targeted
for gene transfer? The blastocyst-stage mammalian
embryo consists of an outer sphere of cells and a small pool
of cells inside the sphere. At the blastocyst stage, the
internal cells, known as embryonic stem (ES) cells, are
totipotent. The production of a transgenic mouse starts
with the isolation of ES cells from the mouse strain to be
transformed. The ES cells are grown in culture, and DNA
is introduced into the cells, often by transiently
depolarizing their membranes to make the cells permeable to
DNA. The cells are then transferred to media containing
the agents for positive and negative selection, and
transformed cells in which homologous recombination 
occurred are selected. 
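
The outcome of growing cells on medium containing both selective agents can be laid out as a small decision table. The sketch below is an idealized illustration: it assumes that an illegitimate integration inserts the whole vector, including the tk gene, and that a homologous event inserts only the DNA between the two homology regions. The class, field, and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    label: str
    has_neo: bool  # positive marker: confers resistance to G418
    has_tk: bool   # negative marker: confers sensitivity to ganciclovir


def survives(cell, g418=True, ganciclovir=True):
    """Return True if the cell grows on medium with the given agents."""
    if g418 and not cell.has_neo:
        return False  # killed by G418: positive marker is missing
    if ganciclovir and cell.has_tk:
        return False  # killed by ganciclovir: negative marker is present
    return True


cells = [
    Cell("untransformed", has_neo=False, has_tk=False),
    Cell("illegitimate integration of whole vector", has_neo=True, has_tk=True),
    Cell("homologous recombinant", has_neo=True, has_tk=False),
]

survivors = [c.label for c in cells if survives(c)]
print(survivors)
```

In practice the selection enriches for homologous recombinants rather than guaranteeing them, since a cell could also lose the negative marker in other ways; surviving colonies are therefore usually confirmed by molecular analysis.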

The selected transformed ES cells are reintroduced 
into a blastocyst from a mouse of a genotype different 
from that of the transformed cells, allowing the progeny 
derived from the transformed ES cells to be detected 
(Figure 17.24b). For example, alleles conferring differ- 
ences in coat color are often used. The blastocyst, now 
carrying transformed ES cells, is implanted into a surro- 
gate female mouse. Because only some of the ES cells in 
the host blastocyst are transgenic, the mouse that devel- 
ops from the embryo in which the transformed cells were 
introduced is a genetic chimera in which some tissues are 
derived from the transformed ES cells and other tissues 
are derived from host ES cells. Chimeric animals can be 
readily identified by their variegated coat color. 

It is hoped that at least some of the gametes of the 
chimeric offspring of the host mouse will be derived from 
the transformed ES cells, so that some mice in the subse- 
quent generation will be heterozygous for the mutation 
caused by the homologous recombination event. If two 
heterozygous offspring of this generation are interbred, 
mice homozygous for the mutation can be produced. 
Technologies for the construction of other transgenic 
mammals, including sheep, cats, cows, horses, monkeys, 
and rats, follow a similar protocol. 
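
The expected result of interbreeding two heterozygous mice can be checked by enumerating the gamete combinations. The following sketch assumes simple Mendelian segregation with equal viability of all genotypes, which may not hold if the homozygous mutation is lethal; the allele symbols "+" (wild type) and "m" (targeted mutation) are labels chosen for illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def cross(parent1, parent2):
    """Return expected genotype frequencies among offspring.

    Each parent is a two-character string of alleles, e.g. "+m".
    Assumes each allele is transmitted with probability 1/2.
    """
    counts = Counter(
        "".join(sorted(a + b)) for a, b in product(parent1, parent2)
    )
    total = sum(counts.values())
    return {genotype: Fraction(n, total) for genotype, n in counts.items()}


offspring = cross("+m", "+m")
for genotype, freq in sorted(offspring.items()):
    print(genotype, freq)
```

Under these assumptions one quarter of the offspring are expected to be homozygous for the mutation, one half heterozygous, and one quarter homozygous wild type.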


Advances in Altering and Synthesizing 
DNA Molecules 


Sometimes, the wild-type version of a gene is the one 
that geneticists wish to express as a transgene. But we 
have seen that in some cases, it is desirable to express a 
modified version in which specific nucleotides have been 
changed. One reason it is sometimes desirable to alter the 
sequence of an encoded protein is to render the protein 
either more or less active. For example, changes in the 
identities of specific amino acids can sometimes cause an 
enzyme to be constitutively active or to be more stable at 
high or at low temperatures. A second reason to change 
the nucleotide sequence of a gene is to improve its expres- 
sion in a species with a different codon bias than that of 
the species from which the gene was derived (as noted 
above under “Transgenes in Escherichia coli”). 
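
Recoding a gene for a host with a different codon bias amounts to swapping each codon for a synonymous one without altering the encoded protein. The sketch below illustrates the idea on a four-codon fragment. The synonymous codons are those listed for each amino acid in the table of the genetic code, but the "preferred" codon chosen for each amino acid is a hypothetical host preference used only for illustration, and the function names are likewise assumptions.

```python
# Synonymous codons for four amino acids (from the standard genetic code).
SYNONYMS = {
    "Met": ["AUG"],
    "Lys": ["AAA", "AAG"],
    "Gly": ["GGA", "GGC", "GGG", "GGU"],
    "Leu": ["UUA", "UUG", "CUA", "CUC", "CUG", "CUU"],
}
CODON_TO_AA = {codon: aa for aa, codons in SYNONYMS.items() for codon in codons}

# Hypothetical host preferences (illustrative only, not measured values).
HOST_PREFERRED = {"Met": "AUG", "Lys": "AAA", "Gly": "GGC", "Leu": "CUG"}


def translate(mrna):
    """Translate an mRNA string into a list of amino acid abbreviations."""
    return [CODON_TO_AA[mrna[i:i + 3]] for i in range(0, len(mrna), 3)]


def recode(mrna):
    """Replace every codon with the host's preferred synonymous codon."""
    return "".join(HOST_PREFERRED[aa] for aa in translate(mrna))


original = "AUGAAGGGAUUA"     # Met-Lys-Gly-Leu
optimized = recode(original)  # same protein, different codons
print(optimized, translate(optimized) == translate(original))
```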

In the past, making specific changes to a DNA se- 
quence was a laborious process. However, technology for 
chemically synthesizing DNA molecules has improved 
significantly in recent years in terms of both accuracy 
and cost, making the synthesis of any DNA sequence 
feasible. Consider the example of human insulin genes. 
In the late 1970s, the construction of the B-chain gene 
from 18 chemically synthesized oligonucleotides 10 to 
12 nucleotides long and of the A-chain gene from 12
oligonucleotides 10 to 15 nucleotides long was a monu-
mental task. Today, however, oligonucleotides tens to
hundreds of bases in length are inexpensive to construct 
via PCR-based approaches. 

More recently, chemical syntheses of DNA mole- 
cules up to 50,000 bases in length have become fea- 
sible. Geneticists are able to design a DNA molecule 
from scratch and synthesize it for subsequent use in 
living organisms. This approach is useful when multiple 
changes would otherwise be required in a DNA mol- 
ecule before its introduction into a transgenic organism. 
As with sequencing technologies, advances in chemical 
synthesis of large DNA molecules have the potential to 
transform biotechnology and biological research. In 2008, 
the entire 582,970-bp genome of Mycoplasma genitalium 
was chemically synthesized in vitro, cloned into a YAC 
vector, and propagated in Saccharomyces cerevisiae. In
2010, a synthetic Mycoplasma mycoides genome built in
a similar way was transplanted into a recipient Mycoplasma
capricolum cell, generating a cell that uses the genetic
information contained on the synthetic chromosome.
This ability to synthesize genome-sized nucleic
acid molecules is revolutionizing experimental biology.


Manipulation of DNA Sequences in Vivo 


The ultimate technology for investigation of gene func- 
tion and also for gene therapy would be the ability to 
precisely change DNA sequences in the genome in vivo. 
Such technology would facilitate the examination of gene 
function by the creation of specific alleles and allow the 
“correction” of DNA sequences in cells with deleterious 
alleles. While such technology is not yet fully realized, recent 
advances, two of which are described here, have made some in 
vivo manipulation of genome sequences possible. 


Site-Specific Recombination In some cases it is 
desirable to manipulate transgenes after they have been 
introduced into an organism. For example, the ability 
to remove the positive selectable marker gene after 
selection of transformants mitigates one of the concerns 
raised by critics of transgenic plants (as shown below). 
In addition, in vivo manipulation of transgenes facilitates 
the production of conditional alleles of genes whose 
null allele is lethal. The ability to specifically recombine 
DNA molecules makes in vivo manipulation of transgenes 
feasible. 

Several bacteriophages use site-specific recombina- 
tion systems during their life cycle, either for intramolecu- 
lar recombination within the bacteriophage genome or for 
intermolecular recombination into host genomes. These 
recombination systems can be harnessed for producing 
recombinant DNA molecules in vitro and for recombin- 
ing DNA molecules in vivo. Bacteriophage site-specific 
recombination systems have two components: (1) DNA 
sequences in the bacteriophage genome that are identical 
to sequences in the target bacterial genome and (2) an 
enzyme, commonly called a recombinase or integrase, 
that binds to the identical DNA sequences and catalyzes 
their recombination. Two bacteriophage recombination 
systems, one in bacteriophage λ and the other in 
bacteriophage P1, have proven particularly valuable in the 
development of site-specific recombination for use in 
molecular biology experiments. 

A site-specific recombination system derived from bac- 
teriophage P1 utilizes Cre recombinase, a bacteriophage- 
encoded protein that acts to recombine DNA containing 
loxP sequences (Figure 17.25). The loxP sites are 34-bp 
sequences consisting of two 13-bp inverted repeats sepa- 
rated by an 8-bp spacer that provides asymmetry, and they 
are specifically recognized by Cre recombinase. The Cre 
recombinase binds to two loxP sites and catalyzes a re- 
combination event between them. If the two loxP sites are 
direct repeats, the intervening DNA is deleted, whereas if 
the two loxP sites are inverted relative to one another, the 
intervening sequence is inverted. 
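
The two outcomes of Cre-mediated recombination can be modeled on a DNA string. The sketch below is a simplified illustration under stated assumptions: `LOXP` is the commonly cited 34-bp loxP sequence (not given in this chapter), the function names are hypothetical, and the model assumes exactly two sites on one linear molecule.

```python
# ASSUMPTION: commonly cited loxP sequence, 13-bp arm + 8-bp spacer + 13-bp arm.
LOXP = "ATAACTTCGTATA" + "GCATACAT" + "TATACGAAGTTAT"

def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def cre_recombine(dna: str) -> str:
    """Return the product of recombination between exactly two loxP sites."""
    n = len(LOXP)
    sites = []
    for i in range(len(dna) - n + 1):
        window = dna[i:i + n]
        if window == LOXP:
            sites.append((i, "+"))
        elif window == revcomp(LOXP):
            sites.append((i, "-"))
    if len(sites) != 2:
        raise ValueError("this toy model needs exactly two loxP sites")
    (i, first), (j, second) = sites
    left, middle, right = dna[:i], dna[i + n:j], dna[j + n:]
    if first == second:
        # Direct repeats: the intervening DNA is excised; one site remains.
        return left + dna[i:i + n] + right
    # Inverted repeats: the intervening DNA is flipped in place.
    return left + dna[i:i + n] + revcomp(middle) + dna[j:j + n] + right

direct = "GGG" + LOXP + "AAACCC" + LOXP + "TTT"
inverted = "GGG" + LOXP + "AAACCC" + revcomp(LOXP) + "TTT"
print(cre_recombine(direct))
print(cre_recombine(inverted))
```

The asymmetric spacer is what lets the code, like the enzyme, tell the two orientations apart: the 13-bp arms are inverted repeats of each other, so without the spacer a site and its reverse complement would be indistinguishable.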

The Cre-lox recombination system has been adapted 
to recombine DNA in vivo in transgenic organisms. For 
example, loxP sites are added to the ends of the DNA 
to be deleted or inverted and introduced as a transgene 
into an organism. A second transgene encoding the Cre 
recombinase is also introduced into the same organism. 
In cells where the Cre recombinase is expressed, the DNA 
flanked by the loxP sites will be deleted or inverted. 

One reason a geneticist might want to delete a trans- 
gene after having introduced it into the genome is to 
assess the function of the gene at specific times and in 
specific tissues during development. For example, if a null 
loss-of-function allele results in embryonic lethality, the 
role of the gene at later developmental stages is difficult 
to assess. One approach to determining the post-em- 
bryonic function of such genes is to complement a loss- 
of-function mutant with a functional copy of the gene 
flanked by loxP sites. In cells where the Cre recombinase 
is active, the transgene will be deleted, causing these cells 
and their descendants to have a mutant genotype. If the 
Cre recombinase is driven by a promoter that confers 
inducible expression or expression that is temporally or 
spatially restricted, a genetic chimera can be created, 
allowing an assessment of gene function in specific tissues. 

Figure 17.25 Bacteriophage site-specific recombination systems. 

A second application is the removal of selectable 
markers in transgenic organisms. An objection to the use 
of transgenic organisms in agriculture is that some trans- 
genic strains contain a selectable marker providing resis- 
tance to antibiotics, which might spread into the natural 
population. The antibiotic selectable marker genes were 
used to select the transgenic organism but are no longer 
needed once the transgenic organism has been identified. 
One strategy for eliminating the selectable marker is to 
flank the unwanted transgene with loxP sites in a direct 
repeat orientation. A plant containing this transgene is 
then crossed with another transgenic plant expressing the 
Cre recombinase, and the unwanted transgene is deleted 
in the F1. It is then possible to segregate the transgene 
encoding the Cre recombinase away from the desired 
transgene in subsequent generations. 
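
How often the Cre transgene segregates away from the desired transgene can be estimated with a small enumeration. The sketch assumes, hypothetically, that the F1 plant carries one copy of each transgene at unlinked loci and is self-fertilized; the allele symbols and the function name are illustrative only.

```python
from fractions import Fraction
from itertools import product

def f2_fraction_transgene_without_cre() -> Fraction:
    """Fraction of F2 progeny that keep the desired transgene but lack Cre.

    ASSUMPTIONS: the F1 is hemizygous for both transgenes, the two
    insertions assort independently, and the F1 is selfed.
    T/t = desired transgene present/absent; C/c = Cre present/absent.
    """
    gametes = list(product("Tt", "Cc"))  # four equally frequent gamete types
    wanted = 0
    for egg, pollen in product(gametes, repeat=2):
        has_transgene = "T" in (egg[0], pollen[0])
        has_cre = "C" in (egg[1], pollen[1])
        if has_transgene and not has_cre:
            wanted += 1
    return Fraction(wanted, len(gametes) ** 2)

print(f2_fraction_transgene_without_cre())
```

Under these assumptions the expected fraction is 3/4 (carrying the desired transgene) times 1/4 (lacking Cre), so roughly 3 of every 16 F2 plants would have the desired genotype. Linkage between the two insertions would change this expectation.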


Targeted DNA Sequence Changes One approach to 
inducing changes in genomic sequences in vivo is to 
design a DNA endonuclease to target a specific genomic 
location. The endonuclease creates a double-strand break 
at the site, which is subsequently repaired by endogenous 
repair mechanisms. 

Two different approaches are presently being used to 
cause the nuclease to target a specific site in the genome. 
First, the nuclease can be translationally fused to a se- 
quence-specific DNA binding domain that recognizes only 
the site in the genome to be targeted. Second, the nuclease 
can be incorporated into a complex with an RNA molecule, 
which then provides specificity via complementary base 
pairing with the target sequence of interest. This latter ap- 
proach is based on reengineering a bacterial system called 
CRISPR-CAS that evolved as a defense mechanism against 
invading nucleic acids. The CAS nuclease introduces 
double-strand breaks in DNA molecules at sites determined by 
an RNA molecule with which it forms a complex. 
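
Targeting by complementary base pairing amounts to a sequence search. The following sketch locates candidate target sites for a guide RNA on either strand of a DNA sequence. It is a simplification under stated assumptions: the function names and example sequences are hypothetical, only perfect matches are reported, and any additional sequence requirements that a particular nuclease may impose next to the target are ignored.

```python
def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def find_targets(genome: str, guide_rna: str) -> list:
    """Return (strand, position) pairs where the guide can pair with the DNA.

    The guide pairs with one DNA strand, so its sequence (with U read as T)
    matches the opposite strand. Positions for "-" hits are given in
    reverse-complement coordinates.
    """
    spacer = guide_rna.upper().replace("U", "T")
    hits = []
    for strand, seq in (("+", genome), ("-", revcomp(genome))):
        start = seq.find(spacer)
        while start != -1:
            hits.append((strand, start))
            start = seq.find(spacer, start + 1)
    return hits

genome = "TTTT" + "GATTACAGATTACAGATTAC" + "CCCC"  # made-up example sequence
guide = "GAUUACAGAUUACAGAUUAC"                      # made-up 20-nt guide
print(find_targets(genome, guide))
```

A guide that returns more than one hit in a real genome would direct the nuclease to several locations, which is why guide sequences are normally checked for uniqueness before use.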

If the double-strand break is repaired by non- 
homologous end-joining, then small deletions often remain 
at the site of the break, leading to possible loss- or gain- 
of-function alleles, depending on what sequences are lost. 
Alternatively, the break may be repaired by homologous 
recombination, either with endogenous sequence from the 
homologous chromosome in a diploid cell or with exog- 
enously supplied DNA sequences. In the latter case, if the 
exogenously supplied DNA has been constructed in such a 
way that it contains the desired change, a specific sequence 
change in the chromosome may be accomplished. 
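
The two repair outcomes can be contrasted with a toy model. In this sketch, which uses hypothetical function names and made-up sequences, non-homologous end-joining is represented as loss of a few bases at the break, and homologous recombination as replacement of the region between two homology arms with donor sequence.

```python
def nhej_repair(seq: str, cut: int, lost: int) -> str:
    """Mimic end-joining that loses `lost` bases starting at the cut site."""
    return seq[:cut] + seq[cut + lost:]

def shifts_reading_frame(lost: int) -> bool:
    """A deletion within a coding region shifts the frame unless it is a multiple of 3."""
    return lost % 3 != 0

def hdr_repair(seq: str, left_arm: str, right_arm: str, donor_insert: str) -> str:
    """Replace the sequence between two homology arms with the donor insert."""
    i = seq.find(left_arm)
    j = seq.find(right_arm, i + len(left_arm)) if i != -1 else -1
    if i == -1 or j == -1:
        raise ValueError("homology arms not found in target sequence")
    return seq[:i + len(left_arm)] + donor_insert + seq[j:]

target = "ATGGCCGAATTCGGCTAA"            # made-up target region
print(nhej_repair(target, cut=9, lost=2))  # small deletion at the break
print(shifts_reading_frame(2))             # a 2-bp loss shifts the frame
print(hdr_repair(target, "ATGGCC", "GGCTAA", "GAGCTC"))  # precise edit
```

The contrast mirrors the text: end-joining yields unpredictable small lesions that often disrupt gene function, whereas repair from a supplied template installs exactly the change built into the donor.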

Genetic Analysis 17.2 asks you to put some of these ideas 
to work by designing a mouse model of a human disease. 


GENETIC ANALYSIS 


PROBLEM Mouse models of human diseases are valuable research tools that can be used to test 
therapies and drugs. How would you make a transgenic mouse model of Huntington 
disease, which is caused by an autosomal dominant mutation consisting of an expanded 
sequence of trinucleotide repeats? 

BREAK IT DOWN: Review the discussion on p. 597 of procedures for creating transgenic mice. 

BREAK IT DOWN: Review the defining features of an autosomal dominant 
mutation (see Section 4.1). 


Solution Strategies and Solution Steps 


Evaluate 

1. Strategy: Identify the topic this problem addresses and the nature of the required answer. 
   Step: This problem about recombinant DNA technology asks how to construct a 
   specific strain of transgenic mouse. 

2. Strategy: Identify the critical information given in the problem. 
   Step: The desired disease model is of Huntington disease (HD), described as an 
   autosomal dominant mutation that consists of an expanded sequence of 
   trinucleotide repeats. The transgenic mouse is to be used to test therapies 
   and drugs. 


Deduce 

3. Strategy: Inheritance patterns are always a key consideration in genetic research 
   designs. Identify the inheritance pattern of the HD phenotype. 
   Step: Since HD is dominant, a phenotype should be evident if a single mutant 
   allele is introduced into the genome. 

4. Strategy: Evaluate the ways in which the HD allele can be transferred into mice. 
   Step: Transgenic mice can be generated by random integration of a transgene or, 
   alternatively, by homologous recombination that replaces the endogenous 
   gene with a mutant version. 

5. Strategy: Choose the method of generating a transgenic mouse that will come closest 
   to modelling the disease of interest. 
   PITFALL: Randomly integrated transgenes may exhibit variation in expression patterns. 
   Step: Since we want the transgene to be expressed in the same pattern as 
   the wild-type mouse HD gene, homologous recombination is the best 
   approach, because the mutant HD gene will then be in the same genomic 
   context and will be expressed in the same pattern as the wild-type gene. 


Solve 

6. Strategy: Design a strategy to replace the wild-type mouse HD gene with a mutant 
   version of the human HD gene. 
   PITFALL: Since a functional allele is desired, the positive selectable marker must 
   not interfere with HD transgene function. 
   Step: The positive-negative selection approach outlined in Figure 17.24 to 
   produce a transgenic mouse by homologous recombination results in 
   a loss-of-function allele. This approach must be modified to create a 
   gain-of-function allele. 

   a. Construct a vector in which a human mutant HD gene is flanked by 
      mouse HD regulatory sequences (5′ and 3′ of the HD gene). 

   b. The positive selective marker gene can be placed downstream of the 
      HD gene, in a position not likely to interfere with HD gene expression, or 
      could be removed using the Cre-lox approach outlined in Figure 17.25. 

   c. A second type of transgenic mouse, expressing the wild-type human 
      gene driven by the same regulatory sequences, would provide a useful 
      control to compare the specific phenotypic effects induced by the 
      expression of the mutant allele. 


For more practice, see Problems 7, 8, 11, 27, and 30. 


17.3 Gene Therapy Uses Recombinant 
DNA Technology 


The ability to manipulate gene expression through the 
introduction of a transgene raises the possibility that human 
genetic diseases could be treated by the introduction of a 
functional version of the mutant gene. The use of genes as 
medicines to cure or alleviate disease symptoms 
is termed gene therapy. From a genetic perspective, gene 
therapy is similar to a genetic complementation 
experiment in which the gene introduced by gene therapy 
compensates for a genetic abnormality in the altered cell. Two 
types of gene therapy, classified as somatic gene therapy 
and germinal gene therapy, are feasible. 


Two Forms of Gene Therapy 


Somatic gene therapy targets somatic cells, whose de- 
scendants will not give rise to germ cells. Any genetic 
alterations induced in the targeted cells will be passed 
to daughter cells by mitosis, but the alteration will not 
be inherited by progeny of the individual undergoing 
somatic gene therapy. The specific somatic cells to be tar- 
geted depend on the disease in question. For example, in 
individuals with cystic fibrosis, the epithelial cells of the 
lungs represent a logical target, since lungs are severely 
affected in cystic fibrosis. On the other hand, for diseases 
of the blood, cells of the various hemopoietic lineages 
are the target cells; they can be removed from bone 
marrow, treated, and returned to the same individual. 
Somatic gene therapy turns the treated individual into 
a genetic chimera that has the transgene present in the 
target cells but not in other somatic cells or in germ cells. 
Somatic gene therapy can potentially be used to treat sev- 
eral genetic diseases whose phenotype becomes apparent 
early in childhood. 

The alternative strategy for gene therapy, germinal 
gene therapy, targets cells of the germ line, which give 
rise to gametes. Because germinal gene therapy alters 
germ-line cells, the therapeutic transgene is transmit- 
ted to the progeny of the treated individual. Both types 
of gene therapy have been successful in animal systems; 
but for ethical reasons, only somatic gene therapy has 
been attempted in humans. In the following paragraphs, 
we discuss somatic gene therapy in humans and describe 
modifications of these protocols suggested by successful 
somatic gene therapy experiments in mice. 


Gene Therapy in Humans 


The primary difficulties in human somatic gene therapy 
concern the delivery of the transgene to the somatic cells 
of interest and the proper expression of the transgene in 
targeted cells. The DNA encoding the gene must be deliv- 
ered to the proper cells, pass through the cell membrane 
and into the nucleus, and, once there, be expressed at a 
level that is sufficient to provide normal gene function. In 
some cases—for example, in hemopoietic diseases—the 
cells to be treated can simply be extracted from the body, 
treated in vitro, and then injected back into the body. 
However, in other cases—for example, cystic fibrosis, in 
which lung epithelial cells are the target—the cells must 
be treated in situ because they cannot be removed from 
the patient. 

The choice of vector to deliver the DNA to the cells 
is pivotal. Gene therapy methods often take advantage of 
viruses that have evolved mechanisms to access specific 
cell types. Essentially, viruses are harnessed to transduce 
the transgene into the target cells the way the transduc- 
tion of DNA between bacteria is performed by bacterio- 
phage (see Chapter 7). The viruses can be “disarmed” 
so that they no longer have the ability to cause the dis- 
eases associated with their wild-type relatives. Several 
types of viral vectors have been used, including gamma- 
retroviruses, lentiviruses, and adenoviruses (Table 17.3). 
Each has advantages and disadvantages for gene therapy 
protocols. 

Many viral vectors deliver transgenes by integrating 
into the genome of the target cell. Integration provides a 
mechanism for stable gene transfer and thus permanent 
correction of the defect. Integration of the vector into 
the genome is not without risks, however; the insertion 
may cause a detrimental mutation, a problem that has 
plagued most human gene therapy experiments to date. 
The treatment of a serious immune system disease called 
severe combined immune deficiency syndrome (SCIDS) 
provides an example. 

SCIDS patients lack the ability to produce a category 
of blood cells called T cells that are critical to the body’s 
defense against infection. One form of SCIDS is due 
to mutations in the gene encoding the gamma subunit 
(γ chain) of the interleukin-2 receptor on T cells and is 
X-linked. In the mid-1990s, a gene therapy approach was 
designed using a retroviral vector carrying the γ chain 
cDNA driven by viral regulatory sequences. The retrovirus 
carrying the cDNA was bounded by long terminal repeats 
(LTRs; see Chapter 12). In one study, 9 of 10 patients 
were successfully treated, and they exhibited functioning, 
adaptive immune systems following gene therapy. In three 
patients, however, an uncontrolled increase of mature 
T cells, termed T-acute lymphoblastic leukemia, devel- 
oped in the years immediately following treatment. In 
each of these individuals, the retrovirus became inserted 
into the LMO2 gene in such a way that the retroviral LTR 
promoter was able to cause unregulated expression of 
LMO2. This gene is known to be required for the 
differentiation of hemopoietic cells. Its overexpression is thought 
to be what led these patients to develop leukemia. 


Table 17.3 Viruses Used as Vectors in Gene Therapy 

Virus Type               Integration into Genome           Target                       Capacity 
Retrovirus               Integrates; insertional mutagen   Infects dividing cells       8 kb 
Lentivirus               Integrates; insertional mutagen   Infects nondividing cells    8 kb 
Adenovirus               Nonintegrating                    Infects nondividing cells    7.5 kb 
Adeno-associated virus   Episomal, but can integrate       Infects nondividing cells    4.5 kb 

This trial highlighted one of the concerns raised by 
the use of retroviruses as gene therapy vectors—that 
they may act as mutagens. Once this possibility was rec- 
ognized, the SCIDS gene therapy trials just described 
were suspended. However, the high proportion of treated 
individuals whose immune defects were corrected in the 
study suggests that gene therapy can be a viable approach 
to treating such diseases. 

In addition to concerns over safety and efficacy, the 
use of viral vectors presents technical challenges stem- 
ming from size limits on the amount of DNA that can 
be packaged in the viral capsid (similar limits were dis- 
cussed in regard to bacterial transduction in Chapter 7). 
In most cases, the amount of DNA that can be packaged 
by a virus is much smaller than the size of a typical hu- 
man gene. For example, the transcribed region of the 
CFTR gene spans approximately 170,000 bp, from which 
is produced a 6132-bp processed mRNA encoding a 
1480-amino acid protein. Since viral vectors can 
accept only 5 to 10 kb of DNA, only a cDNA of the CFTR 
gene lacking all endogenous gene-expression regulatory 
elements can be accommodated in a viral vector. In the 
absence of these endogenous regulatory sequences, the 
expression of the CFTR coding sequence is driven by 
viral regulatory sequences, which might not regulate 
the transgenes in a manner appropriate for proper gene 
function in the target cells. 
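
The packaging constraint can be checked with simple arithmetic. The sketch below compares the CFTR sizes quoted in the text with the vector capacities listed in Table 17.3. The names are hypothetical, and the coding-sequence length is an estimate computed here (1480 codons plus a stop codon), not a figure from the text.

```python
# Capacities in base pairs, taken from Table 17.3.
VECTOR_CAPACITY_BP = {
    "Retrovirus": 8000,
    "Lentivirus": 8000,
    "Adenovirus": 7500,
    "Adeno-associated virus": 4500,
}

CFTR_GENOMIC_BP = 170_000        # transcribed region, from the text
CFTR_MRNA_BP = 6132              # processed mRNA, from the text
CFTR_CODING_BP = 1480 * 3 + 3    # ESTIMATE: 1480 codons plus one stop codon

def vectors_that_fit(insert_bp: int) -> list:
    """Names of vectors whose capacity is at least the insert size."""
    return [name for name, cap in VECTOR_CAPACITY_BP.items() if insert_bp <= cap]

for label, size in [("genomic region", CFTR_GENOMIC_BP),
                    ("full-length cDNA", CFTR_MRNA_BP),
                    ("coding sequence only", CFTR_CODING_BP)]:
    print(label, size, vectors_that_fit(size))
```

The comparison ignores the promoter and other elements that must share the same space, so an insert that fits on paper with only a few dozen base pairs to spare leaves essentially no room for regulatory sequences.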

Virus-based gene therapy continues to be employed 
in selected experimental cases despite past failures and 
continuing concerns over the safety of the procedures. 
Successes in treating patients with cystic fibrosis, SCIDS, 
and several other human hereditary conditions offer 
hope that continued research will identify effective vec- 
tors for delivering treatment that is sustained, targeted, 
and safe. 

The Case Study in this chapter examines an approach 
to gene therapy whereby mutant alleles are corrected in 
cultured cells that are then reintroduced into the host. 


17.4 Cloning of Plants and Animals 
Produces Genetically Identical 
Individuals 


Many plants have the capacity for vegetative (asexual) 
propagation in addition to sexual propagation. For exam- 
ple, poplar and aspen (Populus sp.) groves often consist of 
vegetatively propagated clones, all genetically identical. 


Some of these clonal groves are estimated to be at least 
10,000 years old. Humans, taking advantage of the abil- 
ity of plants to reproduce vegetatively, have been clonally 
propagating plants for centuries in agricultural practices. 
In these protocols, heterozygous genotypes of agricultur- 
ally desirable specimens are propagated intact, without 
the segregation of alleles that occurs during sexual re- 
production. Heterozygous genotypes often exhibit hybrid 
vigor, resulting in high yields in comparison to inbred 
varieties. 

Perhaps the most conspicuous example of agricul- 
tural vegetative propagation is the cultivation of grapes 
(Vitis vinifera), which were domesticated 6000 to 7000 
years ago. Most grape cultivars are highly heterozygous; 
that is, they have two different alleles at many genomic 
loci. Thus, when they are self-fertilized or crossed with 
another cultivar, extensive segregation of genotypes and 
phenotypes is observed in the progeny. Because this pres- 
ents an obstacle to controlling the properties of grape 
plants through breeding, cultivars that possess favorable 
phenotypes are propagated by cuttings (that is, additional 
plants are grown from pieces of source plants). In most 
vineyards, the vines are chimeric: The shoots are all ge- 
netically identical and chosen on the basis of their fruit 
phenotype, and the roots, also identical to one another, 
are of a different genotype that is chosen for being well 
adapted to local soil conditions. 
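
The reason sexual propagation breaks up a favorable heterozygous genotype can be made quantitative. The sketch below assumes, as a simplification, that a plant heterozygous at n unlinked loci is self-fertilized and that each locus segregates 1:2:1; the function name is hypothetical.

```python
from fractions import Fraction

def fraction_fully_heterozygous(n_loci: int) -> Fraction:
    """Expected fraction of selfed progeny still heterozygous at all n loci.

    ASSUMPTION: loci are unlinked and segregate independently, so each
    locus remains heterozygous with probability 1/2.
    """
    return Fraction(1, 2) ** n_loci

for n in (1, 5, 10, 20):
    print(n, fraction_fully_heterozygous(n))
```

Even for only 10 such loci, fewer than 1 in 1000 seedlings would be expected to retain the parental genotype, which is why cuttings rather than seeds are used to maintain a cultivar.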

Several wine grape cultivars can be traced back to 
the Middle Ages, and some are likely to be even older. 
For example, Pinot was first described in Roman times 
and is thought to be at least 2000 years old. While clonal 
propagation allows maintenance of specific genotypes, 
somatic mutations—due, for example, to errors in DNA 
replication and transposable element activity—can accu- 
mulate over time and lead to phenotypic variation. Thus, 
a mutation in a gene required for pigment synthesis led to 
the formation of Pinot blanc, a white-berry cultivar, from 
Pinot noir, the ancestral black-berry cultivar. 

Unlike plants, most animals do not readily propagate 
clonally in nature—but there are exceptions. For example, 
some aphid species undergo multiple parthenogenetic 
(clonal) generations in the spring and summer, followed 
by sexual reproduction in the autumn. Since most animal 
cells are not totipotent, animals do not readily regenerate 
from single cells. (An important exception is embryonic 
stem cells, which have the potential to differentiate into 
any cell type in the body.) Thus, techniques for cloning 
animals, and in particular mammals, from single differen- 
tiated cells are considerably more complicated than those 
for cloning plants. 

Dolly, a sheep, was the first cloned mammal. In 
the protocol used to produce Dolly, a diploid nucleus 
is isolated from a differentiated cell of the animal to be 
cloned (Figure 17.26). This nucleus, containing all the 
nuclear genetic information of the animal from which 
it was taken, is injected into an egg cell that has had its 
own nucleus removed. The egg cell can be derived from 
the animal to be cloned (if it possesses egg cells) or from 
a different individual. If the nuclear transplantation is 
successful, the genome of the donor nucleus will direct 
the development of the embryo derived from the egg 
cell. The use of a diploid donor nucleus means that fer- 
tilization with a sperm cell is not required to produce a 
diploid nucleus in the embryo; thus, the genetic constitu- 
tion of the embryo will be identical to that of the donor. 
Bear in mind, however, that while the nuclear genome is 
genetically identical to that of the donor, the mitochon- 
drial genome is derived from the surrogate egg cell. The 
diploid egg cell is then induced to begin embryogenesis 
and implanted into a surrogate mother. If all goes well, it 
will develop into a normal embryo, and birth of a normal 
offspring will follow. 

In most mammals, the frequency of success with 
this protocol has been quite low. Dolly’s was the only 
one out of 270 implanted egg cells that resulted in the 
birth of a sheep. Donor cells have been derived from 
adult animals—Dolly’s donor cell was a mammary gland 
cell—and are therefore highly differentiated somatic 
cells rather than totipotent embryonic stem cells. In dif- 
ferentiated somatic cells, such as those of the mammary 
gland, the patterns of facultative heterochromatin (see 
Chapter 15) are vastly different from those of embryonic 
stem cells. In other words, although the sequences of nu- 
cleotides in the genomes of differentiated and embryonic 
stem cells are identical, the epigenetic modifications of 
the histones and DNA methylation patterns differ. The 
low frequency of success in the initial attempts to clone 
mammals was likely due to deficiencies in reprogram- 
ming the genetic material of the injected nucleus to 
mimic the epigenetic modifications characteristic of an 
embryonic stem cell. A failure in epigenetic reprogram- 
ming has also been postulated as a possible cause of 
Dolly’s shortened life span. 

Nevertheless, advances in knowledge of ES cell biol- 
ogy, and their application to reprogram certain differenti- 
ated cells in vitro to behave like stem cells, suggest that 
the success rate of cloning mammals will increase over time (see the 
chapter Case Study for more details). Despite the difficul- 
ties, many different mammals besides sheep have been 
successfully cloned, including mice, cows, horses, 
donkeys, cats, and dogs. 

Figure 17.26 Cloning animals by nuclear implantation. 


CASE STUDY 


Curing Sickle Cell Disease in Mice 


The ideal somatic gene therapy would be one that corrects 
the specific mutation causing the genetic disease rather than 
just compensating for the mutant allele. Advances in un- 
derstanding the biology of embryonic stem (ES) cells have 
brought new forms of somatic gene therapy that may ap- 
proach the ideal for some genetic diseases. Embryonic stem 
cells are totipotent, meaning they have the potential to dif- 
ferentiate into any cell type in the body. In addition, as 
discussed in Section 17.2, the genome of an ES cell can be 
manipulated by homologous recombination. Thus, if ES cells 
can be isolated from an individual, gene mutations within the 
cells could perhaps be corrected, and the cells could then be 
induced to differentiate into the appropriate cell type to treat 
the genetic disease. As illustrated in the mouse experiment 
described below, the ability to create and manipulate ES cells 
provides a means of isolating cells from an individual, correct- 
ing mutations in the cells, and reintroducing the “corrected” 
cells into the body. 


CREATING ES CELLS FROM FIBROBLASTS In many cas- 
es, the diagnosis of a genetic disease is not made until early 
childhood, when the body no longer possesses any ES cells, 
because they form only during early embryogenesis. How can 
ES cells be obtained from a person who has none? The answer 
is to create ES cells from other cells of the body. 

In 2006 and 2007, a series of experiments demonstrated 
that mouse or human fibroblasts, a type of cell occurring in 
connective tissue, could be reprogrammed in vitro to behave 
like stem cells. These reprogrammed cells have been called 
induced pluripotent stem cells, or iPS. (The word pluripotent 
is used because scientists do not yet know if the iPS cells are 
totipotent.) This reprogramming of differentiated cells was 
accomplished by expressing a combination of three to four 
transcription factors (choices included Oct4, Sox2, c-Myc, and 
Klf4). The transcription factors that were used are normally 
expressed in ES cells and appear to be sufficient to induce 
reprogramming of the transcriptional networks of differenti- 
ated somatic cells into networks characteristic of ES cells. 


GENE THERAPY PROOF OF PRINCIPLE These advances set 
the stage for using iPS cells in gene therapy. Proof of principle 
(a phrase used by scientists to mean proof that the general idea 
is valid) was provided using a mouse model for sickle cell 
disease (Figure 17.27). The basic strategy being tested consisted 
of (1) harvesting adult cells, (2) reprogramming adult cells into 
iPS cells, (3) repairing the genetic defect through homologous 
recombination, (4) differentiating the iPS cells into hemopoietic 
precursors in vitro, and (5) transplanting the corrected cells into 
bone marrow of affected mice. 

The starting point for this test of somatic gene therapy 
was the creation of a “humanized” mouse model for sickle 
cell anemia by substituting human α-globin genes for the 
endogenous mouse α-globin genes and substituting human 
β^S (sickle) globin genes for the mouse β-globin genes. Mice 
homozygous for the β^S-globin allele (β^S/β^S) exhibited typical 
disease symptoms, including severe anemia and erythrocyte 
sickling. Fibroblasts isolated from the tail of β^S/β^S mice were 
infected with retroviruses encoding the Oct4, Sox2, and Klf4 
transcription factors and with a lentivirus encoding the c-Myc 
transcription factor. Expression of these four transcription fac- 
tors resulted in the reprogramming of the fibroblast cells into 
iPS cells. On either side of the c-Myc gene on the lentivirus, lox 
sites had been placed, to allow the gene to be excised from 
the genome when the cells were infected with an adenovi- 
rus encoding Cre recombinase. This was important because 
continued expression of c-Myc predisposes cells to become 
cancerous. Although the other three transgenes were not 
removed in this experiment, their removal by a similar mecha- 
nism is also recommended. 

To correct a β^S-globin allele, a transformation vector 
encoding the β^A-globin allele was introduced into the iPS 
cells, and hygromycin- and ganciclovir-resistant homologous 
recombinants were created using the procedure described in 
Section 17.2. The corrected iPS cells were now heterozygous 
at the β-globin locus (β^A/β^S). The β^A/β^S iPS cells were then 
differentiated into hemopoietic progenitors (HPs, cells that 
have the potential to differentiate into any of the hemopoietic 
lineages) by infection with another retrovirus encoding 
the HoxB4 gene, which induces the differentiation of ES cells 
into HPs when incubated with cytokines secreted from bone 
marrow cells. The β^A/β^S HPs were then transplanted back 
into β^S/β^S mice in which the endogenous β^S/β^S bone marrow 
cells had been eliminated by irradiation, so that now the 
β^A/β^S HPs constituted the primary source of hemopoietic cells. 
In this particular experiment, the HoxB4 coding sequence was 
translationally fused with that of green fluorescent protein 
(GFP), so the activity of the β^A/β^S HPs could be monitored 
by the presence of GFP+ cells in the blood. Subsequently, by 
all physiological tests, the mice receiving the β^A/β^S HPs were 
cured of sickle cell disease. 

These experiments in mice suggest there is promise in 
the use of ES or iPS cells for gene therapy, but at least two 
facets of gene therapy procedures continue to cause concern. 
Problems associated with using retroviruses and oncogenes 
for reprogramming need to be resolved before implementing 
such a protocol in humans. In addition, whether iPS cells are 
truly totipotent or still contain an epigenetic memory of their 
origin remains to be determined. Since an individual's own 
cells are used as the source for genetic modification, there 
are no impediments due to immune system incompatibility. 
However, this approach is limited to those diseases, such as 
blood disorders, in which cells can be isolated, genetically 
corrected, and reintroduced into the body. 

Figure 17.27 Genetic therapy for mice with sickle cell disease. 


17.1 Specific DNA Sequences Are Identified 
and Manipulated Using Recombinant DNA 
Technology 


- Restriction enzymes, which cut at specific DNA sequences, 
  are used to fragment large DNA molecules into defined 
  smaller pieces. 

- A restriction map of a DNA molecule can be constructed 
  by analyzing patterns of DNA fragments after restriction 
  enzyme digestion. 

- DNA fragments can be ligated to create recombinant 
  DNA molecules, usually composed of a vector that can be 
  amplified in a biological system and a target DNA insert to 
  be amplified. 

- While cohesive compatible ends facilitate the creation of 
  recombinant DNA molecules, any two DNA fragments can 
  be ligated if their ends are made blunt. 

- Amplification of recombinant DNA molecules in a biological 
  system allows the production of DNA clones. 

- Bacteriophage and bacterial and yeast artificial 
  chromosomes allow the cloning of large DNA molecules. 

- Genomic libraries are collections of cloned DNA fragments 
  that represent the entire genome of an organism. 

- cDNA libraries are collections of cloned DNA fragments that 
  represent the mRNA population of an organism or tissue. 

- DNA hybridization, which depends on complementary 
  base pairing, is a means of identifying similar sequences in a 
  mixture of DNA sequences. 

- Long DNA molecules can be sequenced using primer 
  walking methods or by shotgun sequencing and reassembly 
  via computer algorithms. 


17.2 Introducing Foreign Genes into Genomes 
Creates Transgenic Organisms 


• Genes introduced into an organism are called transgenes.
Genes introduced from another species are termed
heterologous transgenes.

• Transgenes can be introduced into yeast on plasmids or,
alternatively, by homologous recombination into the yeast
chromosome.

• Agrobacterium and its tumor-inducing plasmid can be
harnessed to create transgenic plants in which the transfer DNA
carries the desired transgene.

• Transgenic Drosophila are created by injection into embryos
of a P element transposon carrying the transgene.

• Transgenes are introduced into mice by direct injection
of DNA into isolated cells. Detection of homologous
recombination events is facilitated by positive-negative
selection of embryonic stem cells. Transgenic mice are then
created by injection of transgenic embryonic stem cells into
an embryo that is subsequently implanted into a surrogate
mother, and the resulting progeny are chimeric.
Nonchimeric mice are selected in the following generation.

• Bacteriophage recombination systems can be used to
manipulate DNA sequences in vitro and transgenes in vivo.


17.3 Gene Therapy Uses Recombinant 
DNA Technology 


• Gene therapy is the application of recombinant DNA
technology and transgenesis to treat human diseases.

• In somatic gene therapy, transgenes are targeted to somatic
cells and are not heritable. In germinal gene therapy,
transgenes are targeted to germ cells and are thus heritable.


17.4 Cloning of Plants and Animals Produces 
Genetically Identical Individuals 


• Many plants reproduce clonally in nature, whereas clonal
reproduction in animals is rare.

• Clonal reproduction in mammals requires reprogramming
of differentiated somatic cells into stem cells.


1. What purpose do the β-lactamase and lacZ genes serve in
the plasmid vector pUC18?

2. The human genome is 3 × 10⁹ bp in length.

a. How many fragments would be predicted to result
from the complete digestion of the human genome
with the following enzymes: Sau3A (^GATC),
BamHI (G^GATCC), EcoRI (G^AATTC), and NotI
(GC^GGCCGC)?

b. How would your initial answer change if you knew that
the average GC content of the human genome was 40%?

3. Ligase catalyzes a reaction between the 5′-phosphate and
the 3′-hydroxyl at the ends of DNA molecules. The enzyme
calf intestinal phosphatase catalyzes the removal of
the 5′-phosphate from DNA molecules. What would be the
consequence of treating a cloning vector, before ligation,
with calf intestinal phosphatase?

4. You have constructed four different libraries: a genomic
library made from DNA isolated from human brain tissue,
a genomic library made from DNA isolated from human
muscle tissue, a human brain cDNA library, and a human
muscle cDNA library.

a. Which of these would have the greatest diversity of
sequences?

b. Would the sequences contained in each library be
expected to overlap completely, partially, or not at all with
the sequences present in another of the libraries?

5. Using the genomic libraries in Problem 4, you wish to
clone the human gene encoding myostatin, which is
expressed only in muscle cells.

a. Assuming the human genome is 3 × 10⁹ bp and that the
average insert size in the genomic libraries is 100 kb,
how frequently will a clone representing myostatin be
found in the genomic library made from muscle?

b. How frequently will a clone representing myostatin be
found in the genomic library made from brain?

c. How frequently will a clone representing myostatin be
found in the cDNA library made from muscle?

d. How frequently will a clone representing myostatin be
found in the cDNA library made from brain?

6. The human genome is 3 × 10⁹ bp. You wish to design a
primer to amplify a specific gene in the genome. In
general, what length of oligonucleotide would be sufficient to
amplify a single unique sequence? To simplify your
calculation, assume that all bases occur with an equal frequency.

7. Using animal models of human diseases can lead to
insights into the cellular and genetic bases of the diseases.
Duchenne muscular dystrophy (DMD) is the consequence
of an X-linked recessive allele.

a. How would you make a mouse model of DMD?

b. How would you make a Drosophila model of DMD?


8. Compare methods for constructing homologous
recombinant transgenic mice and yeast.

9. Chimeric gene-fusion products can be used for medical
or industrial purposes. One idea is to produce biological
therapeutics for human medical use in animals from which
the products can be easily harvested, in the milk of sheep
or cattle, for example. Outline how you would produce
human insulin in the milk of sheep.

10. Why are diseases of the blood more likely targets for
treatment by gene therapy than are many other genetic
diseases?

11. Injection of double-stranded RNA can lead to gene
silencing by degradation of RNA molecules
complementary to either strand of the dsRNA. Could RNAi
(see Sections 15.3 and 16.3) be used in gene therapy for
a defect caused by a recessive allele? A dominant allele?
If so, what might be the major obstacle to using RNAi as
a therapeutic agent?

12. Compare and contrast methods for making transgenic
plants and transgenic Drosophila.

13. It is often desirable to insert cDNAs into a cloning
vector in such a way that all the cDNA clones will
have their 3′ end in one orientation in the plasmid
and their 5′ end in the other orientation. This is
referred to as directional cloning. Outline how you would
directionally clone a cDNA library in the plasmid vector
pUC18.

14. A major advance in the 1980s was the development
of technology to synthesize short oligonucleotides.
This work both facilitated DNA sequencing and led
to the development of PCR. Recently,
rapid advances have occurred in the technology to
chemically synthesize DNA, and sequences up to 10 kb
are now readily produced. As this process becomes
more economical, how will it affect the gene-cloning
approaches outlined in this chapter? In other words,
what types of techniques does this new technology have
potential to supplant, and what techniques will not be
affected by it?


Application and Integration


15. The bacteriophage lambda genome can exist in either a
linear form (see Figures 17.1 and 17.8) or a circular form. The
circular form occurs when the 20-bp cos sites (cohesive
ends) anneal at their complementary base pairs and are
ligated.

a. How many fragments will be formed by restriction
enzyme digestion with XhoI, with XbaI, and with both
XhoI and XbaI in the linear and circular forms of the
lambda genome?

b. Diagram the resulting fragments as they would appear
on an agarose gel after electrophoresis.


For answers to selected even-numbered problems, see Appendix: Answers. 


16. The restriction enzymes XhoI and SalI cut their specific
sequences as shown below:

XhoI    5'-C TCGAG-3'
        3'-GAGCT C-5'

SalI    5'-G TCGAC-3'
        3'-CAGCT G-5'

Can the sticky ends created by XhoI and SalI sites be
ligated? If yes, can the resulting sequences be cleaved by
either XhoI or SalI?

17. The bacteriophage φX174 has a single-stranded DNA
genome of 5386 bases. During DNA replication,
double-stranded forms of the genome are generated. In an
effort to create a restriction map of φX174, you digest the
double-stranded form of the genome with several
restriction enzymes and obtain the following results. Draw a map
of the φX174 genome.

PstI     5386           PstI + PsiI    3078, 2308
PsiI     5386           PstI + DraI    331, 1079, 3976
DraI     4307, 1079     PsiI + DraI    898, 1079, 3409


18. You have identified a 0.80-kb cDNA clone that contains
the entire coding sequence of the Arabidopsis gene CRABS
CLAW. In the construction of the cDNA library, linkers
with EcoRI sites were added to each end of the cDNA, and
the cDNA was cloned into the EcoRI site of the MCS of the
vector shown below. You perform digests on the CRABS
CLAW cDNA clone with restriction enzymes and obtain
the following results. Can you determine the orientation
of the cDNA clone with respect to the restriction enzyme
sites in the vector? The enzymes listed in the dark blue
region are found only in the MCS of the vector.

EcoRI              0.8, 3.0
HindIII            0.3, 3.5
EcoRI + HindIII    0.3, 0.5, 3.0

[Vector figure: 2961-bp plasmid with labels ori, lacZ, and Amp.
MCS restriction sites shown: NotI, XbaI, BamHI, SmaI, EcoRI,
HindIII, SalI, XhoI. Flanking sequences:]

T7 sequencing primer
5' G TAA AAC GAC GGC CAG TGA ATT G
3' C ATT TTG CTG CCG GTC ACT TAA C

CGC GCT TGG CGT AAT CAT GGT CAT AGC TGT TTC CTG 3'
GCG CGA ACC GCA TTA GTA CCA GTA TCG ACA AAG GAC 5'
T3 sequencing primer

19. You have isolated a genomic clone with an EcoRI
fragment of 11 kb that encompasses the CRABS
CLAW gene (see Problem 18). You digest the genomic
clone with HindIII and note that the 11-kb EcoRI
fragment is split into three fragments of 9 kb, 1.5 kb,
and 0.5 kb.

a. Does this tell you anything about where the CRABS
CLAW gene is located within the 11-kb genomic
clone?

b. Restriction enzyme sites within a cDNA clone are
often also in the genomic sequence. Can you think
of a reason why occasionally this is not the case?
What about the converse: Are restriction enzyme
sites in a genomic clone always in a cDNA clone of the
same gene?


20. To further analyze the CRABS CLAW gene (see Problems
18 and 19), you create a map of the genomic clone. The
11-kb EcoRI fragment is cloned into the EcoRI site of the
MCS of the vector shown in Problem 18.

You digest the double-stranded form of the genome
with several restriction enzymes and obtain the
following results. Draw, as far as possible, a map of the genomic
clone of CRABS CLAW.

EcoRI              11.0, 3.0
EcoRI + XbaI       4.5, 6.5, 3.0         XbaI       4.5, 9.5
EcoRI + XhoI       10.2, 3.0, 0.8        XhoI       13.2, 0.8
EcoRI + SalI       6.0, 5.0, 3.0         SalI       6.0, 8.0
EcoRI + HindIII    9.0, 3.0, 1.5, 0.5    HindIII    12.0, 1.5, 0.5

What restriction digest would help resolve any ambiguity
in the map?


21. You have isolated another cDNA clone of the CRABS
CLAW gene from a cDNA library constructed in the vector
shown in Problem 18. The cDNA was directionally cloned
using the EcoRI and XhoI sites. You sequence the
recombinant plasmid using primers complementary to the T7 and
T3 promoter sites flanking the MCS (the positions of these
sequences are shown in the figure in Problem 18). The first
30 to 60 bases of sequence are usually discarded since they
tend to contain errors.

Sequence produced with T7 primer (read positions approximately 70 to 190):

AGTGGATCCCCCGGGCTGCAGGAATTCGGCACGAGTTCAAGAGCGGTTTTCAATCCAT
AAAGACCATGAACCTAGAAGAGAAACCAACCATGACGGNTTCAAGGGCTTCCCCTCA

Sequence produced with T3 primer (read positions approximately 40 to 160):

CCCCCCTCGAGTTTTTTTTTTTTTTTTTTTTAAGGAATACGCATATAAAATTTNGATAGGATTA
AGACAAATAAAGACCAGACATAAACGTCCAAAGGGACATAGCAAGTGACGTTACT
TCAANTCT

a. Which sequence represents the 5′ end of the gene?
Which sequence represents the 3′ end of the gene?

b. Will the long stretch of T residues in the T3 sequence
exist in the genomic sequence of the gene?

c. Can you identify which sequences are derived from the
vector (specifically the MCS) and which sequences are
derived from the cDNA clone?

d. Can you identify the start of the coding region in the 5′
end of the gene? What does the sequence preceding the
start codon represent?


22. You have identified five genes in S. cerevisiae that are induced
when the yeast are grown in a high-salt (NaCl) medium. To
study the potential roles of these genes in acclimation to
growth in high-salt conditions, you wish to examine the
phenotypes of loss- and gain-of-function alleles of each.

a. How will you do this?

b. How would your answer differ if you were working with
tomato plants instead of yeast?


23. You have generated three transgenic lines of maize that are
resistant to the European corn borer, a significant pest in
many regions of the world. The transgenic lines (T₁ in the
accompanying table) were created using Agrobacterium-
mediated transformation with a T-DNA having two genes,
the first being a gene conferring resistance to the corn
borer and the second being a gene conferring resistance to
a herbicide that you used as a selectable marker to obtain
your transgenic plants. You crossed each of the lines to a
wild-type maize plant and also generated a T₂ population
by self-fertilization of the T₁ plant. The following
segregation results were observed (herbicide resistant:herbicide
sensitive):

Cross                          Line 1    Line 2    Line 3
Transgenic (T₁) × wild type    1:1       3:1       5:1
Self-cross (T₂)                3:1       15:1      35:1

Explain these segregation ratios.


24. Bacterial Pseudomonas species often possess plasmids
encoding genes involved in the catabolism of organic
compounds. You have discovered a strain that can metabolize
crude oil and wish to identify the gene(s) responsible.
Outline an experimental protocol to find the gene or genes
required for crude oil metabolism.

25. Two complaints about some transgenic plants presently
in commercial use are that (1) the Bt toxin gene is
constitutively expressed in them, leading to fears that selection
pressures will cause insects to evolve resistance to the
toxin, and (2) a selectable marker gene, for example
conferring kanamycin resistance, remains in the plant,
leading to concerns about increased antibiotic resistance in
organisms in the wild. How would you generate transgenic
plants that produce Bt only in response to being fed upon
by insects and without the selectable marker?

26. In Drosophila, loss-of-function Ultrabithorax mutations
result in the posterior thoracic segments differentiating
into body parts with an identity normally found in the
anterior thoracic segments. When the Ultrabithorax gene
was cloned, it was shown to encode a transcription
factor and to be expressed only in the posterior region of the
thorax. Thus, Ultrabithorax acts to specify the identity of
the posterior thoracic segments. Similar genes were soon
discovered in other animals, including mice and men. You
have found that mice possess two closely related genes,
Hoxa7 and Hoxb7, which are orthologous to Ultrabithorax.
You wish to know whether the two mouse genes act to
specify the identity of body segments in mice.

a. How will you determine where and when the mouse
genes are expressed?

b. How will you create loss-of-function alleles of the
mouse genes?

c. How will you determine whether the mouse genes have
redundant functions?


27. You have identified an enhancer trap line (see Figure 16.19)
generated by P element transposition in Drosophila in
which the marker gene from the enhancer trap is
specifically expressed in the wing imaginal disc.

a. How can you identify the gene adjacent to the insertion
site of the enhancer trap?

b. How would you show that the expression pattern of the
enhancer trap line reflects the endogenous gene
expression pattern of the adjacent gene?

28. The highlighted sequence shown below is the one
originally used to produce the B chain of human insulin in
E. coli. The sequence of the human gene encoding the
B chain of insulin was later determined from a cDNA
isolated from a human pancreatic cDNA library and is
shown below without highlighting. Explain the differences
between the two sequences.

Highlighted sequence (used in E. coli):

ATGTTCGTCAATCAGCACCTTTGTGGTTCTCACCTCGTTGAAGC
TTTGTACCTTGTTTGCGGTGAACGTGGTTTCTTCTACACTCCT
AAGACTTAA

Human cDNA sequence (no highlighting):

GCCTTTGTGAACCAACACCTGTGCGGCTCACACCTGGTGGAAGC
TCTCTACCTAGTGTGCGGGGAACGAGGCTTCTTCTACACACCC
AAGACCCGC


30. The RAS gene encodes a signaling protein that hydrolyzes
GTP to GDP. When bound by GDP, the RAS protein is
inactive, whereas when bound by GTP, RAS protein activates
a target protein, resulting in stimulation of cells to actively
grow and divide. A single base-pair mutation (see below)
results in a mutant protein that is constitutively active, leading
to continual promotion of cell proliferation. Such mutations
play a role in the formation of cancer. You have cloned the
wild-type version of the mouse RAS gene and wish to create
a mutant form to study its biological activity in vitro and in
transgenic mice. Outline how you would proceed.

                             Gly Ala Gly Gly Val Gly
Wild-type RAS DNA:    5'...GGC GCC GGC GGT GTG GGC...3'

Mutant RAS DNA:

31. Vitamin E is the name for a set of chemically related
tocopherols, which are lipid-soluble compounds with
antioxidant properties. Such antioxidants protect cells
against the effects of free radicals created as by-products
of energy metabolism in the mitochondrion. Different
tocopherols have different biological activities due to
differences in their retention by binding to gut proteins
during digestion. The one retained at the highest level is
α-tocopherol, while γ-tocopherol is retained at less than
10% of that efficiency. In Arabidopsis, α-tocopherol is the
most abundant tocopherol in leaves, while γ-tocopherol
is the most abundant in seeds. An enzyme encoded by the
VTE4 gene can convert γ-tocopherol to α-tocopherol. How
would you create an Arabidopsis plant that produces high
levels of α-tocopherol in the seeds?

32. You have cloned a gene for an enzyme that degrades lipids
in a bacterium that normally lives in cold temperatures.
You wish to transfer this gene into E. coli to produce
industrial amounts of enzyme for use in laundry detergent.

a. How would you accomplish this?

b. You have managed to produce transgenic E. coli
expressing mRNA of your gene, but only a low level of
protein is produced. Why might this be so? How could
you overcome this problem?


Genomics: Genetics from a 
Whole-Genome Perspective 


CHAPTER OUTLINE 


18.1 Structural Genomics Provides a 
Catalog of Genes in a Genome 

18.2 Annotation Ascribes Biological 
Function to DNA Sequences 

18.3 Evolutionary Genomics Traces 
the History of Genomes 

18.4 Functional Genomics Aids in 
Elucidating Gene Function 


Sequences of entire genomes of many species from Charles Darwin's 
“tangled bank” have clarified evolutionary relationships of life on Earth and 
provided the genetic blueprints of genes that define organisms, though 
the precise functions of most genes are presently unknown. 


Genomics, the scientific study of biological processes
from the perspective of the whole genome, originated
in the Human Genome Project (HGP). This audacious project
was initiated in the 1980s to sequence and analyze the human
genome. At the time, neither the technologies for generating
large amounts of DNA sequence nor the computing power
to analyze such large amounts of data existed.

Although a primary goal of the HGP was to sequence
the human genome, several model genetic organisms were
also sequenced under its auspices, including those that have
appeared most often in the pages of this book: Escherichia coli,
Saccharomyces cerevisiae, Caenorhabditis elegans, Drosophila
melanogaster, Arabidopsis thaliana, and Mus musculus.
The genome sequences of these model organisms
have contributed to our understanding of the
organisms themselves as well as to interpretations of
human genome structure, function, and evolution.
Since then, the genomes of thousands of other
prokaryotes and hundreds of other eukaryotes have also
been sequenced. Due to ever-decreasing costs and
ever-improving technologies, genome sequencing
is becoming increasingly affordable and routine. It is
proving so useful that, in the future, species may be
defined by characteristics of their genomic sequence.


In the initial analyses of the genomes of model
organisms, two findings stand out. First, even in
well-studied organisms, only a fraction of genes
identified by genome sequencing had been
previously identified by forward genetic analyses; this
brings up the question of the function of all these
previously unknown genes. Second, genomic
analyses have also revealed the highly dynamic nature of
genomes, providing insights into the extent of
differences between species and between individuals of a
species and the rates at which DNA sequences evolve.


This chapter provides an overview of genomics by 
describing three of its major subdivisions. Structural 
genomics is concerned with the sequencing of 
whole genomes and the cataloging, or annotation, of 
sequences within a given genome. It provides a parts 
list of the genetic tool kit of an organism. Evolutionary 
genomics is the comparison of genomes, both within 
and between species. It illuminates the genetic bases 
of similarities and differences between individuals 
or species. Functional genomics uses genomic 
sequences to understand gene function in an organ- 
ism. Together, these three approaches contribute to 
the ultimate goal of understanding the role of every 
gene a given genome contains. 


18.1 Structural Genomics Provides 
a Catalog of Genes in a Genome 


Genomes vary enormously in size, from several hundred
kilobases in some bacterial species to several thousand
megabases in some vertebrate and plant species (Table 18.1).
Genomes may consist of a single DNA molecule, as in
many bacterial and archaeal species, or of hundreds of
chromosomes, as in some eukaryotic species. From a broad
perspective, gene number generally increases with
organismal complexity. However, genomes also vary in their
proportions of coding versus noncoding DNA sequences, and
in multicellular eukaryotes, genome size can increase much
more than gene number due to a disproportionate increase
in noncoding DNA.


Table 18.1 Examples of Sequenced Genomes

Organism                     Description                            Genome       Predicted Number   Predicted
                                                                    Size (Mb)ᵃ   of Genesᵇ          Genes/Mb
Escherichia coli             Single-celled eubacterium              4.64         4200               905
Agrobacterium tumefaciens    Single-celled eubacterium              5.67         5419               956
Rickettsia prowazekii        Parasitic eubacterium                  1.11         834                751
Aeropyrum pernix             Single-celled archaebacterium          1.67         2694               1631
Chlamydomonas reinhardtii    Single-celled chlorophyte alga         112          16,709             149
Arabidopsis thaliana         Multicellular flowering plant          136          27,249             200
Oryza sativa                 Multicellular flowering plant (rice)   427          40,745             95
Saccharomyces cerevisiae     Single-celled fungus (baker's yeast)   12.2         6607               542
Neurospora crassa            Multicellular fungus (bread mold)      41           10,357             253
Caenorhabditis elegans       Multicellular nematode worm            103          20,532             199
Drosophila melanogaster      Multicellular insect (fruit fly)       169          13,937             82
Takifugu rubripes            Multicellular fish (puffer fish)       393          19,226             49
Ornithorhynchus anatinus     Multicellular monotreme (platypus)     2073         21,698             10
Mus musculus                 Multicellular mammal (mouse)           2731         23,139             8.5
Pan troglodytes              Multicellular mammal (chimpanzee)      2996         18,759             6.3
Homo sapiens                 Multicellular mammal (human)           3101         20,769             6.7

ᵃ Genome sizes given for most multicellular eukaryotes are estimates because sequences of the heterochromatic regions of the genomes are not known.
ᵇ Gene number estimates are based on current annotations and could change with new experimental evidence.
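
The gene-density column of Table 18.1 can be recomputed directly from the other two columns. The short Python sketch below does this for a few rows; the dictionary and function names are illustrative choices, and only the numbers come from the table.

```python
# Genome size (Mb) and predicted gene number for selected rows of Table 18.1.
GENOMES = {
    "Escherichia coli": (4.64, 4200),
    "Saccharomyces cerevisiae": (12.2, 6607),
    "Arabidopsis thaliana": (136, 27249),
    "Drosophila melanogaster": (169, 13937),
    "Homo sapiens": (3101, 20769),
}


def genes_per_mb(size_mb, gene_count):
    """Gene density: predicted genes divided by genome size in megabases."""
    return gene_count / size_mb


densities = {name: genes_per_mb(*row) for name, row in GENOMES.items()}

for name, density in densities.items():
    print(f"{name:26s} {density:8.1f} genes/Mb")

# Compare how much genome size and gene number each grow from E. coli to human.
size_ratio = GENOMES["Homo sapiens"][0] / GENOMES["Escherichia coli"][0]
gene_ratio = GENOMES["Homo sapiens"][1] / GENOMES["Escherichia coli"][1]
print(f"genome size ratio: {size_ratio:.0f}x, gene number ratio: {gene_ratio:.1f}x")
```

In this comparison the human genome is several hundred times larger than that of E. coli, yet it carries only about five times as many predicted genes, which is the disproportionate increase in noncoding DNA described above.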

These differences aside, even the smallest bacterial
genomes are thousands of times longer than the 600 to
900 bp that can be sequenced in a traditional single
dideoxy sequencing reaction (see Chapter 7). It is clear that to
sequence any genome would require many such reactions.
For example, the human genome, with its 3 × 10⁹ bp,
would require at least 5 × 10⁶ reactions. Were technicians
to run these reactions sequentially, designing a new primer
for each reaction based on the sequence obtained from the
previous one, it would take decades to sequence the entire
genome.
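
The reaction count quoted above follows from dividing genome length by read length. The sketch below works through that arithmetic; the function names are illustrative, and the daily throughput used for the time estimate is a purely hypothetical figure that does not come from the text.

```python
import math

HUMAN_GENOME_BP = 3_000_000_000  # 3 x 10^9 bp, as stated in the text


def min_reactions(genome_bp, read_bp):
    """Fewest non-overlapping reads of length read_bp needed to span genome_bp."""
    return math.ceil(genome_bp / read_bp)


def years_if_sequential(reactions, reactions_per_day):
    """Time to run the reactions one after another at an assumed daily rate."""
    return reactions / reactions_per_day / 365


reads_600 = min_reactions(HUMAN_GENOME_BP, 600)  # shorter reads: 5,000,000
reads_900 = min_reactions(HUMAN_GENOME_BP, 900)  # longer reads: 3,333,334

print(f"600-bp reads: {reads_600:,} reactions")
print(f"900-bp reads: {reads_900:,} reactions")

# Hypothetical throughput of 500 sequential reactions per day (an assumption).
print(f"about {years_if_sequential(reads_600, 500):.0f} years at 500 reactions/day")
```

These are lower bounds with no overlap between reads; a real project needs many more reads, because overlapping reads are what allow the sequence to be assembled.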

What, then, is an efficient way to sequence DNA 
molecules (i.e., chromosomes) millions of bases in length? 
The answer is to break them into smaller fragments and 
sequence the fragments in parallel. Computer algorithms 
are then used to assemble the sequences of the fragments 
into a single contiguous sequence. Two basic approaches 
to this general mode of attack differ only in the starting 
DNA to be fragmented and sequenced. In one approach, 
called whole-genome shotgun (WGS) sequencing, DNA 
representing the entire genome is fragmented into smaller 
pieces, and a large number of fragments are chosen at ran- 
dom and sequenced. In the second approach, often called 
clone-by-clone sequencing, each chromosome is first 
broken into overlapping clones that are then arranged 
in linear order to produce a physical map of the genome. 
Each clone in the map is then sequenced separately. 
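
The idea of assembling fragment sequences by their overlaps can be illustrated with a toy program. The sketch below is a minimal greedy merger written for this illustration only: the function names, the minimum-overlap threshold, and the 20-base example sequence are all assumptions, and production assemblers use far more elaborate methods that also handle sequencing errors, both DNA strands, and repeats.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of a that matches a prefix of b, or 0."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of sequences with the longest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicate reads
    # Drop reads that are wholly contained in another read.
    contigs = [r for r in contigs if not any(r != o and r in o for o in contigs)]
    while True:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


# A made-up 20-base "genome" and four overlapping reads taken from it.
genome = "ATGCGTACGTTAGCCTAGGA"
reads = [genome[0:8], genome[5:13], genome[10:18], genome[14:20]]
print(greedy_assemble(reads))
```

With these four reads the merger recovers the original 20-base sequence as a single contig. If the sequence contained repeats longer than the reads, the overlaps would become ambiguous, which is the problem taken up later in this section.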

The WGS approach is applicable to any genome and 
is the approach in widespread use today. The clone-by- 
clone approach, which has been largely supplanted by the 
WGS approach, relies on the availability of specific genetic 
resources and thus is applicable only to some model organ- 
isms. We describe both approaches here because they both 
played a role in the sequencing of the human genome. 


The Clone-by-Clone Sequencing Approach 


The clone-by-clone approach begins with construction 
of a physical map. Genetic maps provide a convenient 
foundation for this construction. For this reason, clone- 
by-clone sequencing is usually applied only to species that 
have a history of genetic analysis and thus for which tools 
such as genetic maps are available. The physical map of a 
genome is a set of overlapping genomic clones assembled 
into contigs that, once assembled, cover the entire genome
(see Chapter 17 for a description of contigs). The genome 
sequence is then determined by shotgun sequencing each 
of the clones (see Figure 17.13) and then combining those 
sequences into larger contiguous sequences based on the 


physical map. The completeness of the resulting genome 
sequence depends on the quality and completeness of the 
sets of overlapping clones. 

Comparison of the genetic map with the physical 
map can provide information that helps align the physical 
map with the known chromosomes of the organism. The 
genome sequencing of S. cerevisiae, C. elegans, A. thaliana, 
and, to some extent, humans relied on this use of physi- 
cal maps. For these species, a direct correspondence has 
been drawn between the genome sequence and the chro- 
mosomes, each of which is represented by either a single 
contiguous sequence or a small number of contiguous se- 
quences with gaps at the centromeres and at other highly 
repetitive regions. The clone-by-clone approach is not 
typically used anymore, due both to its high cost and to 
advances in WGS sequencing. 


Whole-Genome Shotgun Sequencing 


The whole-genome shotgun (WGS) approach sequences 
genomic DNA by the shotgun method without prior con- 
struction of a physical map. For this reason, WGS can 
be applied to any genome. In WGS sequencing, genomic 
DNA is broken into fragments and sequenced, and the 
sequences are assembled into contigs based on sequence 
overlaps (see Figure 17.13 for a diagram summarizing 
shotgun sequencing). To ensure enough overlapping of 
sequences for this purpose, technicians commonly generate
sequence totaling approximately 30 to 40 times the
actual length of the genome (this degree of overlap is called
30-40× coverage); thus, any one sequence is contained
in multiple reads, minimizing the chance of sequencing 
errors. The ease with which sequences are assembled into 
contigs depends on the lengths of the sequencing reads, 
and these vary between technologies (see Chapter 7). 
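
Coverage is simply the total length of sequence generated divided by the length of the genome. The sketch below computes it and also estimates the fraction of bases left unsequenced, using the standard Poisson approximation (reads placed uniformly at random, so a base is missed with probability e^-c at coverage c). That approximation is an assumption added here, not something stated in the text, and the genome size and read length in the example are hypothetical.

```python
import math


def coverage(n_reads, read_len, genome_len):
    """Average number of times each base is sequenced."""
    return n_reads * read_len / genome_len


def reads_needed(target_coverage, read_len, genome_len):
    """Number of reads required to reach a target average coverage."""
    return math.ceil(target_coverage * genome_len / read_len)


def fraction_uncovered(c):
    """Expected fraction of bases hit by no read, assuming random placement."""
    return math.exp(-c)


# Hypothetical example: a 2-Mb genome sequenced with 500-bp reads.
GENOME_LEN = 2_000_000
READ_LEN = 500

n = reads_needed(30, READ_LEN, GENOME_LEN)
print(f"{n:,} reads give {coverage(n, READ_LEN, GENOME_LEN):.0f}x coverage")
for c in (1, 5, 30):
    print(f"{c:>2}x coverage: expected uncovered fraction {fraction_uncovered(c):.2e}")
```

Under this approximation about a third of the genome is missed at 1× coverage, whereas at 30× essentially no base is expected to be missed by chance. In practice the remaining gaps arise mostly from repeats and from regions that clone or sequence poorly, not from random sampling.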

Repetitive DNA presents an obstacle in the assembly of 
WGS sequencing data. Dispersed repetitive DNA sequences 
(for example, transposons and retrotransposons) interfere
with genome assembly, as explained in Figure 18.1, because 
they can map to multiple locations within the genome. 
Consequently, the assembled sequence often remains bro- 
ken at repetitive sequences. One way of circumventing 
this problem is to use paired-end sequence data to bridge 
the gaps left in the assembly because of repetitive DNA 
sequences. In paired-end sequencing, sequence is gener- 
ated from both ends of genomic DNA fragments of known 
size. When paired-end sequences flank a repetitive element, 
they can be used in assembling a scaffold, a set of contigs
that are physically linked by paired-end sequences and that 
contain the repetitive element. The relative orientations of 
paired-end sequences and their distance from one another 
can be incorporated into assembly algorithms. 
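
The way paired-end reads link contigs into a scaffold can be sketched in a few lines. The example below is a deliberate simplification under stated assumptions: every read name, contig label, and the `min_support` threshold is hypothetical, each contig is assumed to have at most one successor, and read orientation and insert size are ignored even though, as noted above, real assembly algorithms use both.

```python
from collections import defaultdict


def link_contigs(read_pairs, read_to_contig, min_support=2):
    """Count read pairs whose two ends lie in different contigs."""
    support = defaultdict(int)
    for left, right in read_pairs:
        a = read_to_contig.get(left)
        b = read_to_contig.get(right)
        if a is not None and b is not None and a != b:
            support[(a, b)] += 1
    # Keep only links backed by enough independent read pairs.
    return {pair: n for pair, n in support.items() if n >= min_support}


def build_scaffold(links, start):
    """Order contigs by following supported links from a starting contig."""
    successor = {a: b for (a, b) in links}  # assumes one successor per contig
    order, seen = [start], {start}
    while order[-1] in successor and successor[order[-1]] not in seen:
        order.append(successor[order[-1]])
        seen.add(order[-1])
    return order


# Hypothetical data: the contig in which each read end was placed.
read_to_contig = {
    "r1_fwd": "A", "r1_rev": "B",
    "r2_fwd": "A", "r2_rev": "B",
    "r3_fwd": "B", "r3_rev": "C",
    "r4_fwd": "B", "r4_rev": "C",
    "r5_fwd": "A", "r5_rev": "C",  # a single pair: too weak to trust
}
read_pairs = [(f"r{i}_fwd", f"r{i}_rev") for i in range(1, 6)]

links = link_contigs(read_pairs, read_to_contig)
print(links)
print(build_scaffold(links, "A"))
```

Here contigs A, B, and C are joined in that order because two read pairs support each adjacent link, while the single A-to-C pair falls below the threshold and is discarded. The sequence between linked contigs, which may be a repeat, remains unknown until a spanning clone is sequenced.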

Let's examine how scaffold assembly works. Typically,
several genomic libraries, each containing cloned DNA
fragments of a different size, are generated (Figure 18.2):
for example, one library of 2- to 3-kb clones, a
second of 6- to 8-kb clones, and a third of larger clones
(20 to 30 or more kilobases). Paired-end sequence
generated from clones in the different libraries will provide
information on whether two particular sequences are
physically linked and the approximate distance between
the two sequences. Even if repetitive DNA occurs
between the paired-end sequences, they can still be linked
into a scaffold. Dispersed repetitive DNA in the genome
often consists either of simple, short repeats
(microsatellites or minisatellites) or transposable element sequences
(up to 10,000 bp). Most repeat sequences will be flanked
by paired-end sequence from at least one of the differently
sized libraries. However, repetitive sequences longer than
the largest available clones (for example, centromeric
repeat sequences, in many eukaryotes) cannot be spanned
using this approach and thus cause gaps between contigs.

Figure 18.1 The problem of repetitive DNA. (1) Fragment DNA and
shotgun sequence. (2) Identify overlapping sequences and assemble
into contigs. Since the repeat sequences are identical, they cannot be
assigned to a unique genomic location; thus, the relative locations and
orientations of the A, B, and C contigs cannot be determined, and
several different assemblies of the contigs are possible.


WGS Sequencing of a Bacterial Genome For an idea of 
how the WGS approach works in practice, let’s consider 
two examples, a small bacterial genome with little 
repetitive DNA and a large eukaryotic genome containing 
a significant proportion of repetitive DNA. 

Figure 18.2 Paired-end shotgun sequencing strategy.


The first genome to be sequenced by a paired-end
WGS approach (at The Institute for Genomic Research,
or TIGR, in 1995) was that of Haemophilus influenzae, a
Gram-negative bacterium whose natural host is humans
(Figure 18.3). The H. influenzae genome is 1.8 × 10⁶ bp and
has relatively few dispersed repetitive elements. Paired-end 
sequence was generated from three genomic libraries. These 
sequence data were assembled into 140 contigs whose rel- 
ative orders and orientations were unknown. Since the 
H. influenzae genome is a single circular chromosome, 
the assembled sequence had 140 gaps for which sequence 
information was lacking. However, with information on 
the physical linkage of paired-end reads, the gaps could be 
divided into two categories: 98 were sequence gaps within 
a scaffold, meaning gaps for which a clone was available for 
further sequencing that could close the gap, and 42 were 
physical gaps between scaffolds, meaning gaps for which 
there was no clone to supply the sequence.

Figure 18.3 Whole-genome shotgun sequencing of the Haemophilus influenzae genome.
(a) Strategy employed in the whole-genome


Sequence gaps were closed by sequencing of spanning 
clones identified through paired-end sequencing. Two 
approaches were used to close the physical gaps. First, the 
lambda genomic libraries were probed with sequences 
derived from the ends of the scaffolds: If a single genomic 
clone hybridized with ends of two scaffolds, the clone 
should span the gap between the two scaffolds. Second, 
using combinations of primers specific to sequences 
at the ends of scaffolds, polymerase chain reaction 
(PCR) methodology was employed to amplify spanning 
sequences. With this combination of approaches, the 
entire 1,830,137-bp sequence of the H. influenzae genome 
was assembled into a single contig. 
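The gap bookkeeping for the H. influenzae assembly can be checked with a short calculation. The function below is a sketch, not part of the original analysis; its name and parameters are hypothetical. It assumes more than one scaffold and relies on the inference that, on a circular chromosome, the number of physical gaps equals the number of scaffolds, so 42 physical gaps imply 42 scaffolds.

```python
def classify_gaps(n_contigs, n_scaffolds, circular=True):
    """Return (total gaps, sequence gaps, physical gaps).

    Sequence gaps separate adjacent contigs inside a scaffold; physical
    gaps separate scaffolds. Assumes n_scaffolds > 1.
    """
    total = n_contigs if circular else n_contigs - 1
    sequence_gaps = n_contigs - n_scaffolds  # gaps inside scaffolds
    physical_gaps = total - sequence_gaps    # gaps between scaffolds
    return total, sequence_gaps, physical_gaps


# 140 contigs on one circular chromosome, grouped into 42 scaffolds.
print(classify_gaps(140, 42))
```

The result reproduces the 140 gaps, 98 sequence gaps, and 42 physical gaps reported for the assembly.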


WGS Sequencing of a Eukaryotic Genome The genome 
of Drosophila was the first large eukaryotic genome 
containing a significant fraction of repetitive DNA to be 
sequenced using a WGS approach. The Drosophila genome 
is approximately 170 Mb, of which 120 Mb is considered to 
be euchromatic and the remaining 50 Mb heterochromatic. 
Because centromeric heterochromatic DNA is not efficiently 
cloned, owing to its highly repetitive nature, only the 
euchromatic portion of the genome was initially sequenced, 
using the Sanger sequencing method (see Section 7.5). 
Paired-end sequencing was accomplished using three 
genomic libraries of 2 kb, 10 kb, and 130 kb (Figure 18.4). The 
10-kb clones were large enough to span most of the dispersed

Figure 18.4 Whole-genome shotgun sequencing of the
Drosophila melanogaster genome.


repetitive elements (such as transposons and retrotrans- 
posons) found in the Drosophila genome, while the 130-kb 
clones provided long-range linking information from which 
to infer overall structure in the sequence assembly. Most of 
the 12×-coverage sequence generated could be assembled
into 50 scaffolds representing almost 115 Mb of the euchro- 
matic portion of the genome. The remaining sequence was 
assembled into almost 800 additional scaffolds represent- 
ing about 5 Mb; thus, the assembled Drosophila genome 
sequence had several hundred physical gaps. Genetic and 
physical maps of Drosophila were used to assign the 50 large 
scaffolds and an additional 84 scaffolds to specific regions 
of the four chromosomes, corresponding to most of the 
euchromatic regions of the chromosome arms. 
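To give a feel for what 12× coverage means, the sketch below relates coverage to the amount of raw sequence and to the expected fraction of bases that no read covers. The 500-bp read length is an assumed, illustrative value and is not taken from the text. The uncovered fraction uses the idealized Poisson approximation e^(-c) for coverage c, which assumes that reads fall uniformly at random and ignores repeats and cloning bias.

```python
import math


def reads_needed(genome_bp, coverage, read_bp):
    """Number of reads of length read_bp needed to reach the given coverage."""
    return math.ceil(genome_bp * coverage / read_bp)


def fraction_uncovered(coverage):
    """Expected fraction of bases hit by no read (Poisson idealization)."""
    return math.exp(-coverage)


euchromatin_bp = 120_000_000  # about 120 Mb of Drosophila euchromatin
n_reads = reads_needed(euchromatin_bp, coverage=12, read_bp=500)
missed_bp = euchromatin_bp * fraction_uncovered(12)
print(n_reads, round(missed_bp))
```

Under these assumptions almost every base is sampled, which illustrates why the remaining gaps in a real assembly are caused mainly by repetitive or unclonable DNA rather than by chance.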

The WGS sequencing of the Drosophila genome 
benefited from the genetic resources that Drosophila 
geneticists had constructed throughout the 20th century, 
such as genetic maps of morphological and molecular 
markers. These tools allowed sequences to be assigned 
to specific chromosomal locations. They also provided 
a benchmark for assessing the completeness of the 
assembled sequence: Of the 2783 previously known genes 
of Drosophila, 2778 could be found in the scaffolds, thus ac- 
counting for an estimated 97.5% of the euchromatic DNA. 


Subsequently, next-generation sequencing technologies 
(see Chapter 7) have been used to sequence the Drosophila 
genome at greater depth, leading to more complete cov- 
erage. The most up-to-date assembly of the Drosophila 
genome can be found at flybase.org.


The Human Genome The U.S. Human Genome Project 
began officially in 1990 with a projected time scale of 
15 years and a budget of $3 billion. This government- 
funded project took a clone-by-clone approach to 
sequencing the human genome; therefore, it started by 
developing tools to build a physical map. In 1998, however, 
the newly founded Celera Corporation announced that it 
would provide a human genome sequence in just 3 years 
by using a WGS sequencing approach. Competition from 
this private company increased the pace of the publicly 
funded project, so that the genome sequencing was 
completed 4 years ahead of schedule. 

In 2000, then-President Bill Clinton, appearing at a 
press conference with J. Craig Venter (president of Celera) 
and Francis Collins (director of the Human Genome 
Sequencing Consortium), announced the completion of 
a “draft” of the human genome sequence. In fact, there 
were two draft sequences—one furnished by the HGP 
clone-by-clone approach and one by the Celera WGS 
approach—and both had numerous gaps. In subsequent 
years, a “complete” sequence of the human genome has 
been generated by targeted sequencing of specific regions 
of the genome to connect adjacent contigs and ensure that 
the error rate is less than 1/10,000. The gaps between the 
scaffolds and contigs were closed by the same approaches 
described earlier for the H. influenzae and Drosophila 
genomes, resulting in a genomic sequence consisting of 
approximately one contig for each chromosome arm. 


The Future Rapid technological advances are continually 
changing how genomes are sequenced. Nearly all genome 
projects today employ WGS sequencing using next- 
generation sequencing technologies (see Chapter 7). For most 
organisms whose genomes are being sequenced, researchers 
do not have extensive genetic maps, mutant collections, 
or other genetic resources. Thus, the completeness of the 
genome sequences is not as easy to assess as it was for 
the model organisms listed inside the back cover, nor is 
the assignment of sequences to specific chromosomes 
straightforward. However, the ease with which genomes 
can now be sequenced, coupled with advances in forward 
and reverse genetic technologies (see Chapter 16), makes it 
feasible to develop almost any organism for which there is an 
interesting biological question into a genetic model. 


Metagenomics 


In both the number of individual organisms and their total 
mass, microbial populations constitute the majority of life 
on Earth. However, unlike model genetic organisms, which 
are convenient for scientists to study, only a small fraction
of microbes can be cultivated in the laboratory. How can
we begin to understand microbial diversity without being 
able to grow the necessary range of microorganisms in the 
lab? One approach is to apply WGS sequencing to DNA 
isolated from entire natural communities consisting of a 
range of organisms. The data derived from such sequenc- 
ing projects are called a metagenome. 

One of the first metagenomics projects provides 
an example. It was an environmental genomic shotgun 
sequencing of DNA isolated from microorganisms from 
the Sargasso Sea, a region of ocean bounded by the Gulf 
Stream off the southeast coast of the United States. In this 
study, approximately 265 Mb of sequence was generated 
and assembled into a large number of contigs, represent- 
ing an estimated 1800 different genomes. However, none 
of the estimated 1800 genomes was complete, and many 
were represented by only one or a few contigs. This situation 
highlights a complication arising in metagenomic studies: 
Species in any given environmental sample are not equally 
represented, and so data from common species are over- 
weighted relative to those of scarcer ones. Consequently, any 
complete genome sequences that are produced are likely to 
belong to very common species while genomes of rare spe- 
cies are represented by only a small number of contigs. 
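The effect of unequal representation can be illustrated with a small calculation. The community below is hypothetical: the three species names, their shares of the sampled DNA, and the 2-Mb genome sizes are invented for illustration. Only the 265 Mb of total sequence comes from the Sargasso Sea example.

```python
def expected_coverage(total_bp, dna_fractions, genome_sizes_bp):
    """Expected fold coverage of each genome in a mixed sample.

    dna_fractions gives each species' share of the sequenced DNA.
    """
    return {
        species: total_bp * fraction / genome_sizes_bp[species]
        for species, fraction in dna_fractions.items()
    }


# Hypothetical community; each genome is assumed to be 2 Mb.
fractions = {"common": 0.90, "intermediate": 0.09, "rare": 0.01}
sizes = {species: 2_000_000 for species in fractions}
coverage = expected_coverage(265_000_000, fractions, sizes)
for species, fold in coverage.items():
    print(f"{species}: {fold:.1f}x")
```

With these made-up numbers the common species is sequenced more than 100 times over, while the rare species receives only about onefold coverage, too little to assemble more than a few contigs.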

Despite such limitations, metagenomic analyses pro- 
vide information on species diversity and relative popula- 
tion levels in an environmental setting and also contribute 
to the identification of gene sequences of organisms living 
in a particular environment. Such analyses have been 
applied, for example, to ecological communities living 
in acidic mine tailings, contaminated groundwater, and 
drinking-water systems and also to more “natural” (less 
affected by humans) ecosystems such as soils, oceans, 
and hot springs. In addition, as described in Experimental 
Insight 18.1, metagenomic analyses of several microbial 
biomes of humans, including the gut, mouth, and skin, 
have revealed that, collectively, our microbial biomes 
possess more genes than our own genome. The same se- 
quencing strategy can be applied to any biological system 
from which purified DNA belonging to a single species is 
difficult to obtain. An application of metagenomics is pre- 
sented in the Case Study at the end of this chapter. 


18.2 Annotation Ascribes Biological 
Function to DNA Sequences 


The genome sequence can be considered the finest-scale 
physical map of the genome, and in it are encoded all the 
genes of the organism. Genome annotation identifies the 
location of genes and other functional sequences within 
the genome sequence. 

Annotation is the process of attaching biological func- 
tions to DNA sequences, and gene annotation describes 
the biochemical, cellular, and biological function of each 
gene product the genome encodes. Until annotated, a
genome sequence is nothing but a very long string of As,
Ts, Cs, and Gs. Annotation describes both structural and 
functional features of a gene. Its goal, moreover, is not only 
to identify known genes, regulatory sequences, and so on, 
but also to identify sequences that are likely to be genes 
though their function, if they are genes, is as yet unknown. 
Annotations may be based on experimental evidence—the 
gold standard—or on computational analysis, which then 
must be confirmed experimentally. 


Experimental Approaches to Structural Annotation 
Structural annotation aims to identify genes and their 
structural components, including transcribed, coding, 
and regulatory sequences. Experimental approaches to 
identifying transcribed sequences in a genome make use 
of cDNA. Comparison of cDNA sequences with genomic 
sequences identifies the parts of the genome that undergo 
transcription leading to production of RNA molecules (see 
Chapter 16 for a review of cDNA and genomic libraries). 
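The comparison of a cDNA with genomic sequence can be sketched as follows. This is a toy, greedy, exact-match procedure with invented sequences; the function name and the `seed` parameter are hypothetical. Real spliced-alignment programs tolerate mismatches and use splice site consensus sequences to place exon boundaries, which this sketch does not attempt.

```python
def map_cdna_to_genome(genome, cdna, seed=4):
    """Greedily place a cDNA on a genomic sequence by exact matching.

    Returns a list of (start, end) genomic intervals (end exclusive), one
    per inferred exon. The stretches between intervals are inferred introns.
    """
    exons = []
    g = 0  # position in the genome from which to search
    c = 0  # position in the cDNA still to be placed
    while c < len(cdna):
        start = genome.find(cdna[c:c + seed], g)
        if start == -1:
            raise ValueError("cDNA segment not found in genomic sequence")
        length = 0
        # Extend the match base by base until the sequences diverge.
        while (c + length < len(cdna) and start + length < len(genome)
               and genome[start + length] == cdna[c + length]):
            length += 1
        exons.append((start, start + length))
        c += length
        g = start + length
    return exons


# Invented example: two exons separated by an intron that begins with GT
# and ends with AG.
exon1, intron, exon2 = "ATGGCC", "GTAAGTTTTTCAG", "CATTACCTGA"
genome = "AA" + exon1 + intron + exon2 + "TT"
cdna = exon1 + exon2
exons = map_cdna_to_genome(genome, cdna)
introns = [(a_end, b_start) for (_, a_end), (b_start, _) in zip(exons, exons[1:])]
print(exons, introns)
```

The genomic stretch that is absent from the cDNA is reported as an intron, which is the essence of how transcribed sequences reveal exon-intron structure.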

In theory, a complete set of cDNA clones representing 
all the genes from an organism would allow complete anno- 
tation of the transcribed regions of its genome. In practice, 
though, complete sets of cDNA clones are not available. 
This is due to both variability in expression levels and varia- 
tion in structure and processing of different transcripts (see 
Section 8.4 for discussion of mRNA splicing). Nevertheless, 
for many organisms, a large amount of cDNA sequence 
is available, allowing the partial or complete assembly of 
gene transcripts. cDNA sequences that do not cover the
entire length of a gene are sometimes called expressed
sequence tags (ESTs). Comparing these transcribed sequences
with the genomic sequence allows accurate annotation of 
gene exons and introns, including alternative splicing and 
other mRNA variants (Figure 18.5).
splice site consensus sequences at the ends of the introns.


Figure 18.5 Experimentally acquired clues for gene
annotation.


Experimental Insight 18.1


Our Communities Within and Upon 


When we look in the mirror, we like to think we are looking at 
just ourselves, but the microbes within and upon us, primarily 
bacteria, outnumber our own cells by a margin of greater than 
10 to 1, though they comprise only about 1 kg of our weight. 
Perhaps the first to recognize that we are host to our own 
microbiome was Antonie van Leeuwenhoek, who, scraping 
“gritty matter” from between his teeth, observed the “animal- 
cules,” or bacteria, in his dental plaque in 1683. Subsequently, 
bacterial culturing techniques demonstrated that microbes 
inhabit many parts of our bodies, but as revealed by the 
application of metagenomic shotgun sequencing, only 
a small fraction of the microbial diversity was culturable. 
Metagenomics has since revolutionized our thinking, leading 
to the present view that each of us has our own private eco- 
systems, complete with diverse habitats and ecology. 


DIGESTIVE MICROBIOME 


The inner mucosal surfaces (gastrointestinal tract and 
mouth) and the skin are dominated by four phyla of bacteria: 
Actinobacteria, Firmicutes, Bacteroidetes, and Proteobacteria. 
It is becoming apparent that the makeup of our gut microbial 
community influences our health and well-being and that its
composition is influenced by our diet. Metagenomic sequencing
of the gut microbiomes from hundreds of individuals revealed 
that these microbiomes fall into three general types of bac- 
terial communities, or enterotypes. Enterotypes correspond 
strongly to long-term dietary habits. For example, high protein 
and animal fat consumption is correlated with the Bacteroides 
enterotype, and a high carbohydrate diet is correlated with a 
Prevotella enterotype, suggesting there is feedback between 
diet and habitat favoring growth of specific bacterial groups. 

A striking example of how diet can influence our resident 
microbes is the occurrence of a unique lateral gene transfer 
event in Japanese individuals who eat substantial amounts 
of red algae, the “wrapping” used in sushi. In this case, genes 
encoding enzymes that break down red algal polysaccharides 
have been transferred from bacteria that normally live on the 
red algae to Bacteroides species resident in the human gut. Thus,
the bacteria in people who consume quantities of red algae 
evolve to better utilize this food source. 

We obtain our initial gut microbiome from our mother’s 
birth canal and subsequently from her milk. Those born by 
caesarean section miss out on these potentially important 
contributions. Short-term changes in diet do not appear to 
induce changes in gut microbes, but major perturbations, 
such as antibiotic usage, can alter communities. Normally, 
the ecology of the microbial communities is robust, and they 
rebound to their former composition even after major insults. 


However, sometimes new communities, often detrimental to
the health of their host, take over, and these may be resistant
to removal by antibiotics. A seemingly radical method of
displacing these unwanted microbes, a fecal transplant from
a healthy individual, appears to be highly efficient, suggesting
other similar transplant approaches may be capable of
replacing “bad” microbiota with a “good” version. Alterations
of the gut microbiome have also been associated with several
disease states, including Crohn's disease, colorectal cancer,
and irritable bowel syndrome, highlighting the critical
relationship we share with our ecosystems.


SKIN MICROBIOME


Our skin offers about 1.8 m² of diverse habitats colonized by
microbes. Despite our bathing and shedding of skin cells,
our bacterial communities remain relatively constant and are
dominated by the same four phyla as our guts, but with
Actinobacteria more abundant.

Three distinct skin habitats (moist, dry, and sebaceous)
are created by variations in skin thickness, folds, and density
of glands and hairs. The three habitat types are colonized by
distinct bacterial communities, with greater similarity arising
from similar habitat type than from topographic proximity.
In transplant experiments where forehead (sebaceous) and
forearm (dry) habitats were populated with tongue bacteria,
the tongue bacteria remained for some time at the
forearm site but were quickly replaced by “native” bacteria
on the forehead. This and temporal monitoring of bacterial
communities indicate that the moist and sebaceous habitats
have more stable communities than the dry skin areas. In
contrast, the dry skin areas, such as the forearm, heel, and buttock,
that are more environmentally exposed may be colonized
opportunistically by a broader range of bacteria. If we are born
by the normal birth process, we acquire a coating of primarily
Lactobacillus in our mother's birth canal. This is replaced by
habitat-characteristic communities in the first years of our life.

While it is not yet clear how many of our microbes are
commensal, symbiotic, or pathogenic, it is becoming clear
that they exert a significant influence on our health and
well-being. In particular, the proper development of our immune
system, both when it is being established during infancy
and later when protecting our internal mucosal system, is
influenced by the composition of our microbiome. Finally,
experiments manipulating the gut microbiomes of mice
suggest that intestinal microbiota can influence brain chemistry
and behavior. Thus, next time you look in the mirror, ponder
the ecosystem you are cultivating and how its denizens are
contributing to your life.


Computational Approaches to Structural Annotation
The genomes of multicellular eukaryotes often contain
tens of thousands of genes, for many of which little or no
experimental data have been collected. In the absence of
experimental data concerning the existence or function
of a gene, computational approaches are used to identify
possible genes within genome sequences. The use of
computational approaches to decipher DNA-sequence
information is termed bioinformatics.

Bioinformatic annotation algorithms predict gene
structure by identifying open reading frames (ORFs),
sequences that could potentially encode polypeptides.
Most of these algorithms initially search for ORFs larger
than a minimum size, such as 50 amino acids, since 
ORFs of at least that size are less likely to occur at 
random. Data derived from known cDNA sequences of 
the organism under analysis can be used to fine-tune the 
algorithms employed for gene annotation. Even so, pre- 
dictions are not infallible, especially in large eukaryotic 
genomes, where exons are often small relative to introns 
and are dispersed over large distances. Bioinformatic 
algorithms are generally less successful than experi- 
mental data in correctly predicting exons, but they can 
provide enough information to assist in the design of 
experimental approaches for clarifying gene structures. 
Most computational methods begin with a search for 
ORFs, which are useful for predicting protein-coding 
genes but do not help recognize genes that code for RNA 
molecules. Thus, experimental or comparative genomic 
(see Section 18.3) approaches are usually required for
annotating genes whose products are noncoding RNA. The
process by which genes are predicted is explored further 
in Research Technique 18.1. 
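The reasoning behind a minimum ORF size can be made concrete with a rough calculation. The sketch assumes random sequence in which all four bases are equally frequent and independent, so that 3 of the 64 possible codons are stop codons. Real genomes depart from this assumption, so the numbers are only indicative.

```python
def prob_no_stop(n_codons, stop_fraction=3 / 64):
    """Chance that n_codons consecutive random codons contain no stop codon."""
    return (1 - stop_fraction) ** n_codons


for n in (10, 50, 100, 300):
    print(n, f"{prob_no_stop(n):.4f}")
```

Under this model a stretch of 50 codons without a stop arises by chance roughly 9% of the time, and a stretch of 100 codons less than 1% of the time, which is why longer ORFs are more convincing gene candidates.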

Another bioinformatic method of gene annotation is 
to compare genome sequences of related species. As we 
discuss in a later section, this and other forms of compara- 
tive genomic analysis are becoming increasingly powerful 
as the genome sequences of more species become avail- 
able. After genes are predicted computationally, either 
from algorithms or phylogenetic comparisons, they must 
then be confirmed experimentally. 


Functional Gene Annotation In addition to pinpointing 
genes and their structural components, gene annotation 
also aims to describe biochemical and biological 
function. Let us consider the lacI gene, which encodes
the lac repressor protein of E. coli. The biochemical 
function of the encoded protein is to bind to DNA 
and allolactose, and its cellular function is to regulate 
transcription of the lac operon (see Chapter 14). The 
biological function of the lacI gene is regulation of 
gene expression in response to sugar availability in the 
environment. In this case, the annotation we make can 
be quite detailed, since we know a great deal about the 
lacI gene.

Genes that are similar to each other in sequence 
are assumed to encode gene products with similar 
biochemical functions. Genes similar in sequence to the 
lacI gene, for example, are likely to encode transcription 
factors that regulate gene expression. However, the na- 
ture of the genes they regulate may not be easy to predict. 
Initial annotation of the eukaryote genomes represented 
in Figure 18.6 categorized many genes by their presumed 
biochemical or cellular function. About half of the genes 
referred to in the figure have either known biochemical
and cellular functions, learned from previous experimental
evidence, or a presumed biochemical function based
on sequence similarity to known proteins. Additional
information for gene annotation may be derived from
functional genomics experiments, such as those de- 
scribed in Section 18.3. While biochemical and cellular 
functions can sometimes be predicted, ascertainment 
of the biological functions of genes requires analyses of 
mutant phenotypes (see Chapter 16 for descriptions of 
approaches to mutant analysis). 


Related genes and protein motifs Examination and 
comparison of whole-genome sequences have allowed 
researchers to recognize gene families, groups of genes 
that are evolutionarily related. Some gene families are 
prominent in certain species, while others may be entirely 
absent. The 23,000 genes of the human genome can 
be placed in about 10,000 gene families. While most 
mammals largely share this set of gene families, only 
3000 to 4000 of these gene families are found throughout 
eukaryotes. Other lineages, such as fungi and plants, have 
their own sets of lineage-specific gene families. 

Expansion and retention of particular gene families 
depends on the importance of their biological functions 
to the organism. For example, in mammals, the gene 
family encoding olfactory receptors is often the largest 
in the genome, frequently consisting of more than 1000 
members. However, the olfactory receptor gene family is 
much larger in organisms that rely heavily on this sense 
(a mouse has more than 900 of these genes) than in spe- 
cies in which the sense of smell is diminished (humans 
have only 339). In humans, the largest gene family 
encodes proteins functioning in the immune system, but 
this family of genes is absent in both Arabidopsis and 
Saccharomyces, where the largest gene families encode 
protein kinases. 

Evolutionary relationships between genes may also 
be recognized through conserved protein domains rather 
than entire genes. Many eukaryotic proteins are modular, 
consisting of distinct protein domains joined together 
(Figure 18.7). Because many protein domains correlate 
with exon structure in genes—that is, one or more 
exons specifically encode a particular protein domain—a 
hypothesis has been advanced that composite genes (genes 
that encode multiple conserved protein domains) are gen- 
erated by exon shuffling (see Section 18.2), through dupli- 
cations, translocations, and inversions of DNA sequences. 
The modular structure of proteins means that the number 
of genes is much larger than the number of unique func- 
tional protein domains. Exon shuffling creates new genes 
with novel arrangements of protein domains that can be 
appropriated to fulfill new biological roles. The available 
data indicate that the protein repertoires of multicellular 
eukaryotes are generally more complex, averaging more 
different domains per protein, than those of single-celled 
eukaryotes. Knowledge of conserved protein domains 
often provides insight into potential biochemical activities 
of proteins, but, again, understanding the biological
function requires mutant analysis.


Figure 18.6 Genome annotation, by predicted biological
function, of (a) Arabidopsis thaliana and (b) Drosophila. Genes
are categorized with presumed functions
based on similarity to known genes.
When the Arabidopsis and Drosophila
genomes were first annotated in 2000,
many genes (blue) had no similarity to
genes of known function. However, in
the past decade significant progress has
been made to functionally characterize
these genes, using either functional or
comparative approaches.


Variation in Genome Organization among
Species


Having obtained and compared the genome sequences of 
bacteria and archaea and of eukaryotes (see Table 18.1), bi- 
ologists can draw several general conclusions about genome 
organization (Figure 18.8). First, bacteria and archaea have 
fewer genes and much higher gene density than eukaryotes. 
This high gene density is attributable to the lack of introns,
the more compact size of regulatory sequences, and the
generally less complex structures of most encoded proteins 
in bacteria and archaea. Second, eukaryotes differ widely in 
both gene number and gene density, and the genomes of 
single-celled eukaryotes tend to encode fewer genes than 
those of multicellular eukaryotes. Third, species that have 
evolved to be obligate parasites often experience genome 
contraction. As parasites become dependent on their hosts 
for nutrients, they lose the genes they no longer need.
This trait is reflected in the genome of Rickettsia prowazekii,
the eubacterium responsible for typhus in humans, which is
smaller than the genomes of other eubacteria (see Table 18.1).


Figure 18.7 Modularity of protein domains. (a) Proteins
are often modular, composed of discrete domains (e.g., Ep1,
Ep2, PHD, Br, BMB, Znf). Complex proteins can evolve by mixing
and matching of protein domains, usually through a process
known as exon shuffling. (b) Multicellular eukaryotes have more
complex protein architectures than single-celled eukaryotes.


Just as gene number and density vary among eukaryotes, 
so does the proportion of repetitive DNA in the genome. 
The human genome consists of more than 50% repeti- 
tive DNA: Approximately 45% consists of transposable ele- 
ments (transposons, retrotransposons, and retroelements); 
a further 3% consists of microsatellite sequence; and about 
5% contains recent gene duplications. Additional repetitive 
DNA is present in the centromeric and telomeric sequences. 
The repetitive DNA that is not centromeric or telomeric is 
often called dispersed repetitive DNA because it is distrib- 
uted throughout the genome. The proportion of repetitive 
DNA in a genome is a significant factor influencing gene 
density. Some features of genome organization can be seen 
in human chromosome 21, shown in Figure 18.9. 

The annotated genome sequences of model genetic 
organisms can be found at the websites provided on the 
back endsheets of this book. The host site for the human 
genome (http://genome.ucsc.edu/) also acts as a portal to 
the annotated genomes of several additional species. 


Three Insights from Genome Sequences 


Analyses of genome sequences from a range of bacteria, 
archaea, and eukaryotes have produced many insights 
into the nature of genomes, of which three are particularly 
important. First, genomic comparisons demonstrate that 
the genomes of all organisms are highly dynamic in nature. 
Transposable elements (see Chapter 13) are just one of the 
factors driving genome evolution; large- and small-scale 
chromosomal duplications as well as deletions and other 
rearrangements also contribute. Substantial genetic varia- 
tion is seen even within species, thus providing raw mate- 
rial for natural selection and the evolution of new species. 
Second, genome sequencing of model organisms 
reveals the limitations of forward genetic screens. Even in 
intensely studied species, such as E. coli and S. cerevisiae,
forward genetic screens (see Chapter 16) identified only a
fraction (a third to half as many) of the genes identified by
genome sequencing. What are the functions of all these
previously unknown genes?


Figure 18.9 Genome annotation of human chromosome 21.

The third insight obtained from the analysis of genomes 
is the discovery that the number of genes in the human ge- 
nome is comparable to that of various other multicellular 
eukaryotes. Over the past 25 to 30 years, the estimates of 
gene number in the human genome have steadily decreased. 
Having once estimated our genome to contain as many 
as 80,000 to 120,000 genes, we may find it humbling to 
discover that we and other animals have fewer genes than 
many plants. The estimated number of 20,000 to 25,000 
genes in the human genome is typical for vertebrates, and 
it is not much higher than the 14,000 or so estimated 
for Drosophila. If some of us have “gene number anxiety,”
it should be assuaged by recognizing that gene number
does not translate directly into protein number or organ- 
ism complexity. Both exon shuffling and alternative splicing 
increase the complexity of proteins in eukaryotes, and these 
processes are much more prevalent in animals than in ei- 
ther fungi or plants. In the remaining pages of this chapter, 
we address these major insights in more detail. 


18.3 Evolutionary Genomics Traces 
the History of Genomes 

Evolutionary genomics, sometimes called phylogenomics or
comparative genomics, is the comparative study of genomes.
Interspecific comparisons of genomes—comparisons 


Research Technique 18.1 


Bioinformatics 


PURPOSE What do computer algorithms “look for” in a DNA
sequence during annotation of a bacterial, archaeal, or eukary- 
otic genome? Often, the first step in annotation is the identifi- 
cation of open reading frames (ORFs). In bacteria and archaea, 
all ORFs that are translated into protein will have a start codon 
(ATG) and a stop codon (TAA, TAG, or TGA) with an uninter- 
rupted open reading frame lying between. In eukaryotes, 
however, where genes may be separated into multiple exons, 
only the amino-terminal exon has a start codon, and only the 
last-coding exon has a stop codon, but all internal exons have 
the sequences that ensure proper splicing, as do the 3’ end of 
the first exon and the 5’ end of the last exon. 


PROCEDURE Let's practice examining a nucleotide se- 
quence to see if we can identify sequences that might encode 


biological information. However, as this example illustrates, 
the identification of ORFs quickly becomes a computational 
problem more suited to computers than to pencil and paper. 
To simplify the analysis, we'll assume we are looking at DNA 
sequence from a bacterium so that we need not consider the 
requirements of exon-intron cutting and splicing. 

Since proteins can be encoded in either strand of the 
double-stranded DNA molecule, six reading frames must 
always be considered in searches for potential ORFs: three 
reading frames in the forward direction and three read- 
ing frames in the complementary strand in the reverse 
direction. Consider the first 21 nucleotides of the sequence 
below. 


5' TTGCAGTATGGGCTAGACCAAAGAGAGAGTTGATAACTAGCCGAAACGAACCATGTTCGTCAATCAGCACCTTTGTGGTT
CTCACCTCGTTGAAGCTTTGTACCTTGTTTGCGGTGAACGTGGTTTCTTCTACACTCCTAAGACTTAAGCTAGCTAAGTA 
TAGATGGCGAGGTGACACACACACACACAGGTAGATATTAA 3' 


(1) Identify the three reading frames (rf) in the forward direction and in the complementary strand. 


The three reading frames in the forward direction

rf1 5' TTG CAG TAT GGG CTA GAC CAA 3'
rf2 5' T TGC AGT ATG GGC TAG ACC AA 3'
rf3 5' TT GCA GTA TGG GCT AGA CCA A 3'

The three reading frames in the complementary strand

rf4 3' AAC GTC ATA CCC GAT CTG GTT 5'
rf5 3' A ACG TCA TAC CCG ATC TGG TT 5'
rf6 3' AA CGT CAT ACC CGA TCT GGT T 5'


(2) Highlight all potential start codons (ATG); note that these can occur in any of the six reading frames.
There are four potential start codons, highlighted under step (3) below: rf2-1 (reading frame 2, first potential start codon), 
rf2-2, rf2-3, and rf4-1. 
(3) Highlight any stop codons that are in the same reading frame as the four identified start codons. 
Since all potential start codons were in either reading frame 2 or 4, we need only look for potential stop codons in these reading frames. 
Six potential stop codons can be found in reading frame 2, and seven in reading frame 4. 
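The forward direction (start codons and in-frame stop codons of reading frame 2 are shown in brackets)
Bracketed codons, in order: rf2-1 start, rf2 stop, rf2 stop, rf2-2 start, rf2 stop, rf2 stop, rf2 stop, rf2-3 start, rf2 stop.
5' TTGCAGT[ATG]GGC[TAG]ACCAAAGAGAGAGTTGATAAC[TAG]CCGAAACGAACC[ATG]TTCGTCAATCAGCACCTTTGTGGTT
CTCACCTCGTTGAAGCTTTGTACCTTGTTTGCGGTGAACGTGGTTTCTTCTACACTCCTAAGACT[TAA]GCTAGC[TAA]GTA
[TAG][ATG]GCGAGG[TGA]CACACACACACACAGGTAGATATTAA 3'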


The reverse complementary sequence 


(4) Identify open reading frames and corresponding amino acid sequences. 


We find that the rf2-1, rf2-3, and rf4-1 potential start codons are followed almost 
immediately by in-frame stop codons, preventing the open reading frames from 
encoding more than 2, 3, or 5 amino acids. In contrast, the open reading frame 
commencing from rf2-2 is much longer. 


The rf2-2 start codon is followed by an open reading frame of 93 nucleotides that could encode a protein of 31 amino acids: 
ATG TTC GTC AAT CAG CAC CTT TGT GGT TCT CAC
Met Phe Val Asn Gln His Leu Cys Gly Ser His
CTC GTT GAA GCT TTG TAC CTT GTT TGC GGT GAA
Leu Val Glu Ala Leu Tyr Leu Val Cys Gly Glu
CGT GGT TTC TTC TAC ACT CCT AAG ACT TAA
Arg Gly Phe Phe Tyr Thr Pro Lys Thr stop


For more practice with bioinformatics concepts, see Problems 4, 5, and 6. 
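
The pencil-and-paper search in this technique can be sketched in a few lines of Python. This is a minimal illustration, not the method used by any particular annotation program: the function names are hypothetical, the scan assumes a bacterial sequence (no introns), and an ORF is taken to be an ATG followed by the first in-frame stop codon. The codon table is built from the standard genetic code.

```python
# Illustrative six-frame ORF scan; all names here are hypothetical.
SEQ = (
    "TTGCAGTATGGGCTAGACCAAAGAGAGAGTTGATAACTAGCCGAAACGAACCATGTTCGTCAATCAGCACCTTTGTGGTT"
    "CTCACCTCGTTGAAGCTTTGTACCTTGTTTGCGGTGAACGTGGTTTCTTCTACACTCCTAAGACTTAAGCTAGCTAAGTA"
    "TAGATGGCGAGGTGACACACACACACACAGGTAGATATTAA"
)
STOPS = {"TAA", "TAG", "TGA"}

# Standard genetic code, DNA alphabet, bases ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_complement(seq):
    """Return the complementary strand, written 5' to 3'."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_orfs(seq):
    """Return (strand, frame_offset, start, orf_dna) for each ATG...stop ORF.

    orf_dna runs from the start codon up to, but not including, the stop codon.
    """
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            start = None
            for i in range(offset, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first start codon opens the frame
                elif start is not None and codon in STOPS:
                    orfs.append((strand, offset, start, s[start:i]))
                    start = None  # in-frame stop closes it
    return orfs


def translate(dna):
    """Translate a DNA string codon by codon using the standard code."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


orfs = find_orfs(SEQ)
strand, offset, start, orf_dna = max(orfs, key=lambda orf: len(orf[3]))
longest_protein = translate(orf_dna)
print(strand, "frame", offset + 1, len(orf_dna), "nt:", longest_protein)
```

Run on the sequence above, the scan finds the same four start codons as the worked example, and the longest ORF is the 93-nucleotide frame beginning at rf2-2. The reverse-strand frames are numbered here by offset from the 5' end of the complementary strand, which need not match the rf4 to rf6 labels in the figure.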
between species—identify sequences conserved over evolutionary
time and thus facilitate the annotation of genomes
and provide insight into the evolution of genes and or- 
ganismal diversity. In contrast, intraspecific comparisons 
identify sequence polymorphisms that are responsible for 
the genetic differences within populations of a single species. 
These differences are the raw material of evolution and form 
the basis of population genetics and the evolution of species. 

The evolutionary history of each organism can 
be traced in its genome and in the composition of its 
chromosomes. Evolutionary genomics has revealed the 
striking fact that a large number of genes are shared by 
phylogenetically distant species, reaffirming that all life 
on Earth is related. Species that are more closely related 
to one another share a larger number of genes than spe- 
cies that are more distantly related. In closely related 
species, the similarities in sequence go beyond shared 
genes to conserved chromosomal segments. Evolutionary 
genomics has also brought to light important information 
concerning the highly dynamic nature of the genome. 
Changes, in the form of mutations, can be observed even 
in the time scale of a single generation. 


The Tree of Life 


The large amount of DNA sequence information now 
available has revolutionized how biologists perceive the 
tree of life, the phylogenetic tree depicting the evolution- 
ary relationships between organisms. Morphological and 
physiological traits were once the primary basis of species 
classification, but DNA sequence comparisons have pro- 
vided new clarity concerning questions that the older 
methods of study were unable to resolve. 

Comparisons of DNA sequences of the same gene from 
different species are particularly useful for assessing phyloge- 
netic relationships. Due to their ubiquity and high degree of 
conservation, genes encoding the ribosomal RNAs provide 
a universal sequence for such comparisons. By comparing 
ribosomal RNA sequences, Carl Woese and colleagues 
revealed through pioneering studies in the late 1970s that all 
forms of life on Earth fall into one or another of three distinct 
domains: Bacteria, Archaea, and Eukarya. Since then, rela- 
tionships within many eukaryotic groups have been clarified 
using DNA sequence comparisons, allowing the basic 
architecture of the tree of life to be determined (Figure 18.10). 
Some surprising relationships have emerged. For example, 
the fungi and metazoans, which had traditionally been con- 
sidered two separate “kingdoms” of life, were discovered 
to be relatively closely related and are now grouped with 
Amoebozoa in a clade called the Unikonts. Since animals and 
plants are the most conspicuous life-forms from a human 
perspective, the tree presented in Figure 18.10 is biased to- 
ward a focus on the interrelationships of those two groups. If 
all its branches were to be presented in equal detail, the “tree” 
would more closely resemble a very dense bush. 

The tree of life in Figure 18.10 was constructed 
using DNA sequence information (see Chapter 1) and
comparison of the alignment of homologous nucleotides
to ascertain phylogenetic relationships. Homologous 
nucleotides are those that are descended from the same 
nucleotide in the common ancestor of the two species 
being compared. Highly conserved protein-coding DNA 
sequences, some of which have been conserved over time 
scales of more than a billion years, are analyzed to identify 
ancient evolutionary branch points, or nodes. Conversely, 
rapidly evolving sequences are compared to clarify 
recent nodes in species evolution. Intron and intergenic 
sequences, on which there may be little selective pressure 
to maintain a specific sequence, can accumulate muta- 
tions and change rapidly over time. A strategy developed 
to search for homologous sequences, using a computer 
program called BLAST, for Basic Local Alignment 
Search Tool, is described in Research Technique 18.2. 


Interspecific Genome Comparisons: 
Gene Content 


Genome sequencing indicates that certain genes are found 
in all organisms, whether bacteria, archaea, or eukaryotes, 
and suggests that these genes must have arisen early in the 
evolution of life on Earth. Such highly conserved genes— 
for example, the genes encoding proteins needed for DNA 
synthesis—are involved in biological processes common 
to all species. Other genes have a more recent origin and 
define specific clades of species. For instance, genes en- 
coding tubulin are found in all eukaryotes, implying that 
the tubulin gene evolved before the diversification of the 
eukaryotes. Still other genes are shared among more re- 
stricted clades of organisms, and some genes are confined 
to only closely related species. In this way, the phyloge- 
netic distribution of gene families provides information on 
when specific genes evolved. Furthermore, the set of genes 
shared among any group of organisms can be considered to 
represent the minimum genomic content of the common 
ancestor of that group of organisms, thus providing infor- 
mation on the evolution of both genomes and organisms. 
Because the first genomes to be sequenced were 
from phylogenetically diverse organisms, many genes 
appeared to be specific to particular taxa. However, as 
more genome sequences were determined, genes initially 
thought to be unique were found to have counterparts 
in the genomes of related species. Indeed, two closely 
related species may share almost their entire genome 
content, with the genomic differences between sister taxa 
defining the differences between the two species. For 
example, genome content is very similar in four closely 
related Saccharomyces species (S. cerevisiae, S. paradoxus, 
S. mikatae, and S. bayanus), all separated by 5 to 20 million 
years (Figure 18.11). Throughout the genomes of the four 
Saccharomyces species, just a handful of species-specific 
genes were detected, with an average of one unique gene 
for every 0.5 million years of evolutionary distance. It is not 
yet clear whether this rate is typical for other organisms. 
But it does bring up the question: How do new genes form? 
Figure 18.10 The tree of life, highlighting the phylogenetic relationships of model organisms
discussed in this book.


The Births and Deaths of Genes In tracing the evolutionary
history of genes by comparing genome sequences,
geneticists obtain clues to the mechanisms through which
new genes arise (Figure 18.12). These mechanisms include
the following.

1. Gene duplication by duplication of genomic
DNA. Duplication of genetic material can duplicate
a portion of a gene, a single gene, a chromosome or
chromosome segment, or the entire genome (see
Chapter 13).
Figure 18.11 Comparison of four Saccharomyces genomes
(S. cerevisiae, S. paradoxus, S. mikatae, and S. bayanus).
Predicted open reading frames (ORFs) are depicted as arrows
pointing in the direction of transcription. Orthologous ORFs
(see p. 628) are connected by dotted lines. ORFs with a
one-to-one correspondence are shown in blue; ORFs with a
one-to-two correspondence are in red (S. paradoxus has two
genes in place of gene 7 of S. cerevisiae); ORFs that are
unmatched (gene 24 in S. cerevisiae) are in white. Sequence
gaps are indicated by vertical black lines.


2. Gene duplication by unequal crossover. In a special
case of gene duplication, one or more genes can be
duplicated by unequal crossover due to misalignment
of homologous chromosomes at synapsis during
prophase I of meiosis. Gene duplication by unequal
crossover is indicated by the detection of tandem
repeats, or back-to-back copies, of genetic material
(see Chapter 13 and Chapter 3 Case Study).


3. Exon shuffling. During an exon-shuffling event, 
exons from two or more genes are combined in 
a new genomic context (see Figure 18.7a). The 
rearranging could occur through illegitimate 
Research Technique 18.2


Basic Local Alignment Search Tool


PURPOSE Homologous genes are derived from a common 
ancestral gene and often have similar functions. A computer 
program called the Basic Local Alignment Search Tool (BLAST) 
was developed in 1990 by Stephen Altschul, David Lipman, 
and colleagues to search for homologous sequences. BLAST, 
perhaps the most widely used and most important tool em- 
ployed in bioinformatic endeavors, allows scientists to search 
databases for sequences similar to any input sequence. 

The BLAST program of the National Center for Biotechnology
Information at the National Institutes of Health
(http://blast.ncbi.nlm.nih.gov/Blast.cgi) enables searches of either DNA
sequence similarity or protein similarity. Various types of searches
can be performed. Here are three of the most common.


• nucleotide blast (blastn): a nucleotide query sequence
is compared to nucleotide sequences in the database.

• tblastn: a protein query sequence is compared with the
nucleotide databases, hypothetically translated into all six
potential reading frames.

• tblastx: a nucleotide query sequence is translated into
all six possible reading frames and compared against the
nucleotide sequences in the database that have also been
translated into all six possible reading frames.


PROCEDURE One of the first experiments researchers per- 
form once they have determined the sequence of a gene is 
to “BLAST” their sequence against the GenBank database, 
where most DNA sequences determined anywhere in the 
world are deposited. To perform a search, the user enters 
an “input” nucleotide or protein sequence into a window, 
and the BLAST program then searches chosen databases for
similar sequences. Sequences are given a score based on the
extent of similarity and relative to the probability that the 
sequences could be similar by chance. 


CONCLUSION What information can be derived from this 
experiment? First, the results of the BLAST search can provide 
clues to the biological and biochemical function of the gene 
used as a query. Since homologous genes are descended 
from a common ancestor, they likely share biochemical activ- 
ity if not biological context. Second, knowledge of the phylo- 
genetic distribution of homologous genes allows inferences 
to be made about when the gene evolved. For example, if the 
query is a human gene and if homologous genes are detected 
in all eukaryotes, the protein is likely to perform a function 
conserved in all eukaryotes. Conversely, if only mammals have 
homologous genes, the gene is likely to perform a function 
specific to mammals. 

Since related species often have conserved amino acid 
sequences but, due to the redundancy of the genetic code, pos- 
sess different nucleotide sequences, a tblastn (or tblastx) search 
is often more sensitive than a blastn in identifying homologous 
sequences from distantly related species. When a researcher has 
no prior knowledge of the DNA sequence being used as a query, 
tblastx searches are particularly useful because they identify 
DNA sequences with the potential to encode similar proteins. 

What if a BLAST search fails to find any other sequences in 
the database similar to the query sequence? If the sequence is 
known to encode a protein, the result suggests that the gene 
for the protein is unlikely to be conserved in a broad phyloge- 
netic sense. Alternatively, if the sequence is noncoding DNA, 
a lack of similarity to other DNA sequences is not unexpected. 


For more practice doing a BLAST search, see Problems 14 and 15. 
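
The reason a translated search can be more sensitive than blastn is easy to see with a toy comparison. The sketch below is not the BLAST algorithm and does not call any BLAST software; the two DNA sequences are invented for illustration and differ only in their choice of synonymous codons. It simply counts matching positions at the nucleotide level and at the amino acid level.

```python
# Toy illustration of why protein-level comparison tolerates synonymous changes.
# Both DNA sequences are made up for this example.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate a DNA string in frame 1 using the standard genetic code."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def identity(a, b):
    """Fraction of matching positions between two equal-length sequences."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


seq_a = "ATGCTTTCTCGTGGA"  # Met Leu Ser Arg Gly
seq_b = "ATGTTGAGCAGAGGG"  # same peptide, different codons

dna_identity = identity(seq_a, seq_b)
protein_identity = identity(translate(seq_a), translate(seq_b))
print(f"nucleotide identity: {dna_identity:.2f}")
print(f"amino acid identity: {protein_identity:.2f}")
```

Here fewer than half of the nucleotides match although the encoded peptides are identical, which is the situation a tblastn or tblastx search is designed to detect.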
... transcriptase, and their insertion into the genome, often
leads to the formation of pseudogenes, sequences
recognizable as mutated gene sequences, but can also
produce new genes. More than 10,000 pseudogenes
have been recognized in the human genome, and many
were derived from reverse transcription. In addition,
the insertion of a retrotransposon into a new genomic
location can alter the expression pattern of adjacent
genes, potentially leading to new gene functions.




5. Derivation of exons from transposons. Transposons
have sequences encoding a DNA-binding protein
called transposase that is necessary for movement of
the transposon. Transposase sequences can be made
to perform a new function if fused with other exons
derived from the genome. For example, the RAG1 and
RAG2 genes of jawed vertebrates, whose protein products
are involved in rearrangement of DNA sequences
during the maturation of the immune system, were
derived from sequences encoding a transposase.
6. Lateral (horizontal) gene transfer. The movement
of genes from one species into the genome of another
species is referred to as lateral gene transfer. Such
events are common in prokaryotes, which may
exchange genes with even distantly related organisms
(see Chapter 6). Endosymbioses lead to large-scale
lateral gene transfer events, as in the case of the
mitochondrion and chloroplast. While less common
between eukaryotes, lateral gene transfer has been
documented in some protists and plants.

7. Gene fusion and gene fission. Two genes can fuse
into a single gene by deletion of the stop codon and
transcription-termination signals that normally separate
genes. Alternatively, a single gene may be split into
two genes, each with its own regulatory sequences.


8. De novo derivation. Exons can be derived de novo
... are incorporated into exons of adjacent genes.

Comparisons between the genomes of several related
... origins of new genes in a multicellular eukaryote. The major
source of new genes, slightly less than 80% of the time,
was gene duplication, in which the duplicates were either
tandemly arranged or dispersed at distant chromosomal
locations. A further 10% of new genes were derived from
retrotransposition events, and, surprisingly, approximately
12% arose de novo, from previously noncoding sequences.
Two mechanisms, gene duplication in eukaryotes
and lateral gene transfer in prokaryotes, stand out as
the major mechanisms responsible for generation
of genes. Let's consider each of these mechanisms in
greater detail.


Figure 18.12 The birth of genes. 
Figure 18.13 The fates of
duplicate genes.
Gene Duplication The high rate of gene duplication
is one surprising discovery arising from evolutionary 
genomics. Most genomes contain a mosaic of gene 
families derived from both ancient and more recent 
duplication events, indicating that genomes are dynamic 
and continuously changing over time. A study in 2000 by
Michael Lynch and John Conery counted the duplicated
genes in nine eukaryotic species and estimated the
duplication rate: approximately 0.01 duplications per gene
per million years. Thus, for an average eukaryotic genome with
10,000 to 30,000 genes, this research suggests that one
gene duplicates and is maintained in the genome every
3000 to 10,000 years, a rate of gene formation higher than
has been observed in the Saccharomyces species.
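
The arithmetic behind this estimate can be checked directly. The short sketch below assumes the rate applies per gene and uniformly across the genome; the function name is hypothetical.

```python
# Convert a per-gene duplication rate into a genome-wide waiting time.
RATE_PER_GENE_PER_MY = 0.01  # duplications per gene per million years (from the text)


def years_per_duplication(n_genes, rate=RATE_PER_GENE_PER_MY):
    """Average number of years between retained duplications in a genome."""
    duplications_per_million_years = rate * n_genes
    return 1_000_000 / duplications_per_million_years


for n_genes in (10_000, 30_000):
    print(n_genes, "genes:", round(years_per_duplication(n_genes)), "years")
```

A genome of 10,000 genes gives one duplication about every 10,000 years, and a genome of 30,000 genes gives one about every 3,300 years, which the text rounds to 3000. For comparison, the Saccharomyces figure quoted earlier is one species-specific gene per 500,000 years.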

The fate of duplicated genes depends on the mo- 
lecular basis of the duplication. If the entire gene includ- 
ing regulatory sequences is duplicated, both copies will 
be able to produce a functional protein product in the 
correct amount, time, and place. In this case, the dupli- 
cate genes are genetically redundant and are free to evolve 
new functions, as long as the composite functions of the 
two duplicate genes retain the function of the original 
gene. Fully redundant genes are not maintained over 
long time periods, usually because the duplicate genes 
undergo one of three likely fates (Figure 18.13). First, 
the vast majority of new genes degenerate into pseudo- 
genes due to a lack of positive selection, without which 
mutations will slowly accumulate and render the genes 
nonfunctional. Pseudogenes form a significant fraction of 
the genomes of some organisms. 

Second, mutations in each of the two copies can result 
in the two genes having complementary activities such that 
their combined activity is the same as the activity of the gene 
before duplication, a process called subfunctionalization. 
Third, in a process called neofunctionalization, a mutation
in one of the duplicates could provide a function not
performed by the original gene. In rare cases where the
new function provides a selective advantage, the gene can 
be maintained and become fixed in the population. In the 
latter two cases, both copies remain functional, whereas in 
the first case, only a single copy retains activity. 

Repeated duplication events produce families of re- 
lated genes. Through gene duplications, gene losses, and 
speciation events, the relationships among these genes 
often become complex. Three terms describe different 
relationships of evolutionarily related genes. The broadest 
term is homology, which is defined as descent from a com- 
mon ancestor. Thus, homologous genes, or homologs, 
have descended from a common ancestral gene and are 
said to constitute a gene family (Figure 18.14). Two other 
terms define specific relationships between homologous 
genes. Paralogous genes, or paralogs, are genes whose 
origin lies in a gene duplication event. No indication of 
the age of the duplication event leading to the paralogs is 
implied. Generally, paralogs perform biologically distinct 
but biochemically related functions. Orthologous genes, 
or orthologs, are genes whose origin lies in a speciation 
event. They are genes in different species that are derived 
from a single ancestral gene in two species’ last common 
ancestor. Orthologs most often, but not always, have 
equivalent functions in the two organisms being com- 
pared. The globin genes in Figure 18.14 illustrate these 
evolutionary relationships. See Genetic Analysis 18.1 for 
practice in determining orthologous and paralogous rela- 
tionships of evolutionarily related genes. 

Gene duplication has been a key mechanism in 
generating new genes that over time have made possible 
the evolution of complex organisms. During globin gene 
evolution, gene duplication has permitted specialization, 
which in turn has allowed greater physiological com- 
plexity. Both subfunctionalization and neofunction- 
alization can be seen within the globin gene family. 
The β-globin gene cluster in our genome and that of the
chimpanzee genome have the same gene complement.
The human β-globin gene and the chimpanzee β-globin
gene are related by a speciation event, and the two
genes are orthologs.

Since genes within each cluster were derived from gene
duplication events within a genome, members within a
cluster are paralogous genes (i.e., the human δ-globin and
β-globin genes are paralogs). The human δ-globin and
the chimpanzee β-globin genes are also paralogs as they ...

The term homology may apply to the
relationship between genes derived via
a speciation event (orthologs) or to the
relationship between genes derived via
a gene duplication event (paralogs).

Since all globin genes are derived from a
single ancestral globin gene, all myoglobin,
α-, and β-globin genes are homologs.


Figure 18.14 Orthology and paralogy, speciation events and gene duplications: Examples from
the globin gene family.


Neofunctionalization can be seen in the gene duplica- 
tion event that produced the hemoglobin and myoglobin 
genes, where hemoglobin functions to carry oxygen in 
the blood and myoglobin functions to bind oxygen in 
muscles. Subfunctionalization has also occurred in the 
globin genes, if an assumption is made that the ancestral
β-globin was active throughout the life cycle of the organism.
If so, subfunctionalization is now evident between
the ε-globin and β-globin paralogs, where the ε-globin
is active in the embryo and the β-globin is active in the
adult. Other examples of gene duplication are seen in the
duplications of an ancestral gene leading to the family 
of genes that allow trichromatic vision in some primate 
species, including humans (see Chapter 3), and in the cre- 
ation of another gene family that specifies identity along 
the anterior-posterior axis of animals (see Chapter 20).


Lateral Gene Transfer Lateral gene transfer, also known as 
horizontal gene transfer, is the transfer of genetic material 
between two species (see Section 6.7 for a description of 
lateral gene transfer). Lateral gene transfer may have been 
extensive early in the evolution of life, but as specialized 
genetic mechanisms evolved for control of gene expression,
lateral gene transfer became less frequent within the
eukaryotic lineage. 

A common lateral gene transfer event occurs through 
the sharing of plasmids among bacterial species (see 
Chapter 6), but other lateral gene transfer events between 
bacterial species and between bacterial and archaeal spe- 
cies also have been documented. Based on comparison of 
the sequenced bacterial and archaeal genomes, an esti- 
mated 1.5% to 14.5% of genes in any genome are the result 
of lateral gene transfer. This is likely to be an underesti- 
mate, since ancient transfer events may not be detectable. 
In an extreme example of lateral gene transfer, hyperther- 
mophilic bacterial species (bacteria able to live in extremely 
hot environments) have acquired genes from hyperther- 
mophilic archaeal species. Nearly a quarter of the genes in 
the bacterium Thermotoga maritima are most similar to 
archaeal genes, indicating an archaeal origin. One acquired 
archaeal gene encodes a reverse gyrase, a topoisomerase 
that induces positive supercoils in DNA and is required for 
adaptation to living at high temperatures. 

While genes encoding proteins with metabolic func- 
tions appear to have been donated in lateral gene trans- 
fer events, those that encode proteins for information 


GENETIC ANALYSIS 18.1


PROBLEM Consider the phylogenetic tree of seven homologous
eukaryotic genes derived from three species. What is the
relationship between the human genes and the Drosophila
gene: are they paralogs or orthologs? What are the
relationships between the human and mouse genes: are they
paralogs or orthologs?

[Phylogenetic tree. Branch tips, top to bottom: Indian hedgehog (mouse); Indian hedgehog (human);
Desert hedgehog (mouse); Desert hedgehog (human); Sonic hedgehog (mouse); Sonic hedgehog (human);
Hedgehog (Drosophila)]

BREAK IT DOWN: Recall that homologous
genes are genes that have descended from a
common ancestral gene (p. 628).

BREAK IT DOWN: Recall that orthologs are
homologous genes produced by a speciation event and
paralogs are homologous genes produced by a gene
duplication event within a species.


Solution Strategies and Solution Steps


Evaluate

1. Strategy: Identify the topic this problem addresses and the nature of the required answer.
   Step: This problem is about determining orthology and paralogy of homologous genes.

2. Strategy: Identify the critical information given in the problem.
   Step: The phylogenetic tree provides information about how the genes are related to one another.

Deduce

3. Strategy: Consider the topology of the phylogenetic tree. First examine the relationship
   between the Drosophila gene and the mammalian genes. ... their common ancestor?
   Step: The node at the base of the tree represents the ancestral gene. Since all of
   the mammalian genes are more closely related to one another than they are
   to the Drosophila gene, the ancestral organism had only a single gene.

4. Strategy: Examine the earliest node in the phylogenetic tree to see if it corresponds
   to a speciation event or a gene duplication event.
   Step: At the earliest node in the tree (node 1), the divergence produced the
   Drosophila gene and a lineage of mammalian genes. Thus, this node is a
   speciation event, with the common ancestor of Drosophila and mammals
   speciating to produce a lineage leading to Drosophila and another leading
   to mammals.

5. Strategy: Determine for each node in the tree whether it represents a speciation or gene
   duplication event.
   Step: Following the lineage leading to the mammalian genes (node 2), the
   divergence produces two lineages each containing both mouse and human
   genes. Thus, the divergence must have been a gene duplication and not
   a speciation. The divergence at node 3 is similar to that of node 2 and so
   must also be a gene duplication. In contrast, nodes 4, 5, and 6 all diverge to
   produce a mouse gene and a human gene and thus represent the speciation
   event leading to mice and humans.

TIP: Orthologs are produced by a speciation event and paralogs are produced by a gene duplication event.

Solve

6. Strategy: What is the relationship between the Drosophila gene and the mammalian genes?
   Step: Since we concluded that the divergence at node 1 was a speciation event,
   the Drosophila gene is orthologous to all of the mammalian genes and
   vice versa.

7. Strategy: What are the relationships between the human and mouse genes?
   Step: Let's consider two sets of genes. First, consider mouse sonic hedgehog and
   human sonic hedgehog; these two genes are related by a speciation event
   at node 6 and are thus orthologs. Next consider human sonic hedgehog and
   human desert hedgehog; these two genes are related by a gene duplication
   event at node 2 and are thus paralogs. Finally consider human desert hedgehog
   and mouse indian hedgehog; these two genes are related by a gene
   duplication event at node 3 and are thus paralogs.


For more practice, see Problems 16 and 23.
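
The reasoning in steps 4 through 7 can be expressed as a short procedure. The sketch below is one simple way to do it, assuming the tree is fully resolved and that no gene losses obscure the pattern: a node is called a duplication if both of its descendant lineages contain genes from the same species, and a speciation otherwise. The nested-list tree is a hypothetical encoding of the hedgehog tree in this problem; the grouping of Indian and Desert hedgehog under node 3 is inferred from step 7, and the function names are illustrative.

```python
# Leaves are (gene, species) tuples; internal nodes are two-element lists.
TREE = [
    ("Hedgehog", "Drosophila"),
    [                                                        # node 2
        [                                                    # node 3
            [("Indian", "mouse"), ("Indian", "human")],      # speciation node
            [("Desert", "mouse"), ("Desert", "human")],      # speciation node
        ],
        [("Sonic", "mouse"), ("Sonic", "human")],            # speciation node
    ],
]


def leaves(node):
    """Return all (gene, species) leaves below a node."""
    if isinstance(node, tuple):
        return [node]
    return [leaf for child in node for leaf in leaves(child)]


def relationship(node, a, b):
    """Classify two distinct leaves by the event at their last common ancestor."""
    left, right = node
    for child in (left, right):
        below = leaves(child)
        if a in below and b in below:
            return relationship(child, a, b)  # ancestor lies deeper in the tree
    # a and b separate at this node, so it is their last common ancestor.
    left_species = {species for _, species in leaves(left)}
    right_species = {species for _, species in leaves(right)}
    shared = left_species & right_species
    return "paralogs" if shared else "orthologs"


print(relationship(TREE, ("Sonic", "mouse"), ("Sonic", "human")))
print(relationship(TREE, ("Sonic", "human"), ("Desert", "human")))
print(relationship(TREE, ("Desert", "human"), ("Indian", "mouse")))
print(relationship(TREE, ("Hedgehog", "Drosophila"), ("Sonic", "human")))
```

The four calls reproduce the conclusions of steps 6 and 7: orthologs, paralogs, paralogs, and orthologs, respectively.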
processing (e.g., replication, transcription, and translation)
are not commonly transferred. One possible explanation
for this bias is that proteins with information
processing functions often act in large complexes and are 
not easily incorporated into existing complexes in other 
species. 

Although lateral gene transfer is relatively common 
among bacteria and archaea, transfer between either bac- 
teria or archaea and eukaryotes or between eukaryotes is 
rare. This is due in part to the differences in transcrip- 
tional and translational control mechanisms in eukaryotes 
as compared to bacteria and archaea. Even though the 
bacterium Agrobacterium tumefaciens transfers genes to 
plant cells (see Chapter 17), there is little evidence that 
those genes have entered the germ line of the transformed 
plants. Conversely, there is no evidence of transfer of 
genes from transgenic plants to soil bacteria. However, 
there is one prominent exception to this generalization: 
the transfer of genetic material from endosymbionts to 
their hosts. The most conspicuous examples are the large- 
scale transfers of genes from mitochondria and chlo- 
roplasts to the nucleus in eukaryotic cells (explored in 
greater detail in Chapter 19). Finally, although lateral 
gene transfer between two eukaryotes is not thought to be 
common, it has been documented—for example, between 
parasitic flowering plants and their flowering plant hosts 
as well as between fungi and aphids. 


Interspecific Genome Comparisons: 
Genome Annotation 


By comparing the genome sequences of related species, 
researchers are often able to refine their annotations of 
predicted genes whose existence has not been experi- 
mentally confirmed. If the predicted gene in fact func- 
tions as a gene, orthologous genes are likely to exist in 
related species. 


Conserved Coding Sequences Comparative genomic 
analyses can facilitate the discovery of previously 
unannotated genes. Sequences that are conserved in the 
genomes of two or more species are more likely to be 
functional (e.g., encode genes) than sequences that are 
not conserved. Due to the redundancy of the genetic code, 
amino acid sequences of proteins are often more conserved 
than the nucleotide sequences that encode them. Thus, in 
searches for conserved coding sequences, the nucleotide 
sequences of each of the genomes are first translated into 
all six potential reading frames and the hypothetical amino 
acid sequences are compared (see tblastx in Research 
Technique 18.2). Conserved sequences can then be used 
to direct experimental examination of the predicted genes, 
leading to refinement of the genome annotation. 

Gene annotation can be hampered by a lack of ho- 
mology to known genes, and genes or exons of a small size 
(e.g., encoding proteins of less than 100 amino acids) are 
particularly difficult to predict. Consider that stop codons
occur, on average, about once in 21 codons (3/64) in a 
random sequence. Thus, random ORFs of 63 amino acids 
occur frequently (approximately 5% of the time in any 
random 189-bp sequence). Furthermore, in multicellular 
eukaryotes, the coding sequences of genes are typically 
broken into small exons (often encoding fewer than 100 
amino acids) dispersed over large distances, thus making 
their unambiguous identification a challenge. Annotation 
of such genes is typically feasible only with either experi- 
mental evidence or evidence of similar sequences in other 
genomes. 
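
The arithmetic behind the stop-codon figures quoted above can be checked directly. The sketch below assumes a random sequence in which all 64 codons are equally likely and independent, and it considers a single reading frame; the name `p_open` is chosen only for this example.

```python
STOP_CODONS = 3
TOTAL_CODONS = 64

p_stop = STOP_CODONS / TOTAL_CODONS     # 3/64, about 0.047
mean_spacing = 1 / p_stop               # average number of codons per stop codon

def p_open(n_codons):
    """Probability that n consecutive random codons contain no stop codon."""
    return (1 - p_stop) ** n_codons

print(round(mean_spacing, 1))   # 21.3
print(round(p_open(63), 3))     # 0.049
```

A run of 63 codons corresponds to 189 bp, so roughly 5% of random 189-bp windows read in a given frame are expected to be free of stop codons.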

In the case of the Saccharomyces species (see 
Figure 18.11), comparisons between the four genomes 
led to prediction of more than 40 previously unannotated 
genes encoding proteins between 50 and 100 amino acids 
in length. Likewise, comparisons of the human genome 
with the genomes of other vertebrates have aided in the 
identification of exons and significantly refined the anno- 
tation of the human genome. This is one respect in which 
the genome sequencing of model genetic organisms has 
greatly increased our knowledge of our own genome. 


Conserved Noncoding Sequences Besides helping 
to identify open reading frames, genome comparisons 
have also detected the presence of conserved noncoding 
sequences (CNSs). Noncoding DNA was once called 
“junk” DNA (a term originally coined by Sydney Brenner) 
since junk, as opposed to garbage, is something we tend to 
keep even though it serves no identifiable purpose. Today, 
however, we know that at least some of this noncoding 
DNA is functional; it contains regulatory sequences and 
genes that produce functional noncoding RNAs, such 
as microRNA genes and lincRNAs (see Chapter 15 for 
discussion of these types of genes). 

There are two methods for identifying conserved 
noncoding sequences, and they approach the task from 
opposite directions. In phylogenetic footprinting, 
conserved sequences are identified by searching for 
similar sequences in species separated by large evolution- 
ary distances (Figure 18.15). Conversely, in phylogenetic 
shadowing, conserved sequences are identified by first 
eliminating sequences that are not conserved in closely 
related species. Comparative sequence analyses are now 
often the first step to predicting regulatory sequences, 
which are then tested by experiment (see Figure 16.18). 

Regulatory sequences controlling expression of genes 
in most multicellular eukaryotes consist of enhancer mod- 
ules spanning hundreds and potentially tens of thousands 
of base pairs (see Chapter 15). A large number of CNSs 
that correspond to regulatory sequences have been identi- 
fied by phylogenetic footprinting using comparisons of 
mammalian or other vertebrate genomes (Figure 18.15a). 
Comparisons between mammals and fish have shown that 
enhancer modules can be conserved over large evolution- 
ary distances (the lineages leading to fish and humans 
separated about 400 million years ago). Conserved non-
coding sequences are often clustered in the genome, and
they are often adjacent to evolutionarily conserved genes
involved in basic developmental processes. For example,
comparisons between the human, mouse, and fugu (puff-
erfish) genomes identified a CNS corresponding to an
enhancer module approximately 1 megabase distant from
the Sonic hedgehog (SHH) gene (Figure 18.15b). When this
CNS was tested for regulatory activity, it drove expression
of a reporter gene in mice in a manner reminiscent of the
endogenous SHH expression pattern in developing limb
buds. This CNS is functionally important because muta-
tions in this enhancer are associated with polydactyly in
both mice and humans.

Figure 18.15 Phylogenetic footprinting. (a) Evolution of a conserved noncoding sequence (CNS).
(b) A CNS associated with the SHH gene acts as an enhancer directing expression of the SHH gene in
the developing limb bud.

Phylogenetic shadowing identifies conserved se- 
quences via comparison of multiple closely related species. 
In this approach, sequences that are not conserved in at 
least one of the species are removed from consideration, 
whereas sequences that are conserved in all species are 
considered as potential functional sequences. Phylogenetic
shadowing of primate sequences has identified functional
sequences in the human genome by looking for sequences 
that have not changed in any of several primate species 
(Figure 18.16). 
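
The filtering logic of phylogenetic shadowing can be illustrated with a toy example. The sketch below assumes the sequences have already been aligned to equal length, treats a column as conserved only if every species carries the same base, and reports runs of conserved columns. The function name `conserved_runs`, the minimum run length of 6, and the three sequences are all invented for illustration; real analyses use statistical models of substitution rather than strict identity.

```python
def conserved_runs(aligned, min_length=6):
    """Return (start, end) spans of columns identical in every sequence.

    `aligned` is a list of equal-length, pre-aligned strings; `end` is exclusive.
    """
    length = len(aligned[0])
    identical = [len({seq[i] for seq in aligned}) == 1 and aligned[0][i] != "-"
                 for i in range(length)]
    runs, start = [], None
    for i, same in enumerate(identical + [False]):   # sentinel closes a final run
        if same and start is None:
            start = i
        elif not same and start is not None:
            if i - start >= min_length:
                runs.append((start, i))
            start = None
    return runs

primates = [
    "ACGTTGCAAGGCTTAACG",   # made-up sequences, not real primate data
    "ACTTTGCAAGGCTAAACG",
    "ACGTTGCAAGGCTTAATG",
]
print(conserved_runs(primates))   # [(3, 13)]
```

Columns 2, 13, and 16 differ in at least one sequence and are removed from consideration, which leaves the ten-column stretch from position 3 to 12 as the only candidate.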


Interspecific Genome Comparisons: 
Gene Order 


Just as the evolutionary history of organisms and genes can 
be traced by comparisons of genomes, so can the evolu- 
tionary histories of chromosomes. For example, humans 
have 2n = 46 chromosomes, but our closest relatives 
(chimpanzees, gorillas, orangutans) have an additional pair 
of chromosomes, 2n = 48 (see Figure 13.28). Comparing 
the chromosomes of humans and these other primates for 
synteny—the conserved order of consecutive genes along 
the length of a chromosome or chromosomal segment— 
shows that a pair of chromosomes in our common ances- 
tor fused to form a single chromosome, chromosome 2, 
in humans. Other minor differences among primate 
chromosomes can be accounted for by a small number of
translocation and inversion events.

Figure 18.16 Phylogenetic shadowing of primate species.

Synteny can also be observed in more distantly related 
mammals, such as between mouse and human lineages 
that diverged about 100 million years ago (Figure 18.17). 
Genome sequence information can provide detailed views 
of synteny between even more distantly related organisms. 
Even if chromosome synteny is not conserved, synteny at 
the level of only a few genes, referred to as microsynteny, 
can sometimes be detected. For example, such informa- 
tion has revealed relationships between the chromosomes 
of birds and mammals. 

Even when synteny is conserved at a chromosomal level, 
comparative studies have revealed large numbers of small 
rearrangements between closely related species. In a sense, 
this can be considered a loss of microsynteny. The large 
amount of repetitive DNA in eukaryotic genomes coupled 
with unequal crossing over due to mispairing during meiosis 
provides a mechanism by which DNA rearrangements can 
occur. The presence of numerous small deletions, duplica- 
tions, and inversions suggests that chromosome structure is 
dynamic on a local scale. An example of a loss of microsyn- 
teny can be seen in the loss of strict colinearity between the 
mouse and human chromosomes shown in Figure 18.17. As 
we discuss later in this chapter, small rearrangements are 
also found within individuals of a single species. 

Figure 18.17 Synteny between human and mouse
chromosomes.

Another striking feature of most eukaryotic genomes
examined to date is the evidence of past whole-genome
duplications as well as smaller duplications involving only
segments of chromosomes. Whole-genome duplications
result in gene duplications on a massive scale and have 
contributed significantly to the evolution of many eu- 
karyotic lineages. A whole-genome duplication instantly 
provides duplicate sets of genes that can subsequently 
undergo sub- and neofunctionalization, the latter a driver 
of evolution. Immediately following a whole-genome 
duplication, a previously diploid species is transformed 
into a tetraploid. However, over time, through duplicate 
genes evolving into pseudogenes or becoming subfunc- 
tionalized, the initially tetraploid species evolves into one 
whose chromosomes behave as a diploid. This process has 
been termed diploidization (Figure 18.18a). 

Evidence for both past whole-genome and smaller 
segmental duplications can be seen in the Arabidopsis 
genome in Figure 18.18b. While whole-genome duplica- 
tions (e.g., polyploidy) are particularly abundant in plants 
(see Chapter 13), they are not limited to plants. Evidence of 
past genome duplications is seen in fungal (e.g., S. cerevisiae) 
as well as vertebrate genomes (e.g., Danio rerio). 


Intraspecific Genome Comparisons


It is convenient to speak of “sequencing the genome of a 
species” as though one genome represents all members 
of that species, but logic tells us that this is not the case. 
Allelic differences, defined by polymorphisms in DNA 
sequences, are the ultimate cause of phenotypic differ- 
ences between individuals of a species. And this genetic 
diversity, the raw material on which natural selection can 
act, is seen in intraspecific comparisons of the genomes of 
any two individuals that are not clones. 

The study of allelic distributions is the foundation 
of population genetics (the subject of Chapter 22). Just 
as the evolutionary history of life in general is written 
in the genomes of species, the evolutionary history of a 
species is reflected in the distribution of polymorphic al- 
leles among populations. While population genetics has 
been an established field for many decades, we are just 
beginning to examine genetic diversity from a genomic 
perspective. 

Figure 18.18 Evidence of past whole-genome duplications. (a) Following a whole-genome
duplication, gene loss via pseudogene formation results in a “diploid” species. (b) Evidence of past 
whole-genome duplications in the Arabidopsis genome. Colored bands connect duplicated segments. 
Twisted bands connect duplicated segments having reversed orientations. 


The sequences representing the genomes of model 
organisms were derived from either a haploid individual 
or an inbred (homozygous at most or all loci) laboratory 
strain of a diploid organism and thus lack polymorphisms. 
The DNA sequence of the individual or individuals used 
to construct the initial complete genome sequence is 
called the reference genome sequence. Polymorphisms 
in these species can be identified by comparing the ge- 
nome sequences of different strains collected from differ- 
ent populations derived from the wild with the reference 
genome sequence. The reference genome sequence can be 
used to expedite the assembly of WGS sequence data from 
each new subject. Through the use of next-generation se- 
quencing technologies, this “resequencing” of genomes is 
inexpensive and is becoming increasingly common. 


Human Genetic Diversity 


Two intriguing questions arise in the course of study- 
ing the genetic diversity of humans through genome 
analyses: (1) To what extent does genomic sequence vary 
from one person to another? and (2) What does it mean 
to be human in the genomic sense? The first question is 
addressed here while the second question is explored in 
more detail in Chapter 22. 

The first two human genome sequencing projects 
identified a limited set of polymorphisms of the human 
genome. The DNA sequenced in the publicly funded 
human genome project was isolated from sperm cells of a 
number of anonymous male donors and from white blood 
cells of anonymous female donors. Thus, multiple alleles 
for a given site were sometimes revealed in the data from 
this project. In contrast, since DNA sequenced by Celera 
was isolated from a single individual, company founder 
J. Craig Venter, a maximum of two alleles for any autoso- 
mal gene could be detected. 

By the end of 2013, entire genome sequences for thou- 
sands of individuals were available, representing much 
human diversity from every inhabited continent. These 
included the genome of !Gubi, a Khoisan indigenous 
hunter-gatherer from the Kalahari Desert; Archbishop 
Desmond Tutu, a South African of Bantu descent; and 
Inuk, a paleo-Eskimo from Greenland represented by 
4000-year-old permafrost-preserved hair. In addition, 
through the Human Genome Diversity Project, the 
sequencing of genomic DNA from a broad spectrum 
of humans around the world has identified millions of 
polymorphisms distinguishing individuals and popula- 
tions, thus providing an unprecedented view of human 
genetic diversity. 


SNPs and Indels in Humans 


Genetic variation ranges from differences at a single nu-
cleotide, called single nucleotide polymorphisms (SNPs), to
larger-scale structural changes, such as inversions and the
insertions and deletions that are collectively called indels.
These indels and inversions, collectively called structural
variants, were unknown before large-scale sequencing
studies because they are too small to be detectable by
karyotype analysis. A specific type of structural
variant, called a copy-number variant (CNV), is due to 
indels greater than 1 kb in length (Figure 18.19a). While 
many CNVs are small, some are hundreds of kilobases 
long, span several genes, and result in alterations of gene 
dosage. The larger deletions are often in chromosomal 
regions that are present in more than one copy due to 
previous duplications, suggesting that genes in the deleted 
segments would have been redundant. A likely origin of 
indels is the occurrence of unequal crossing over after 
mispairing during meiosis (Figure 18.19b). 

A sampling of SNP variation between two randomly 
chosen humans reveals differences at about 1 in 1000 
bases in DNA sequence, or approximately 3 million base 
pairs in the 3 × 10^9 bp human genome. Variation is
greatest in African genomes, consistent with Africa being 
the place where our species originated. For example, SNP 
differences between two Namibian Khoisan individuals
are greater than differences between European and Asian
individuals. Furthermore, SNPs found outside Africa are
most often a subset of those found within African popula-
tions, consistent with our recent migration out of Africa
(see Case Study in Chapter 1). Studies analyzing genome
sequences of parents and their offspring indicate that
SNP variation accumulates due to mutation at the rate of
about 30 to 50 new SNPs in each individual’s germ cells in
each generation. This is a rate of about 1 change in every
10^8 bp, a figure remarkably similar to that observed in
similar experiments in the flowering plant Arabidopsis,
suggesting this error rate may be near the limit of DNA
replication fidelity.

Figure 18.19 Copy-number variants. (a) Relationship
between size of DNA polymorphisms and their frequency.
(b) CNVs can be formed during meiosis by unequal crossing
over mediated by repetitive DNA.
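
The SNP figures in this passage follow from simple arithmetic, shown here as a quick check. The sketch takes the genome size as 3 billion base pairs and reads the per-generation rate as one change per 10^8 bp; the variable names are chosen only for this example.

```python
GENOME_BP = 3_000_000_000        # approximate haploid human genome size

# About 1 difference per 1000 bases between two randomly chosen people.
differing_sites = GENOME_BP // 1000
print(differing_sites)           # 3000000

# About 1 new change per 10^8 bp per generation.
new_snps_per_generation = GENOME_BP // 10**8
print(new_snps_per_generation)   # 30
```

The second result sits at the lower end of the 30 to 50 new SNPs per generation reported from parent and offspring comparisons.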

Between individuals, the number of base pairs that differ
because of CNVs is more than 100 times the number that
differ because of SNPs. While the full extent of CNVs is not known,
a survey of 2500 individuals revealed extensive variation.
On average, individuals had more than 500 kb of CNVs,
and while most CNVs were small, 65-80% of individuals had
a CNV larger than 100 kb, 5-10% had a CNV larger than 500 kb,
and 1-2% had a CNV larger than 1 Mb. Only the largest
of these CNVs would have been detected by karyotype
analysis. Most of the larger CNVs were rare, present
in less than 1% of the population. Whether or not these
rare CNVs are associated with genetic disease is an active 
area of investigation. As with SNP variation, the genomes 
of the African donors possessed much greater diversity 
than those of the non-Africans. Studies analyzing genome 
sequences of parents and their offspring indicate that 
8-25 kb of CNV variation accumulates due to mutation in 
each individual’s germ cells in each generation. 

All of this genetic variation is the raw material on 
which evolutionary processes act, and we return to this 
topic in Section 22.8, which examines the evolutionary 
history of our species. 


Prenatal Genome Sequencing 


The discovery in 1997 of cell-free fetal DNA in maternal 
blood raised the possibility of noninvasive genetic diag- 
nosis of the fetus. With the advances in sequencing tech- 
nologies, this has now become a reality. During the first 
and second trimesters, approximately 10% of the cell-free 
DNA in maternal blood is derived from the fetus, usually 
from the trophoblast. Thus, sequencing of the cell-free 
DNA provides a source of information about the fetal 
genome. The sequence derived from cell-free DNA can 
be compared with that of the maternal genome, which 
can be acquired from other cells of the mother’s body, 
and some aspects of the fetal genome can be deduced. 
Trisomy is easily detected by an overrepresentation of 
a specific fetal chromosome. If the mother’s genome 
sequence is assembled sufficiently for comparison, alleles 
present in cell-free DNA that are not present in the moth- 
er’s genome must either be derived from the father or 


represent de novo mutations. With sufficient information 
concerning both the mother’s and the father’s genomes, 
the fetal genome can be assembled in its entirety, with 
both paternally and maternally inherited alleles identi- 
fied, as well as new mutations. Thus, prenatal genome 
sequencing can provide a noninvasive screen for trisomy,
familial genetic diseases, and de novo mutations, as well 
as paternity. 
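
To see why trisomy shows up as an overrepresentation of one chromosome, consider the expected share of reads when about 10% of cell-free DNA is fetal. The sketch below is a simplified expected-value calculation only: it assumes reads are sampled in proportion to chromosome copy number and ignores the statistical testing that an actual screen would require. The function name `expected_read_ratio` is hypothetical.

```python
def expected_read_ratio(fetal_fraction, fetal_copies=3, maternal_copies=2):
    """Expected reads from one chromosome relative to a euploid pregnancy."""
    mixed = (1 - fetal_fraction) * maternal_copies + fetal_fraction * fetal_copies
    return mixed / 2   # in a euploid pregnancy both genomes carry two copies

print(round(expected_read_ratio(0.10), 3))   # 1.05
```

With a 10% fetal fraction, a trisomic chromosome is expected to contribute about 5% more reads than normal, a small excess that becomes detectable only because millions of fragments are sequenced.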

While prenatal genome sequencing is technically 
feasible, its application raises a number of ethical issues. 
One issue is that the alleles of both maternal and paternal 
genomes will also be revealed in the process. Another is that 
prenatal genome sequencing can be done as early as the first 
trimester, when prospective parents may be making deci- 
sions about pregnancy termination. This potentially mag- 
nifies the concerns that revolve around standard prenatal 
diagnosis, since prenatal genome sequencing has the ability
to reveal more kinds of genetic variation in the fetus.


18.4 Functional Genomics Aids 
in Elucidating Gene Function 


While the genome sequence supplies a catalog of genes for 
an organism, it does not directly provide an understanding 
of how the genes direct the organism’s development and 
physiology. For this, we need to know when and where genes 
are expressed, the phenotypes of loss- and gain-of-function 
alleles, which other genes act in the same or redundant path- 
ways, and which proteins each gene product interacts with. 
Functional genomics is the study of gene function from a 
whole-genome perspective. 

High-throughput technologies, in which a large num- 
ber of genes are analyzed simultaneously, have enabled 
genome-wide examination of RNA- and protein-expression 
patterns, genetic interactions, and protein-DNA as well as 
protein-protein interactions. In addition, high-throughput 
technologies have facilitated the creation of mutant alleles 
of all genes in the genome of some model genetic species. 
In this section, we describe some high-throughput tech- 
nologies of functional genomics and consider what we have 
learned by applying them to model organisms. 


Transcriptomics 


One important clue to the function of a gene is when and 
where the gene is expressed. The study of gene expression 
from a genomic perspective is called transcriptomics, 
and the set of transcripts present in a cell or organism 
is called the transcriptome. Northern blotting is used 
to analyze gene expression (see Chapter 10). However, 
northern analysis is not amenable to a high-throughput 
design. Two high-throughput techniques used to ana- 
lyze the transcriptome are high-throughput sequencing 
of cDNA and DNA microarrays. High-throughput se- 
quencing is becoming the dominant method, but DNA 
microarrays are still in widespread use. Below we describe
the high-throughput sequencing approach. We also il- 
lustrate the use of microarrays as they provide a striking 
visual representation of transcriptomics. 


Transcriptome Analysis by Sequencing High- 
throughput DNA sequencing techniques (see Chapter 7) 
provide a direct way of assaying the transcriptome. In 
this approach, RNA is isolated from the cells of interest 
and converted into cDNA, which is then fragmented and 
sequenced using high-throughput DNA sequencing. The 
resulting sequence is compared to the reference genome 
sequence to identify sequences that are present in the 
cDNA population. 

The sequencing approach has two advantages over 
those using hybridization-based techniques. First, the se- 
quencing approach has the potential to be more quantita- 
tive. Since millions of cDNA fragments can be sequenced, 
precise quantitative data on gene expression levels can be 
obtained. Second, sequencing approaches can more easily
distinguish between transcripts with similar sequences,
such as alternative splice variants and alleles that differ by SNPs, which are
sometimes difficult to distinguish with the hybridization
techniques that microarrays rely upon.
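
One way to turn sequencing read counts into comparable expression levels is sketched below. The text does not specify a normalization method; transcripts per million (TPM) is used here as one commonly used choice, and the gene names, read counts, and gene lengths are invented for illustration.

```python
def transcripts_per_million(read_counts, gene_lengths_bp):
    """Scale read counts by gene length, then normalize to sum to one million."""
    per_base = {gene: read_counts[gene] / gene_lengths_bp[gene]
                for gene in read_counts}
    total = sum(per_base.values())
    return {gene: 1e6 * value / total for gene, value in per_base.items()}

counts = {"geneA": 500, "geneB": 1000, "geneC": 500}      # hypothetical mapped reads
lengths = {"geneA": 1000, "geneB": 4000, "geneC": 2000}   # hypothetical lengths in bp
tpm = transcripts_per_million(counts, lengths)
print(tpm)
```

Dividing by gene length matters because a long transcript yields more fragments than a short one at the same abundance; here geneB has the most reads but, after correction, only half the expression level of geneA.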

The first application of high-throughput sequenc- 
ing to transcriptome analysis of the yeast genome was 
published in 2008. It provided precise descriptions of the 
5’ and 3’ ends of transcripts and clarified gene annotations. 
Subsequent similar studies on other species followed, 
revealing the extent and nature of alternative splicing, 
which is prevalent in most multicellular eukaryotes.
Such experiments have also facilitated gene annotation by 
identifying novel transcripts. Genes that had not yet been 
annotated using computational approaches have often 
been identified by using expression data. 

One surprising result from the application of next- 
generation sequencing of transcriptomes was the large num- 
ber of previously unidentified transcripts, many of them 
noncoding, present in the cells of many multicellular eu- 
karyotes. Some of these have been demonstrated to encode 
microRNAs or lncRNAs (see Chapter 15), but many others
do not have any as-yet-known functions. The numbers of
such transcripts range from hundreds in some invertebrates
to thousands in mammals, and an active area of research is
to identify the functions, if any, for these RNA molecules.


Expression and Tiling Arrays DNA microarrays consist of 
collections of synthesized DNA fragments (oligonucleotides) 
attached to a solid support (Figure 18.20). The DNA 
fragments are of a fixed length, usually 25 to 70 bases. The 
specific DNA sequences, representing sequences present in 
a genome, are chemically synthesized on a silicon substrate, 
called a chip, at high density—tens of thousands to millions 
of oligonucleotide sequences per array, each sequence 
located on a different spot. Following hybridization with a 
fluorescent probe representing cDNA, the intensity of the 
signal from each of the spots reflects the concentration of
the sequence complementary to the probe. One advantage
of microarrays is that they can be custom designed because
the spots can be added independently. Many variations of
microarrays have been produced, of which we describe two:
expression arrays and tiling arrays.

Figure 18.20 Transcriptome analysis using oligonucleotide
arrays.

An expression array carries unique sequences from 
every annotated gene of the genome. Hybridization of 
an expression array with labeled cDNA probes produces 
quantitative information about the relative expression 
levels of the genes represented on the array. The power 
to examine gene expression patterns through the use of 
expression arrays is limited only by the degree to which 
mRNA can be extracted from specific cells or tissues and 
converted to cDNA before labeling. 

An example from the budding yeast S. cerevisiae 
illustrates how microarray data can provide insight into 
the function of genes not previously identified by forward 
genetic approaches. Diploid yeast cells of S. cerevisiae 
produce haploid cells through the developmental process 
of sporulation, which consists of meiosis and spore mor- 
phogenesis. From forward genetic studies, approximately 
150 genes were known to be involved in sporulation, 
and these could be classified into four groups defined by 
expression patterns and mutant phenotypes. 

To examine genome-wide expression patterns during
sporulation, diploid yeast cells were induced to sporulate,
RNA samples were taken at seven time points spanning
11 hours, and expression levels were compared across
time points to identify genes whose expression was either induced or
repressed (Figure 18.21). More than 1000 genes exhibited
significant changes at some point during the sporulation 
process: In about half of these cases the genes became in- 
duced, and in the other half the genes became repressed. 
In other words, more than six times as many genes as had 
been identified previously were likely to play some role 
during sporulation. 
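
The classification of genes as induced or repressed over a time course can be sketched as follows. This is an illustrative simplification, not the analysis used in the yeast study: it assumes each gene has a baseline signal measured before induction, uses a twofold change (a log2 ratio of 1) as an arbitrary cutoff, and the intensity values are invented.

```python
import math

def classify(time_course, baseline, threshold=1.0):
    """Label a gene by its largest log2 fold change relative to the baseline."""
    log_ratios = [math.log2(value / baseline) for value in time_course]
    peak = max(log_ratios, key=abs)   # change of greatest magnitude
    if peak >= threshold:
        return "induced"
    if peak <= -threshold:
        return "repressed"
    return "unchanged"

# Hypothetical signal intensities at seven time points after induction.
print(classify([110, 400, 900, 1200, 800, 300, 150], baseline=100))   # induced
print(classify([95, 60, 30, 20, 25, 40, 70], baseline=100))           # repressed
print(classify([100, 105, 98, 110, 102, 96, 101], baseline=100))      # unchanged
```

Grouping the induced genes further by when their peak occurs is the kind of step that would separate them into temporal classes such as the "Early I" group.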

The researchers categorized the induced genes by 
their expression patterns, expanding the four previously 
described patterns to at least seven. Genes with expres- 
sion patterns similar to those of known genes could be 
hypothesized to have biological roles similar to those 
of the known genes. For example, some “Early I” genes 
(see Figure 18.21) are known to function in the synapsis 
of homologous chromosomes. By extrapolation, other 
Early I genes whose functions are unknown may also 
have roles in synapsis of chromosomes, suggesting areas 
for experimental study to support or refute the predicted 
roles. Similarly, comparisons of sequences upstream of 
coordinately regulated genes can provide information on 
gene regulation. For example, more than 40% of the Early 
I genes have a consensus upstream regulatory sequence 
(URS1) to which the transcription factor UME6 binds, 
suggesting that this set of genes is coordinately regulated 
by the same transcription factor. The temporal gene- 
expression patterns during sporulation provide clues to 
the functions of hundreds of previously uncharacterized 
genes, some with homologs in humans. 

Technologies to characterize transcriptomes are now 
being applied to the study of human cancers, allowing 
precise characterization of gene expression in morpho- 
logically similar but molecularly different cancers, and 
facilitating targeted treatments with drugs known to 
affect specific gene products. 

The second type of array, the whole-genome tiling 
array, contains all sequences of the genome or of a genomic 
interval, including introns, exons, untranslated regions 
(UTRs), and intergenic regions. One of many applications 
of a whole-genome tiling array is to precisely map tran- 
scription patterns on the genomic DNA sequence via 
hybridization of a probe derived from an mRNA population 
on the array (Figure 18.22). Labeled cDNA is used to probe 
genomic tiling arrays to identify sequences being tran- 
scribed into mRNA or other types of RNA. 

Another application of whole-genome tiling arrays is 
the identification of transcription factor binding sites. This 
is accomplished by applying the technique of chromatin
immunoprecipitation (ChIP; see Chapter 15) at a whole-
genome level. As described in Chapter 15, DNA that is im-
munoprecipitated with antibodies to the protein of inter-
est can be used as a probe on a tiling array. The spots on
the microarray that hybridize with the probe correspond
to the sequences the transcription factor was bound to in 
the cell. This technique provides a genome-wide view of 
protein-DNA interactions and is known colloquially as 
“ChIP-on-chip.” Note that the immunoprecipitated DNA,
rather than being used as a probe on a microarray, can be
sequenced directly using next-generation sequencing; the
resulting protocol, known as ChIP-seq, has become the method of choice.

Tiling methodology also takes the form of custom 
tiling arrays that contain only a subset of the genome and 
are used for high-throughput experiments focusing only 
on specific genes or sets of genes. 


Other “-omes” and “-omics” 


By the same logic that produced the terms genomics 
and transcriptomics, proteomics is the study of all the 
proteins—collectively known as the proteome—expressed 
in a cell, tissue, or individual. Whereas the biochemistry of 
nucleic acids is predictable—any nucleic acid can base- 
pair with any other nucleic acid, given complementary 
sequences—the biochemistry of the proteome is compli- 
cated by the much greater range of protein structures and 
functions. The study of proteins thus requires techniques 
tailored to specific subsets of proteins. 

Multiple high-throughput technologies have been 
developed for proteomic analyses, including techniques 
to study protein expression, protein modification, and 
protein-protein interactions. Examples of the latter— 
techniques that reveal whether and how different pro- 
teins interact—provide information on the functioning of 
biological systems by identifying, for instance, sets of pro- 
teins that form a complex. Here we discuss one technique 
for identifying interacting proteins. 

The two-hybrid system is a high-throughput method 
for discovering whether two proteins interact. This system 
relies on the modular nature of the GAL4 transcription fac- 
tor from yeast that binds to the GAL4 upstream activation 
sequence (or UAS_GAL4), which is an enhancer element, to
activate the transcription of genes involved in galactose
metabolism (see Chapter 15). One domain of the GAL4
protein, the DNA-binding domain, binds to the UAS_GAL4
sequence; a second domain, the activation domain, acti- 
vates transcription by interacting with RNA polymerase II 
as well as other chromatin factors (Figure 18.23a). The two 
domains can be physically separated. 

To test whether two proteins interact, one of the pro- 
teins to be tested is translationally fused (see Chapter 16) 
to the GAL4 DNA-binding domain (BD), and the other 
protein to be tested is translationally fused to the GAL4 
activation domain (AD). Both of these chimeric genes 
are then transformed into a single yeast strain. If the two 
proteins interact, the GAL4-BD and GAL4-AD will be 
brought together, and GAL4-activated genes will be tran- 
scribed. Conversely, if the two proteins do not interact, no 
transcription of the GAL4-activated reporter gene will be 
observed. To facilitate the screening process, an auxotro-
phic yeast strain is often used in which UAS_GAL4 drives
expression of a gene that will complement the auxotro-
phic defect. For example, a histidine auxotroph with a
UAS_GAL4:HIS transgene will not grow on media lacking
histidine unless GAL4-mediated transcription is active.
However, certain interactions cannot be detected with the
standard two-hybrid system, including those in which the
interacting proteins are not efficiently transported into
the nucleus and those in which proteins require a third
partner for interaction.

Figure 18.21 Analysis of yeast transcription patterns using microarrays.

Figure 18.22 Using tiling arrays to identify transcription. Probes derived from mRNA isolated
from flowers or seedlings were hybridized with whole-genome tiling arrays.

Figure 18.23 Identifying protein-protein interaction networks. (a) The two-hybrid system
identifies interacting proteins. (b) Application of the two-hybrid system identifies networks of
interacting proteins.

Two-hybrid approaches have been applied successfully 
to many model systems, providing information on their 
protein-interaction networks. In S. cerevisiae, all pairwise 
combinations of the more than 6000 proteins encoded in 
the genome have been tested, providing an overview of 
protein-interaction networks in the living yeast cell (see 
Figure 18.23b). The sum of all of the protein-protein inter- 
actions in an organism is known as the interactome. 


Genomic Approaches to Reverse Genetics 


One surprising result of genome sequencing was the large 
number of genes identified by sequence analysis but not 
previously identified by forward genetic screens. Even in an 
intensely studied organism such as S. cerevisiae, only about 
1000 of the more than 6000 genes in the genome had been 
identified by forward genetic screens. Of the remaining 
5000 or so genes, about half had some sequence similarity 
to genes with a known or probable function, while the other 
half did not exhibit homology to any other known genes in 
other model systems. Analyses of other multicellular eukary- 
otic genomes had similar outcomes. The high-throughput 
techniques discussed above can provide information on 
gene expression patterns and protein-protein interactions, 
but to fully understand gene function, we must be able to 
analyze loss- and gain-of-function alleles. Reverse genetic 
approaches (see Chapter 17) provide an experimental 
avenue for exploring such alleles and, through them, the 
function of previously unidentified genes. 

An essential tool for genomic analysis by reverse 
genetics is a collection of mutant alleles for every gene 
in the genome, referred to as a knockout library. In the 
case of S. cerevisiae, a knockout library, containing dele- 
tion loss-of-function alleles of every gene, is available. 
In the mutant strains, the entire target gene is replaced 
with a marker gene that confers resistance to the antibiotic
kanamycin (Figure 18.24). In addition, in each deletion
strain, the kanamycin-resistance gene is flanked by two 20-bp
sequences, termed barcodes; a different set of barcodes
is used for each deletion strain. The barcodes enable the
abundance of each mutant strain to be independently 
quantified when grown in a mixed population consisting 
of multiple strains. Specific mutant strains can be veri- 
fied and quantified by selective amplification of barcode 
sequences using PCR-based strategies. 
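
The barcode-based quantification described above can be illustrated with a short calculation. The following is a minimal sketch, not the published analysis pipeline: the strain names, the read counts, the pseudocount, and the cutoff of a twofold drop are all hypothetical values chosen for illustration. It converts barcode counts measured before and after pooled growth into relative abundances and reports the log2 change for each strain.

```python
import math

def relative_abundance(counts):
    """Convert raw barcode counts into fractions of the pooled population."""
    total = sum(counts.values())
    return {strain: n / total for strain, n in counts.items()}

def log2_fold_change(before, after, pseudocount=1.0):
    """Log2 change in each strain's share of the pool after growth.

    The pseudocount (an assumption of this sketch) avoids division by zero
    when a strain drops out of the pool entirely.
    """
    frac_before = relative_abundance(
        {s: n + pseudocount for s, n in before.items()})
    frac_after = relative_abundance(
        {s: after.get(s, 0) + pseudocount for s in before})
    return {s: math.log2(frac_after[s] / frac_before[s]) for s in before}

# Hypothetical barcode counts for four deletion strains grown as one pool.
before = {"strainA": 1000, "strainB": 1000, "strainC": 1000, "strainD": 1000}
after = {"strainA": 1300, "strainB": 1250, "strainC": 1350, "strainD": 100}

changes = log2_fold_change(before, after)
# Flag strains whose share of the pool fell more than twofold (illustrative cutoff).
depleted = [s for s, lfc in sorted(changes.items()) if lfc < -1.0]
print(depleted)
```

In this toy pool, only the strain whose barcode count collapses during growth is flagged, which mirrors the "blue" strain in Figure 18.24b.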


(a) Construction of barcoded yeast deletion mutants

[Diagram: the ORF is replaced by a kanamycin-resistance marker flanked by UP and DN barcodes.]

The coding regions of each gene were replaced by a selectable marker
gene (e.g., kanamycin resistance), and barcodes unique to each gene
were added upstream (UP) and downstream (DN) of the marker gene.


(b) Competitive growth of pools of deletion mutant strains

The barcoded mutant strains can be grown in competition with wild
type or each other. In this example, the “blue” strain does not grow as
well as the other three strains. DNA is isolated before and after growth,
and each gene can be analyzed by using fluorescently labeled barcode
primers.

[Diagram: pools sampled before growth and after growth; PCR amplification of barcodes
and fluorescent labeling; hybridization of labeled barcodes to a DNA microarray.]

The relative proportion of growth of each strain can be examined by
hybridizing the products to a DNA microarray.

Figure 18.24 Barcoded knockout libraries for phenotypic
analyses of mutants.


Use of Yeast Mutants to Categorize Genes 


A challenge for the future is to determine more precisely 
the molecular and biological roles of all genes to illuminate 
why they are maintained in the genome. As an initial step 
in this direction, yeast deletion strains have been analyzed 
to categorize S. cerevisiae genes as either essential for life 
or nonessential. 

The deletion strains are first constructed in diploid 
yeast. The heterozygous diploid deletion strain is then 
induced to undergo meiosis, allowing the phenotypes of 
deletion alleles to be analyzed in the haploid progeny. 
When mutations in each of the 6300 genes of S. cerevisiae
were examined in this way, deletion alleles of 1102 genes
were not recoverable in haploid progeny (Figure 18.25).
These genes, about 20% of the yeast genome, define the
essential gene set of S. cerevisiae, meaning that they are
required for survival of the organism. In addition, 186 of
the deletion mutants had a reduced-growth phenotype as
heterozygotes before induction of meiosis, thus indicating
haploinsufficiency of these genes. (Recall that haploinsufficiency
is a dominant phenotype in diploid organisms
that are heterozygous for a loss-of-function allele.) For the
remaining 5000 genes, both haploid deletion mutants and
homozygous diploid mutants were obtained. However,
891 of these mutant strains exhibited a slow-growth
defect in rich media under optimal conditions, which
indicates that the genes are required for vital biological
processes in optimal growth conditions. This leaves about
4000 genes for which no obvious mutant phenotype is
detected under optimal growth conditions. These genes
are referred to as nonessential, but that classification is
dependent on environment; in other words, the genes are
nonessential under optimal laboratory growth conditions.

~6300 deletion strains
Reduced growth of heterozygous diploids identified 186 haploinsufficient genes.
Haploid lethal mutants identified 1102 essential genes.

~5000 viable deletion mutants
Reduced growth of homozygous mutants identified 891 genes needed in optimal conditions.

~4000 “nonessential” genes
Genes tested under 1144 different growth conditions.
For ~3800 genes, homozygotes exhibit a growth defect in at least one condition.
205 “genes” may be questionable.

Figure 18.25 Global analysis of yeast deletion mutants.

One possible explanation for the lack of conspicuous 
mutant phenotypes associated with 4000 nonessential 
S. cerevisiae genes is that the genes are required only 
under specific growth conditions. To test this hypothesis, 
each mutant strain was grown under a variety of environmental
conditions, including variations in temperature,
media composition, and the presence of antifungal compounds,
salts, and other chemicals known to perturb
specific biological processes. As a result, yeast geneticists
discovered measurable growth defects under at least one
environmental condition for 3800 of the 4000 genes previously
identified as nonessential. Thus, these genes are
required for efficient growth in at least one tested environmental
condition; they are not really “nonessential”
from an evolutionary perspective because their presence
is likely to provide a selective advantage. Only about 200
deletions showed no growth defect in any tested condition,
suggesting that either (1) these genes are authentically
nonessential, (2) the conditions needed to reveal their
importance were not tested, or (3) their annotation as genes
is incorrect.
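
The successive narrowing of the yeast gene set can be checked with simple arithmetic. The sketch below only restates counts quoted in the text and in Figure 18.25; several of those counts are rounded in the source, so the computed values are expected to agree only approximately with the rounded figures in the prose. The variable names are illustrative.

```python
# Counts quoted in the text and in Figure 18.25 (several are rounded there).
total_genes = 6300
essential = 1102                # deletions not recoverable in haploid progeny
haploinsufficient = 186         # reduced growth as heterozygous diploids
slow_growth_optimal = 891       # slow growth in rich media, optimal conditions
nonessential_optimal = 4000     # "about 4000" with no obvious phenotype
conditional_defect = 3800       # defect in at least one tested condition

viable = total_genes - essential
essential_pct = 100 * essential / total_genes
conditional_pct = 100 * conditional_defect / nonessential_optimal
no_defect_found = nonessential_optimal - conditional_defect

print(f"viable deletion mutants: {viable}")          # text rounds to ~5000
print(f"essential genes: {essential_pct:.1f}%")      # text rounds to ~20%
print(f"'nonessential' genes with a conditional defect: {conditional_pct:.0f}%")
print(f"deletions with no defect found: {no_defect_found}")  # ~200 in the text
```

Seen this way, roughly 95% of the genes labeled nonessential under optimal conditions show a growth defect in at least one tested environment.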

To further analyze the essential genes, conditional 
alleles are required. Traditionally, temperature-sensitive al- 
leles isolated in forward genetic screens have been used to 
study functions of essential genes. Libraries of engineered 
conditional alleles of S. cerevisiae essential genes have also 
been constructed for this purpose. In one approach, each 
essential gene is placed under the control of a tetracycline- 
repressible promoter. In the absence of tetracycline, the 
gene is expressed, but upon addition of tetracycline, gene 
expression is repressed, creating a loss-of-function pheno- 
type. In another approach, a short peptide tag that confers 
heat-inducible protein degradation is added to the cod- 
ing regions of essential genes. Under the normal growth 
temperature of 30°C, the protein is stable, but at 37°C, the 
tagged proteins are degraded and lose the ability to function. 

Other types of libraries that have been constructed 
provide additional tools for identifying potential gene 
functions in S. cerevisiae. For example, a library in which 
every gene is a translational fusion with green fluorescent 
protein (GFP) permits visual determination of the subcel- 
lular location of proteins. 


Genetic Networks 


Identification of genetic interactions can provide clues to 
gene function by revealing that two genes act in the same 
pathway or redundant pathways (see Chapter 16). Data 
derived from double mutants identify sets of interacting 
genes that define genetic networks. 

An extreme example of a genetic interaction is syn- 
thetic lethality, where the mutation of either gene alone 
is not lethal but mutation of both genes together results 
in lethality (see Figure 16.5). A genome-wide estimate of 
the number of synthetic lethal interactions in S. cerevisiae 
was obtained by using mutants representing 132 genes 
and analyzing their genetic interactions. For genes whose 
single-mutant phenotype is inviability, conditional alleles 
were used; for nonessential genes, null alleles were used. 
Each of the 132 mutants was crossed with 4700 viable 
deletion mutants, and the double-mutant phenotypes were 
examined. Approximately 4000 different synthetic lethal 
interactions were identified, involving about 1000 different
genes. The number of interactions per gene ranged from
1 to 146, with an average of 34. One striking feature of this 
genetic interaction study is that essential genes exhibited 
about five times as many interactions as did “nonessential” 
genes. These results suggest that genetic networks consist 
of a small number of essential genes participating in many 
interactions and a larger number of nonessential genes 
participating in fewer interactions (Figure 18.26). 
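
The idea of a network in which a few genes participate in many interactions can be made concrete by counting interactions per gene. The following is a minimal sketch using a made-up list of synthetic lethal pairs; the gene names are hypothetical placeholders, not genes from the study.

```python
from collections import Counter

def interaction_degrees(pairs):
    """Count how many distinct interaction partners each gene has.

    Pairs are unordered, so (a, b) and (b, a) are treated as one interaction.
    """
    unique_pairs = {tuple(sorted(pair)) for pair in pairs}
    degree = Counter()
    for gene_a, gene_b in unique_pairs:
        degree[gene_a] += 1
        degree[gene_b] += 1
    return degree

# Hypothetical synthetic lethal pairs (placeholder gene names).
pairs = [
    ("geneE1", "geneN1"),
    ("geneE1", "geneN2"),
    ("geneE1", "geneN3"),
    ("geneN2", "geneE1"),   # same interaction as above, reported in reverse
    ("geneN3", "geneN4"),
]

degree = interaction_degrees(pairs)
hub, hub_degree = degree.most_common(1)[0]
mean_degree = sum(degree.values()) / len(degree)
print(hub, hub_degree, mean_degree)
```

In a real data set, comparing the mean number of interactions for essential and nonessential genes would be the analogous calculation to the roughly fivefold difference reported in the text.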

If the same level of synthetic lethality exists for the 
remaining genes in the yeast genome, it is estimated that 
200,000 different synthetic lethal interactions will occur 
among all yeast genes and that 1% of all double mutants 
will result in synthetic lethality. Thus, while only about 1000
genes are essential under optimal laboratory growth conditions
as defined by single-mutant phenotypes, additional
genes become essential when organisms are compromised
by a mutation in another gene. One explanation for the
observed levels of synthetic lethality is that where there are 
multiple genetic pathways, some of the pathways buffer 
one another, creating stable genetic systems that are better 
able to withstand environmental and genetic perturbations. 
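
The genome-wide estimates quoted above can be related to the screen itself with a back-of-the-envelope calculation. This is a rough sketch under stated assumptions, not the method used in the original study: it takes the 6300-gene count given earlier in the chapter, treats every unordered pair of genes as one possible double mutant, and ignores differences between essential and nonessential genes.

```python
from math import comb

# Figures from the screen described in the text.
query_mutants = 132
viable_deletions = 4700
observed_lethal = 4000

# Double mutants actually tested and the observed synthetic lethal rate.
tested_pairs = query_mutants * viable_deletions
observed_rate = observed_lethal / tested_pairs

# Assumption: 6300 genes, every unordered gene pair is a possible double mutant.
genome_genes = 6300
all_pairs = comb(genome_genes, 2)

# The text's rounded rate of 1% applied to all pairs.
one_percent_of_pairs = all_pairs // 100

print(f"tested double mutants: {tested_pairs:,}")
print(f"observed rate: {100 * observed_rate:.2f}%")
print(f"possible gene pairs: {all_pairs:,}")
print(f"1% of all pairs: {one_percent_of_pairs:,}")
```

Under these assumptions the observed rate is somewhat below 1%, and 1% of all possible gene pairs comes to just under 200,000, so the two rounded figures in the text are consistent with each other in order of magnitude.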

Genetic networks defined by genetic interactions of- 
ten identify groups of genes having similar molecular 
functions, such as translation, lipid metabolism, or DNA 
repair (see Figure 18.26). If a gene of unknown function 
belongs to a genetic network in which many genes have 
known roles—say, in lipid metabolism—experiments to 
identify the molecular function of the unknown gene 
might begin by investigating whether the gene in question 
also plays a role in lipid metabolism. 

Genetic networks constructed on the basis of genetic
interactions can be examined in comparison with
groupings based on other gene attributes, such as their
annotations, expression patterns, or interactomes (discerned
from protein-protein interaction data). Prediction
of biological functions of genes based on correlations between
different data sets is referred to as systems biology.

Figure 18.26 Genetic interactions identified through synthetic lethal analysis. Mutant alleles of
eight genes (BNI1, RAD27, SGS1, BBC1, NBP2, BIM1, ARP2, and ARP40) were assayed for synthetic lethal
interactions with the 5000 viable deletion mutants of yeast.

Genetic interaction data often correlate well with gene 
expression data since genes that compensate for one an- 
other in function often exhibit similar expression patterns. 
In contrast, genetic interactions and protein-protein in- 
teractions overlap less often. One reason is that physically 
interacting proteins are likely to act in the same protein 
complex, whereas in genetic interactions, the proteins the 
genes encode often act in compensating pathways that 
would normally be composed of different protein complexes
with related functions. This generalization holds
true primarily when null alleles are used to test genetic
interactions; however, when hypomorphic alleles are used,
genetic interactions are often seen, revealing genes encoding
proteins that act in the same complex or pathway (see
Chapter 16).

The ultimate objective of functional genomics studies
is to define the molecular function of every gene in
an organism by compiling genomic data and searching
for correlations that suggest hypotheses for further
experimentation. The discussion here focused on studies
in S. cerevisiae, but similar approaches are being taken
in other organisms. For example, enhancer-suppressor
genetic screens described in Section 16.1 are a directed
approach for uncovering genetic interaction networks
and can be applied to most organisms.


CASE STUDY


Genomic Analysis of Insect Guts May Fuel the World


In metagenomic analysis, biologists study genomes collected
from the multiple organisms that together inhabit a single
environment. Two recent studies suggest that metagenomic
analysis of insect digestive tracts could potentially have a
significant impact on the production of biofuels.

Much of the current supply of ethanol for fuel is produced
from cellulose that comes from the lignocellulose component
of corn. Lignocellulose is a mixture of cellulose (a complex
carbohydrate composed of glucose molecules) and lignin (the
rigid structural material that protects cellulose). The production
of corn ethanol requires high temperatures and
the use of toxic chemicals to break down the lignin and
hydrolyze the cellulose. This step is followed by microbial
fermentation of the sugar and distillation of ethanol. Obtaining ethanol
from corn in this way has adverse effects on the environment,
consumes a great deal of energy, and may not be economically
viable. These are principal reasons why the investigation of
lignocellulose digestion in insects is attractive. Identification
and characterization of the genes responsible for lignocellulose
digestion may allow the development of new, biologically
based methods of biofuel production.

In 2007, the microbiologist Falk Warnecke and his colleagues
conducted a metagenomic study of the microbes in the
gut of the wood-eating termite species Nasutitermes. Termites
are wood-digesting creatures whose ancestors have inhabited
cellulose-rich environments for more than 100 million years.
Nasutitermes has a bacteria-laden gut that acts like a tiny
bioreactor for digesting the lignocellulose in wood. Lignocellulose
provides energy for these microorganisms, which first break
down lignin to liberate cellulose and then break down cellulose
via hydrolysis driven by hydrolase enzymes.

Nasutitermes has a three-part stomach, the main part of
which, designated P3, contains a rich microbial mixture of
hundreds of bacterial species that are primarily responsible
for wood digestion. Warnecke and his colleagues collected
Nasutitermes in Costa Rica. Then, in the laboratory, they
isolated and emptied P3 and found that its total volume in each
insect is just 1 microliter (µL). They isolated and performed
shotgun sequencing on the DNA from the P3 microbial mass.

Warnecke estimates that the DNA in this metagenomic 
analysis may come from as many as 300 bacterial species 
whose symbiotic relationship with the termite allows the 
termite to derive energy from wood. Gene-identification anal- 
yses indicate that many of the most frequently found genes 
in these bacteria produce glycoside hydrolases (GH) that 
hydrolyze lignocellulose. More than 700 different GH genes 
representing more than 45 different gene families were found 
in this study. A large group of previously unidentified genes 
was also found, and Warnecke speculates that these genes 
might be involved in various kinds of lignocellulose binding 
and digestion reactions. 

While Warnecke’s study detected numerous bacterial 
genes that may carry out cellulose digestion, it did not iden- 
tify any genes responsible for lignin digestion. However, 
a second study, published in 2008 by Scott Geib and col- 
leagues, examined lignin digestion in the Asian longhorn 
beetle (Anoplophora glabripennis) and the Pacific dampwood 
termite (Zootermopsis angusticollis). Biochemical analysis of 
the digestive tracts and digestive products of both insects 
shows significant evidence of lignin digestion, suggesting 
either that the genomes of these organisms encode lignin- 
digesting enzymes or that the organisms carry symbiotic 
microbes whose genomes encode the enzymes. The research- 
ers did not perform metagenomic analyses of the insect 
genomes or digestive system contents, but they did identify a 
single fungal species in the gut of the Asian longhorn beetle 
whose genome is likely to encode lignin-digesting enzymes. 

A great deal of additional “bioprospecting” research will 
be necessary to characterize the genes that encode the en- 
zymes driving lignin and cellulose digestion in insect guts. In 
the process, further genomic and metagenomic analyses may 
suggest ways these genes can be cloned and used to replace 
the costly current methods of lignocellulose-based ethanol 
production.


18.1 Structural Genomics Provides a Catalog
of Genes in a Genome


Genomes can be sequenced in either a clone-by-clone
approach or a whole-genome shotgun approach.

Paired-end sequencing facilitates assembly of scaffolds
consisting of sequence fragments generated by shotgun
sequencing.

Metagenomics studies the genetic sequences of communities
of organisms whose member species may be difficult to
cultivate individually.


18.2 Annotation Ascribes Biological Function
to DNA Sequences


Genome annotation indicates the locations of genes and
other functional sequences in a genomic sequence. It aims
to ascribe biological function to sequence data.

Functions of some annotated genes may be predicted based
on sequence similarities with known genes as analyzed
through computational approaches and bioinformatics, but
experimental verification is required.


18.3 Evolutionary Genomics Traces the History
of Genomes


A phylogenetic tree of life can be constructed by comparing
sequences of orthologous genes.

New genes can be produced by gene duplication due to
unequal crossing over or by larger-scale duplications of
DNA, retrotransposition, and other mechanisms.

Most new genes degenerate rapidly, but some are retained and
may acquire new functions, driving the evolution of new species.

Gene duplication has been a key feature in the evolution
of complex organisms. Lateral gene transfer is a common
mechanism of acquisition of new genes in bacteria and
archaea, but it is less common in eukaryotes.

By comparing genomes of related species, researchers
can identify conserved genes and noncoding sequences
and refine gene annotation. Conserved noncoding
sequences often consist of gene regulatory
sequences.

Intraspecific genome comparisons identify genetic
variation within a species and provide information about
its evolutionary history and population dynamics. Both
intra- and interspecific comparisons reveal that genomes
are dynamic and can change rapidly on evolutionary
timescales.


18.4 Functional Genomics Aids in Elucidating 
Gene Function 


DNA microarrays and high-throughput sequencing can
reveal polymorphisms, global transcription patterns, and
transcription-factor binding sites.

Protein-protein interactions can be determined by using
genetic tools developed from the study of yeast.

Knockout libraries are used to perform genome-wide genetic
screens that elucidate gene function. They have allowed
classification of all yeast genes as essential or nonessential
under specific growth conditions.

Genes classified as essential under optimal growth
conditions have on average more genetic interactions than
those classified as nonessential.

Genome-wide analyses of synthetic lethal interactions in
yeast reveal large numbers of genes that are essential in
genetically compromised organisms.

Systems biology is a research approach that correlates data
sets derived from functional genomics in order to define and
elucidate gene function.


PROBLEMS

Visit MasteringGenetics™ for instructor-assigned tutorials and problems.


Chapter Concepts

For answers to selected even-numbered problems, see Appendix: Answers.

1. You have discovered a new species of Archaea from a hot
spring in Yellowstone National Park.
a. After growing a pure culture of this prokaryote, what
strategy might you employ to sequence its genome?
b. How would your strategy differ if you were unable to
grow the strain in culture?

2. Repetitive DNA poses problems for genome sequencing.
a. Why is this so?
b. What types of repetitive DNA are most problematic?
c. What strategies can be employed to overcome these
problems?

3. When the whole-genome shotgun sequence of the
Drosophila genome was assembled, it comprised 134
scaffolds made up of 1636 contigs.
a. Why were there so many more contigs than scaffolds?
b. What is the difference between physical and sequence
gaps?
c. How can physical gaps be closed?
d. How can sequence gaps be closed?

4. How do cDNA sequences facilitate gene annotation?
Describe how the use of full-length cDNAs facilitates
discovery of alternative splicing.

5. How do comparisons between genomes of related species
help refine gene annotation?

6. You are designing algorithms for the bioinformatic
prediction of gene sequences. How might algorithms differ
for predicting genes in bacterial versus eukaryotic genomic
sequence?

7. You have sequenced a 100-kb region of the Bacillus
anthracis genome (the bacterium that causes anthrax) and
a 100-kb region from the Gorilla gorilla genome. What
differences and similarities might you expect to see in the
annotation of the sequences (for example, in number of
genes, gene structure, regulatory sequences, and repetitive DNA)?

8. You have just obtained 100 kb of genomic sequence from
an as yet unsequenced mammalian genome. What are
three methods you might use to identify potential genes
in the 100 kb? What are the advantages and limitations of
each method?

9. The human genome contains a large number of pseudogenes.
How would you distinguish whether a particular
sequence encodes a gene or a pseudogene? How do
pseudogenes arise?

10. Based on the tree of life in Figure 18.10, would you expect
human proteins to be more similar to fungal proteins or
to plant proteins? Would you expect plant proteins to be
more similar to fungal proteins or human proteins?

11. When comparing genes from two sequenced genomes,
how does one determine whether two genes are orthologous?
What pitfalls arise when one or both of the genomes
are not sequenced?

12. What are the differences between expression arrays and
genome tiling arrays? What types of data can be obtained
from microarrays? Can high-throughput sequencing
supplant most applications of microarrays?

13. The two-hybrid method facilitates the discovery of
protein-protein interactions. How does this technique work?
Can you think of reasons for obtaining a false-positive
result, that is, where the proteins encoded by two clones
interact in the two-hybrid system but do not interact in the
organism in which they naturally occur? Can you think of
reasons you might obtain a false-negative result, in which
the two proteins interact in vivo but fail to interact in the
two-hybrid system?


Application and Integration


14. Go to http://blast.ncbi.nlm.nih.go
gi and follow
the links to nucleotide blast. Type in the sequence below; it
is broken up into codons to make it easier to copy.

5’ ATG TTC GTC AAT CAG CAC CTT TGT GGT
TCT CAC CTC GTT GAA GCT TTG TAC CTT GTT
TGC GGT GAA CGT GGT TTC TTC TAC ACT CCT
AAG ACT TAA 3’

As you will note on the BLAST page, there are several options
for tailoring your query to obtain the most relevant
information. Some are related to which sequences to
search in the database. For example, the search can be limited
taxonomically (e.g., restricted to mammals) or by the
type of sequences in the database (e.g., cDNA or genomic).
For our search, we will use the broadest database, the
“nucleotide collection (nr/nt).” This is the nonredundant (nr)
database of all nucleotide data (nt) in GenBank and can be
selected in the “database” dialogue box. Other parameters
can also be adjusted to make the search more or less sensitive
to mismatches or gaps. For our purposes, we will use
the default setting, which is automatically presented. Press
“search.” What can you say about the DNA sequence?

15. In the course of the Drosophila melanogaster genome project,
the following genomic DNA sequences were obtained.
Try to assemble the sequences into a single contig.

5’ TTCCAGAACCGGCGAATGAAGCTGAAGAAG 3’
5’ GAGCGGCAGATCAAGATCTGGTTCCAGAAC 3’
5’ TGATCTGCCGCTCCGTCAGGCATAGCGCGT 3’
5’ GGAGAATCGAGATGGCGCACGCGCTATGCC 3’
5’ GGAGAATCGAGATGGCGCACGCGCTATGCC 3’
5’ CCATCTCGATTCTCCGTCTGCGGGTCAGAT 3’

Using the assembled sequence, perform a blastn search
using the “nucleotide collection (nr/nt)” database. Does
the search produce sequences similar to your assembled
sequence, and if so, what are they? Can you tell if your
sequence is transcribed, and if it represents protein-coding
sequence? Perform a tblastx search, first choosing the
“nucleotide collection (nr/nt)” database and then limiting
the search to human sequences by typing Homo sapiens in
the organism box. Are homologous sequences found in the
human genome? Annotate the assembled sequence.

16. Consider the phylogenetic tree below with three related
species (A, B, C) that share a common ancestor (last
common ancestor, or LCA). The lineage leading to species
A diverges before the divergence of species B and C.
a. For gene X, no gene duplications have occurred in any
lineage, and each gene X is derived from the ancestral
gene X via speciation events. Are genes AX, BX, and CX
orthologous, paralogous, or homologous?
b. For gene Y, a gene duplication occurred in the lineage
leading to A after it diverged from that leading to B and
C. Are genes AY1 and AY2 orthologous or paralogous?
Are genes AY1 and BY orthologous or paralogous? Are
genes BY and CY orthologous or paralogous?
c. For gene Z, gene duplications have occurred in all species.
Define orthology and paralogy relationships for
the different Z genes.

[Figure: species tree for species A, B, and C descending from the last common ancestor (LCA),
with gene trees for gene X (AX, BX, CX), gene Y (AY1, AY2, BY, CY), and gene Z (AZ1, AZ2, BZ1,
BZ2, BZ3, CZ1, CZ2).]

17. You have isolated a gene that is important for the production
of milk and wish to examine its regulation. You examine
the genomes of human, mouse, dog, chicken, pufferfish,
and yeast and note that all genomes except yeast have an
orthologous gene.
a. How would you identify the regulatory elements important
for the expression of your isolated gene in mammary
glands?
b. What does the existence of orthologous genes in
chicken and pufferfish tell you about the function of
this gene?

18. When the human genome is examined, the chromosomes
appear to have undergone only minimal rearrangement in
the 100 million years since the last common ancestor of
eutherian mammals. However, when individual humans
are examined or when the human genome is compared
to that of chimpanzees, a large number of small indels
and SNPs can be detected. How are these observations
reconciled?

19. Symbiodinium minutum is a dinoflagellate with a genome
that encodes more than 40,000 protein-coding genes.
In contrast, the genome of Plasmodium falciparum has
only a little more than 5000 protein-coding genes. Both
Symbiodinium and Plasmodium are members of the
Alveolate lineage of eukaryotes. What might be the cause
of such a wide variation in their genome sizes?

20. Substantial fractions of the genomes of many plants consist
of segmental duplications; for example, approximately 40%
of genes in the Arabidopsis genome are duplicated. How
might you approach the functional characterization of such
genes using reverse genetics?

21. A modification of the two-hybrid system, called the
one-hybrid system, is used for identifying proteins that can
bind specific DNA sequences. In this method, the DNA
sequence to be tested, the bait, is fused to a TATA box
to drive expression of a reporter gene. The reporter gene
is often chosen to complement a mutant phenotype; for
example, a HIS gene may be used in a his- mutant yeast
strain. A cDNA library is constructed with the cDNA
sequences translationally fused to the GAL4 activation
domain and transformed into this yeast strain. Diagram
how trans-acting proteins that bind to cis-acting regulatory
sequences can be identified using a one-hybrid screen.

22. A substantial fraction of almost every genome sequenced
consists of genes that have no known function and that
do not have sequence similarity to any genes with known
function.
a. Describe two approaches to ascertaining the biological
role of these genes in S. cerevisiae.
b. How would your approach change if the genes of
unknown function were in the human genome?

23. In the globin gene family shown in Figure 18.14, which
pair of genes would exhibit a higher level of sequence
similarity, the human δ-globin and human β-globin genes
or the human β-globin and chimpanzee β-globin genes?
Can you explain your answer in terms of timing of gene
duplications?

24. You are studying similarities and differences in how organisms
respond to high salt concentrations and high temperatures.
You begin your investigation by using microarrays
to compare gene expression patterns of S. cerevisiae in
normal growth conditions, in high-salt concentrations, and
at high temperatures. The results are shown below
with the values of red and green representing the extent of
increase and decrease, respectively, of expression for genes
a-s in the experimental conditions versus the control
(normal growth) conditions. What is the first step you will
take to analyze your data?

[Figure: microarray expression ratios for genes a through s; the one recoverable row label is
“High temp/control.”]


25. In conducting the study described in Problem 24, you have
noted that a set of S. cerevisiae genes are repressed when
yeast are grown under high-salt conditions.
a. How might you determine whether this set of genes is
regulated by a common transcription factor?
b. How might you approach this question if genome
sequences for the related Saccharomyces species,
S. paradoxus, S. mikatae, and S. bayanus, were also
available?

26. Using the two-hybrid system to detect interactions
between proteins, you obtain the following results: A clone
encoding gene A gave positive results with clones B and C;
clone B gave positive results with clones A, D, and E but
not C; and clone E gave positive results only with clone B.
Another clone, F, gave positive results with clone G but not
with any of A-E. Can you explain these results?

27. To follow up your two-hybrid results of Problem 26, you
isolate null loss-of-function mutations in each of the genes
A-G. Mutants of genes A, B, C, D, and E grow at only 80%
of the rate of the wild type, while mutants of genes F and G
are phenotypically indistinguishable from the wild type.
You construct several double-mutant strains: The ab, ac,
ad, and ae double mutants all grow at about 80% of the
rate of the wild type, but af and ag double mutants exhibit
lethality. Explain these results.

28. PEG10 (paternally expressed gene 10) is a paternally
expressed gene that has an essential role in the formation
of the placenta of the mouse. In the mouse genome, the
PEG10 gene is flanked by the SGCE and PPP1R9A genes.
To study the origin of PEG10, you examine syntenous
regions spanning the SGCE and PPP1R9A loci in the
genomes of several vertebrates, and you note that the
PEG10 gene is present in the genomes of placental and
marsupial mammals but not in the platypus, chicken, or
fugu genomes.

The green bars indicate the exons of each gene. The
gray bars represent LINEs and SINEs, and the blue
bars represent long terminal repeat (LTR) elements of
retrotransposons. Solid black diagonal lines link introns,
and dashed black lines connect orthologous exons.
Arrowheads indicate direction of transcription.

Using the predicted protein sequence of PEG10, you
perform a tblastn search for homologous genes and find
that the most similar sequences are in a class of
retrotransposons (the sushi-ichi retrotransposons). Propose an
evolutionary scenario for the origin of the PEG10 gene, and
relate its origin to its biological function.

29. If you were to compare your genome sequence with that
of your parents, how would it differ? If you were to compare
your genome sequence with another student’s in
the class, how would it differ? What additional difference
might you see if your genome was compared with that of a
sub-Saharan African, or if you are of sub-Saharan African
descent, with that of a non-African?


Organelle Inheritance and 
the Evolution of Organelle 
Genomes 


CHAPTER OUTLINE 


19.1 Organelle Inheritance Transmits 
Genes Carried on Organelle 
Chromosomes 

19.2 Modes of Organelle Inheritance 
Depend on the Organism 

19.3 Mitochondria Are the Energy 
Factories of Eukaryotic Cells 

19.4 Chloroplasts Are the Sites of 
Photosynthesis 

19.5 The Endosymbiosis Theory 
Explains Mitochondrial and 
Chloroplast Evolution 


Cross section of Chlamydomonas showing three types of cellular compart- 
ments having their own genetic material: nucleus (blue), mitochondrion 
(red), and chloroplast (green). 


Soon after the rediscovery of Mendel's laws in the early
1900s, Carl Correns and Erwin Baur, working indepen- 
dently, each noted a pattern of inheritance that was distinctly 
non-Mendelian. Both Correns and Baur were studying the 
inheritance in plants of a variegated phenotype in which indi- 
vidual branches had either white, green, or variegated leaves. 
Reciprocal crosses between flowers growing on white or green 
branches produced progeny that invariably exhibited the phe- 
notype of the female parent in the cross. 

The green coloration in land plants and green algae 
is due to the presence of the green pigment chlorophyll, 
which harvests light for photosynthesis. In plants, chlorophyll 
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is found in chloroplasts, which are the organelles 
where photosynthetic reactions convert light energy 
and CO2 into fixed organic carbon. The variegated
and white phenotypes studied by Correns and Baur 
are caused by a failure of chloroplast development 
in some cells, which as a consequence remain 
colorless (white). 

In the 1950s, studies demonstrated that chloro- 
plasts contain their own genome. In combination 
with the observation that chloroplasts are strictly 
maternally inherited in many plants, this discov- 
ery suggested an explanation for the maternal 
inheritance seen by Correns and Baur: The mutations 
they were studying must reside on the chloroplast 
genome. As we will see, the cell’s energy-producing 
and energy-capturing organelles—mitochondria 
and chloroplasts, respectively—each possess their 
own genome and may be either uniparentally or 
biparentally inherited depending on the species. 
Furthermore, uniparental inheritance may be 
maternal, paternal, or genetically determined. In this 
chapter, we explore the genetic transmission of the 
organelle genomes, the remarkable evolutionary 
events that led to the development of organelles, 
and the surprisingly dynamic interactions between 
the organelle and nuclear genomes of eukaryotes. 


19.1 Organelle Inheritance 
Transmits Genes Carried on Organelle 
Chromosomes 


Organelle inheritance refers to the transmission of genes 
on mitochondrial and chloroplast chromosomes—genes 
that are located in the cytoplasmic organelles as opposed 
to the nucleus. As with nuclear genes, expression of mi- 
tochondrial and chloroplast genes produces proteins and 
RNAs that perform specific functions in cells. However, 
genetic analysis of organelle inheritance differs from that 
of nuclear gene inheritance because, within a fertilized 
egg, the cytoplasm, in which the organelles are found, 
is not usually derived from equal contributions of both 
parental gametes. 

In many eukaryotic species, the mitochondria and 
chloroplasts in fertilized eggs are uniparental in their 
origin. This means that just one parental gamete—often 
the maternal gamete—contributes all of the cytoplasm 


and cytoplasmic organelles. In some species, organelles 
are inherited in a uniparental manner even though equal 
amounts of cytoplasm are inherited from both parental 
gametes. In such cases, the organelles derived from one 
of the gametes are selectively destroyed. In still other 
species, both parental gametes make contributions of 
cytoplasm and cytoplasmic organelles to the zygote; this 
pattern is termed biparental. Biparental cytoplasmic con- 
tributions are often unequal because one gamete contrib- 
utes more of the cytoplasm and the other gamete makes 
a smaller contribution. Additional reasons that the study 
of organelle inheritance differs from the study of nuclear 
inheritance may be summarized as follows: 


1. Multiple organelles may be present in eukaryotic cells. 


2. Each mitochondrion or chloroplast may contain 
multiple copies of its chromosome. The potential 
presence of tens to hundreds of copies of organelle 
chromosomes in each cell stands in contrast to the two 
copies of nuclear genes present in the cells of diploid 
organisms, in terms of both number and variability. 


3. The genome sizes (six to hundreds of kilobases), 
numbers (few to hundreds), and identities of the 
genes contained in the organelle genomes are variable 
from one species to another. 


4. Traits controlled by organelle inheritance can also be
influenced by nuclear genes. Most biological func- 
tions ascribed to mitochondrial or chloroplast genes 
are produced through the joint action of nuclear 
genes and organelle genes. 


The Discovery of Organelle Inheritance 


Erwin Baur and Carl Correns were working independently 
of one another in 1908—Baur on Pelargonium (gerani- 
ums) and Correns on Mirabilis jalapa (the four o’clock 
plant)—when each made his discovery of non-Mendelian 
inheritance. Baur was studying leaf-color inheritance 
in geraniums. He began his investigation by doing self- 
fertilization experiments and found that seeds derived from 
self-fertilization of flowers on green branches produced 
plants that contained only green leaves. Seeds derived from 
self-fertilization of flowers on white branches produced 
seedlings that had only white leaves. These latter seedlings 
grew poorly and never produced mature plants. The self- 
fertilization of flowers from branches with variegated leaves 
produced a mixture of progeny that were either variegated, 
had all white leaves, or had all green leaves. 

These results led Baur to make reciprocal crosses 
between branches with different leaf colors. Using pollen 
from a flower located on a branch with one leaf color, he 
fertilized ovules from a flower located on a branch with a 
different leaf color. The results, as shown in Figure 19.1, 
were progeny that invariably exhibited the phenotype of 
the female parent in the cross. This is not the result pre- 
dicted by Mendelian genetics (which predicts no difference 
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Figure 19.1 Reciprocal crosses demonstrating maternal 
inheritance of chloroplasts. 


in the results of reciprocal crosses), nor is it the result 
expected if leaf color were inherited on a sex chromo- 
some. Instead, the outcome suggested that transmission of 
leaf color occurs through maternal inheritance—that is, 
through genes transmitted in the ovule. Leaf color in the 
geranium is controlled exclusively by maternal inheritance, 
and the male gamete (in the pollen) makes no contribution 
to the phenotype. 

White leaves are produced when leaf cells contain 
mutant chloroplasts that lack the ability to produce chlo- 
rophyll. Variegated leaves are produced by plants whose 
cells contain a mixture of normal and mutant chloroplasts. 


The green patches of variegated leaves are composed of 
cells containing chloroplasts that can produce chloro- 
phyll, whereas the white leaf patches are composed of 
cells containing mutant chloroplasts that are unable to 
produce chlorophyll. Modern-day plant biology explains 
these results as a consequence of organelle inheritance 
and states that the allelic differences reside in a gene 
in the chloroplast genome. Correns’s results with the 
four o’clock plant paralleled those obtained by Baur with 
geraniums. 

In the 1950s, several decades after Baur and Correns 
described their observations of non-Mendelian inheri- 
tance in plants, Yasutane Chiba and colleagues suggested 
that mitochondria and chloroplasts contain their own 
genomes. This assertion was based on the results of 
staining with the compound Feulgen, which specifically 
stains DNA. In studying mitochondria and chloroplasts 
from a variety of plants and animals, Chiba detected 
Feulgen-positive spots in the cytoplasm of virtually all 
cells examined, and determined that the Feulgen-stained 
cytoplasmic DNA was contained within the organelles. 
This result is consistent with the presence of chromo- 
somes in mitochondria and chloroplasts. 


Homoplasmy and Heteroplasmy 


Figure 19.1 illustrates that if an ovule is obtained from a 
flower on a branch with all green leaves, then it contains 
chloroplasts that produce chlorophyll, and its progeny 
plants will have only green leaves regardless of the leaf 
color of the pollen-producing plant. Similarly, an ovule 
obtained from an all-white-leafed branch contains mutant 
chloroplasts, and all progeny will have only white leaves 
due to the transmission of defective chloroplasts from 
the ovule. Ovules from variegated plants can produce 
progeny with green, white, or variegated leaves. This ap- 
parent departure from the maternal inheritance pattern 
for green and white leaves can be reconciled by the obser- 
vation that each plant cell contains many copies of each 
chloroplast gene. 

The amount of nuclear genetic material is constant: 
haploid cells have a single copy of each chromosome, 
and diploid cells have two copies of each chromosome. 
In contrast, the number of copies of organelle genes in 
each cell is much higher and varies significantly with both 
organism and cell type. Copy-number variation occurs 
at two levels. First, the number of organelles per cell can 
vary from one to hundreds, and second, the number of 
copies of the organelle genome per organelle also var- 
ies from one to many. Thus the terms homozygous and 
heterozygous are not applicable to alleles of genes on 
organelle genomes. Rather, a cell or organism in which 
all copies of a cytoplasmic organelle gene are the same is 
identified as homoplasmic and is said to exhibit homo- 
plasmy for that gene (Figure 19.2a). On the other hand, 
if variation exists among the copies of an organelle gene, 




Figure 19.2 Homoplasmy and heteroplasmy in cells.
(a) Homoplasmic and heteroplasmic cells. Homoplasmic cells have organelles
with the same genotype (green or white); heteroplasmic cells contain a
mixture of alleles (variegated).
(b) In maternal inheritance, phenotype of progeny depends only
on the genotype of the maternal parent.


the cell or organism is heteroplasmic and exhibits het- 
eroplasmy, carrying a mixture of alleles of an organellar 
gene. Note that in a heteroplasmic organism, some cells 
can be homoplasmic wild type, other cells homoplasmic 
mutant, and still others heteroplasmic. In cells with both 
wild-type and mutant genotypes, the wild-type allele can 
complement the mutant allele. 

Homoplasmic and heteroplasmic genotypes for chlo- 
roplast genes explain the maternal inheritance of variega- 
tion observed by Baur in geraniums (Figure 19.2b). Ovules 
derived from flowers on branches that contain green 
leaves are homoplasmic for wild-type chloroplast genes 
and transmit only wild-type chloroplasts to their progeny. 
In contrast, ovules derived from flowers on branches with 
white leaves are homoplasmic for a chloroplast mutation, 
and only mutant chloroplasts are passed to progeny. 

The progeny phenotypes derived from flowers 
on variegated branches illustrate the complexity of 


organellar genetics. Consider an ovule produced on a 
variegated branch that consists of a mixture of cells. 
Some of them are heteroplasmic, inheriting a cytoplasm 
containing many chloroplasts, some that are wild type 
and others that harbor the mutant allele. During the 
mitoses and meiosis that produce egg cells, the chloro- 
plasts are divided randomly among daughter cells. If an 
egg cell inherits both wild-type and mutant chloroplasts, 
a heteroplasmic plant with variegated leaves develops. 
However, if by chance the organelles inherited by an egg 
cell are all wild type, the branches of the plant produced 
by fertilization of the egg will be green. Alternatively, 
chance might result in an egg cell inheriting chloroplasts 
that are all mutant, in which case the plant will have 
white leaves. 
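
The role of chance in this outcome can be made concrete with a short calculation. The sketch below assumes, purely for illustration, that an egg cell receives n chloroplasts drawn independently from a heteroplasmic mother cell in which a fraction p of the chloroplasts are mutant. Neither value comes from Baur's data, and the function name is hypothetical.

```python
def egg_outcome_probabilities(n_chloroplasts, mutant_fraction):
    """Return the chance that an egg is all wild type, all mutant, or mixed.

    Assumes each of the n chloroplasts is drawn independently, which is a
    simplifying assumption and not a measured property of any plant.
    """
    if n_chloroplasts < 1:
        raise ValueError("an egg must receive at least one chloroplast")
    if not 0.0 <= mutant_fraction <= 1.0:
        raise ValueError("mutant_fraction must be between 0 and 1")
    p_green = (1.0 - mutant_fraction) ** n_chloroplasts   # all wild type
    p_white = mutant_fraction ** n_chloroplasts           # all mutant
    p_variegated = 1.0 - p_green - p_white                # mixture
    return {"green": p_green, "white": p_white, "variegated": p_variegated}


# Hypothetical values: 4 chloroplasts per egg, half of them mutant
# in the mother cell.
print(egg_outcome_probabilities(4, 0.5))
```

Under these assumptions, the fewer chloroplasts an egg receives, the more likely it is to be homoplasmic, which is one way to see why green and white progeny appear alongside variegated ones.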


Genome Replication in Organelles 


Organelle DNA is packaged into protein-DNA com- 
plexes in an area of the organelle called the nucleoid. 
Each nucleoid usually contains multiple copies of the 
organellar genome. There may be several nucleoids per 
organelle and multiple organelles per cell, resulting in a 
copy number for organelle genomes that is in the range 
of hundreds to thousands per cell. To better understand 
the transmission of mutations in organellar genomes, and 
their phenotypic effects, let us examine how organellar 
DNA is replicated. 

A major difference between the nuclear genome 
and that of an organelle is their relationship to the cell 
cycle. Each of the nuclear chromosomes is duplicated 
once each mitotic cycle, so that daughter cells have 
exactly the same chromosome constitution as the par- 
ent cell following cell division. In contrast, the replica- 
tion of organelle genomes is not tightly coupled to the 
cell cycle. Rather, the replication of organelle genomes 
depends on three factors (Figure 19.3). First, organelle 
transmission genetics depends on the growth, division, 
and segregation of the organelles themselves (“organelle 
division” in Figure 19.3). There appears to be a mecha- 
nism to ensure that each daughter cell receives approxi- 
mately equal amounts of the organelles present in the 
mother cell. Second, the segregation of genes encoded 
in the organelle genome is connected to the division and 
segregation of nucleoids within an organelle (“nucleoid 
division” in Figure 19.3). Details of this process are still 
being discovered, but differences in the replication rate 
of nucleoids have been observed both between cells 
and between organelles. Third, organelle transmission 
genetics depends on the replication of the individual 
organelle genomes (“DNA replication” in Figure 19.3). 
There is evidence that DNA molecules within a nucleoid 
are related to each other; they are sometimes physically 
linked, which would suggest that they are products of 
DNA replication. 
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Figure 19.3 Factors in replication of organelle genomes. 


Replicative Segregation of Organelle Genomes 


The variation in the numbers of organelles and of their 
genomes in different somatic cells and tissues can sig- 
nificantly influence the phenotypic effects of mutations 
in organelle genes. Consider again the case of the varie- 
gated leaves. If a cell is homoplasmic with regard to this 
trait, cells descended from this cell by division will also 
be homoplasmic. However, cells that are heteroplas- 
mic can produce both heteroplasmic and homoplasmic 
descendants. 

To see how this happens, imagine a plant cell in which 
a mutation occurs in a chloroplast genome. Through 


segregation of nucleoids during chloroplast division, chlo- 
roplasts in which all copies of the genome harbor the 
mutations can arise. Since chloroplasts within a cell do not 
fuse with one another, once a homoplasmic mutant chloroplast
arises, it does not acquire wild-type genomes from
other chloroplasts within the cell. During cell division, 
the chloroplasts are randomly distributed to the daughter 
cells. If by chance all the organelles inherited by a daugh- 
ter cell are of a single genotype, homoplasmic cells can 
be generated from a heteroplasmic ancestral cell (see the 
cells at the bottom of the far-right columns in Figure 19.4). 
This random segregation of organelles during replication 
is termed replicative segregation. Replicative segregation 
is of great importance since it affects the proportion of 
mutant organelle genomes in a cell, thus influencing the 
severity (penetrance and expressivity) of phenotypes pro- 
duced by mutations in organellar genomes. It can lead to 
genetically mosaic organisms with both “mutant” cells and 
“wild-type” cells; and, as we see with the variegated plants, 
it can influence transmission of mutant alleles to subse- 
quent generations depending on the organellar genotype 
of the germ cells. 
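
Replicative segregation can be explored with a small simulation. The sketch below is a deliberately simplified model and not a description of any real cell: it assumes every chloroplast is itself homoplasmic, that each chloroplast doubles once per cell cycle, and that the doubled set is split at random into two equal daughters. All function names and the starting numbers are hypothetical.

```python
import random


def divide(cell, rng):
    """Double every organelle, then split them at random into two daughters.

    `cell` is a list of organelle genotypes, "W" (wild type) or "M" (mutant).
    Equal-sized daughters are a simplifying assumption.
    """
    doubled = cell + cell      # each organelle replicates once
    rng.shuffle(doubled)       # random partitioning at cell division
    half = len(cell)
    return doubled[:half], doubled[half:]


def classify(cell):
    """Label a cell as homoplasmic (wild type or mutant) or heteroplasmic."""
    kinds = set(cell)
    if kinds == {"W"}:
        return "homoplasmic wild type"
    if kinds == {"M"}:
        return "homoplasmic mutant"
    return "heteroplasmic"


def follow_lineage(cell, generations, seed=0):
    """Follow one randomly chosen daughter per division and record its class."""
    rng = random.Random(seed)
    history = [classify(cell)]
    for _ in range(generations):
        cell = rng.choice(divide(cell, rng))
        history.append(classify(cell))
    return history


# Hypothetical starting cell: 4 wild-type and 4 mutant chloroplasts.
for label in follow_lineage(["W"] * 4 + ["M"] * 4, generations=30):
    print(label)
```

In this model a lineage that becomes homoplasmic stays homoplasmic, because no organelle of the other genotype remains to be inherited, whereas a heteroplasmic cell can give rise to either kind of descendant.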

In heteroplasmic individuals, penetrance and expres- 
sivity will depend on the ratio between mutant and wild- 
type organelle alleles, which can vary among cells and 
tissues. In some cases, wild-type alleles can complement 
mutant alleles within an organelle, so a heteroplasmic 
individual can often tolerate a high frequency of mu- 
tant alleles without a mutant phenotype being evident 
or becoming severe. For organelle inheritance between 
generations, the number of chloroplast or mitochon- 
drial genomes present in the germ cells is important. 
In heteroplasmic individuals, transmission will depend 
on what fraction of organelle genomes present in the 
gametes contain mutant versus wild-type alleles. Due to 
replicative segregation, gametes can be produced that are 
homoplasmic wild type, homoplasmic mutant, or het- 
eroplasmic, and they can have varying ratios of mutant 
and wild-type alleles. Thus, replicative segregation can 
explain both variation in penetrance and expressivity be- 
tween individuals and also variable transmission, where 
green, white, and variegated seedlings can all be derived 
from variegated plants. 

The observation that mitochondria undergo fre- 
quent fusion and fission has implications for the segre- 
gation of mitochondrial DNA and creates the potential 
for genotypes within a cell’s mitochondria to become 
mixed and homogenized. Thus, replicative segregation 
in mitochondria is more complicated than that described 
for chloroplasts. 

Now that we have described some of the complexities 
of transmission of the organelle genomes, for the remain- 
der of the chapter we will assume that individuals are 
homoplasmic, unless there is evidence that heteroplasmy 
exists. 
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Figure 19.4 Development of 
homoplasmy from heteroplasmy 
by replicative segregation. 
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19.2 Modes of Organelle Inheritance 
Depend on the Organism 


The inheritance of organelle genomes occurs through two 
basic mechanisms. In many organisms, the transmission 
is biased to whichever gamete contributes the bulk of the 
cytoplasm to the zygote. In this case transmission can be 
either uniparental (maternal or paternal) or biparental. 
Alternatively, inheritance is genetically determined: one 
gamete’s organelles are destined to be transmitted to the 
progeny while the other gamete’s organelle contributions 
are selectively destroyed. Even in cases where one gamete 
contributes most of the cytoplasm, genetic mechanisms 
may exist to eliminate the residual organelle contribution 
from the other gamete. Thus, the two mechanisms are not 
mutually exclusive. In this section, we explore three cases 
illustrating three different inheritance patterns. 


Mitochondrial Inheritance in Mammals 


Maternal inheritance of mitochondria is the norm in 
mammals because the egg contributes all of the cytoplasm 
and the sperm contributes primarily a nucleus during 
fertilization. Maternal inheritance of the mitochondrial 
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genome in mammals has three important consequences 
that we examine in this section: 


1. Predictions of inheritance of mitochondrial muta- 
tions can be made based solely on the genotype of 
the mother. 


2. Maternal inheritance allows the maternal lineage of 
organisms to be examined specifically. 


3. Since there is no paternal contribution, phyloge- 
netic trees constructed using mitochondrial DNA 
sequences can be interpreted as maternal genealogies 
reflecting the maternal history of species. 


Mother-Child Identity of Mitochondrial DNA In 
mammals, mothers and their children of both sexes share 
identical mitochondrial DNA (mtDNA). These identical 
genetic matches are put to many practical uses. One 
of the most dramatic examples in humans is the use of 
mitochondrial DNA to find matches between grandmothers 
and grandchildren who were separated during political 
unrest in Argentina during the 1970s. An Argentinean 
military dictatorship undertook a campaign of kidnapping 
and murder of political dissidents in the late 1970s. Among
those kidnapped were pregnant women, who were allowed 
to give birth before they were murdered. The children of 
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these women were adopted by unrelated families, and their 
identities were hidden from their biological families. 

As the political environment in Argentina became less 
repressive, a group known as Las Abuelas de la Plaza de 
Mayo (Grandmothers of the Plaza de Mayo) demanded 
an accounting of the murder of the dissidents and the 
return of the adopted children to their biological families. 
Part of the process used to identify adopted children took 
advantage of the maternal inheritance of mitochondrial 
DNA—specifically, of the fact that each grandmother had 
transmitted her mitochondria to her biological children, 
all of whom, as a result, inherited identical mitochondrial 
genes (Figure 19.5). Her daughters in turn passed the same 
mitochondrial DNA to their biological children. By this 
hereditary transmission mechanism, grandmothers and the 
children of their daughters carry identical mitochondrial 
DNA. Comparisons of mitochondrial DNA revealed exact 
matches between individual abuelas and specific children 
of the murdered women, allowing many abuelas to be 
reunited with their grandchildren, whose mothers had been 
“disappeared.” 
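
The logic of such a match can be sketched in a few lines of code. The example below assumes a pedigree stored as a simple child-to-mother lookup and ignores new mutations and heteroplasmy. The names in the pedigree and both function names are hypothetical.

```python
def maternal_line(person, mother_of):
    """Return the chain of maternal ancestors, starting with `person`."""
    line = [person]
    while line[-1] in mother_of:
        line.append(mother_of[line[-1]])
    return line


def share_mtdna(a, b, mother_of):
    """True if a and b are connected through mothers only.

    Two people are expected to carry the same mtDNA when their maternal
    lines meet. New mutations and heteroplasmy are ignored here.
    """
    return bool(set(maternal_line(a, mother_of)) & set(maternal_line(b, mother_of)))


# Hypothetical pedigree: each child is mapped to his or her mother.
mother_of = {
    "daughter": "abuela",
    "son": "abuela",
    "daughters_child": "daughter",
    "sons_child": "sons_partner",
}

print(share_mtdna("abuela", "daughters_child", mother_of))
print(share_mtdna("abuela", "sons_child", mother_of))
```

The first comparison follows an unbroken line of mothers and so predicts a match. The second passes through a son, whose children receive their mitochondria from their own mother, so no match is predicted.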


Mitochondrial DNA Sequences and Species Evolution 
Mitochondrial DNA sequences are used as a tool for 
deciphering the genealogical history and evolutionary 
relationships of mammalian species. Mitochondrial DNA 
is particularly well suited to such studies for two reasons. 
First, since mitochondria are strictly maternally inherited 
in mammals, there is no recombination of alleles, as there is 
with the nuclear genome. Second, some noncoding regions 
of mitochondrial genomes evolve quickly, with the result 
that many differences in mitochondrial DNA sequence 
are present even in closely related populations. This is 
particularly true for mammals, where the rate of mutation 
in the mitochondrial genome is about 10 times that of 
the nuclear genome, reflecting decreased levels of DNA 
mutation repair in mitochondria versus repair of nuclear 
DNA. Since there is little selective pressure to maintain a 
specific sequence in noncoding regions, mutations in these 
regions accumulate at a relatively steady rate. 

Once a mitochondrial mutation becomes homoplas- 
mic in the germ cells of an individual female, the muta- 
tion is transmitted to all her progeny. Therefore, maternal 


Figure 19.5 Maternal inheritance of mitochondrial genes
in mammals. All children in generation II receive their mother's
mtDNA. All children in generation III receive their maternal
grandmother's mtDNA.


lineages can be traced by following the mutational changes 
back in time. The mitochondrial DNA sequences in the 
present population reflect the maternal genealogy of the 
population as a whole, and construction of a phylogenetic 
tree based on these sequences should allow the identifica- 
tion of the common ancestor(s) of the species. 


Mitochondrial Eve Analyses of mitochondrial DNA 
variation in human populations provided our first view 
of our early human ancestors’ journey out of Africa. The 
regions around the Great Rift Valley of East Africa have 
been home to humans and our hominid ancestors for at 
least 4 million years. Based on the fossil record, dispersals 
from Africa have also been a regular feature throughout 
hominid evolution (see Chapter 1 Case Study). 

Genetic studies have supported a model of human 
evolution called the recent African origin (RAO) model, 
which proposes that modern humans evolved from a 
small African population that migrated out of Africa, dis- 
placing other hominid species (Figure 19.6). 

The RAO model postulates that modern humans arose 
approximately 120,000 to 200,000 years ago, whereas a 
competing model, the multiregional (MRE) model, posits a 
much older age for our species—up to 2 million years ago. 
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Figure 19.6 Human evolution. Genealogical tree of mod- 
ern humans based on phylogenetic analyses of mitochondrial 
restriction fragment length polymorphisms (RFLPs) strongly 
supports the RAO model. The population affinities of the mtDNA 
types are as follows: Western Pygmies (1, 2, 37-48); Eastern
Pygmies (4-6, 30-32, 65-73); !Kung (7-22); African Americans
(3, 27, 33, 35, 36, 59, 63, 100); Yorubans (24-26, 29, 51, 57, 60,
63, 77, 78, 103, 106, 107); Australian (49); Herero (34, 52-56, 105,
127); Asians (23, 28, 58, 74, 75, 84-88, 90-93, 95, 98, 112, 113,
121-124, 126, 128); Papua New Guineans (50, 79-82, 97, 108-110,
125, 129-135); Hadza (61, 62, 64, 83); Naron (76); and Europeans
(89, 94, 96, 99, 101, 102, 104, 111, 114-120).
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The RAO model suggests genetic diversity should be great- 
est in Africa since humans would have diversified there 
before migrating outward. In the RAO scenario, the genetic 
diversity outside of Africa would be a subset of that found 
in Africa and so would reflect the subpopulation of humans 
who migrated from Africa. 

Allan Wilson and colleagues used the mitochondrial 
genome to analyze genetic diversity in modern humans. 
Their phylogenetic analysis of mtDNA sequences from in- 
dividuals representative of distinct geographic regions leads 
to two major conclusions: First, Africans are genetically 
more diverse than humans from other continents (see far 
left portion of Figure 19.6), and second, the genetic diver- 
sity of non-Africans is a subset of that found in Africans. 
In addition, comparison of human sequences with those 
of chimpanzees allowed the researchers to estimate when 
the divergence of humans occurred. This is calculated by 
first working out the rate of sequence evolution in terms 
of base-pair changes per million years. The researchers di- 
vided the number of sequence differences between humans 
and chimpanzees by 5 to 7 million years (the divergence 
time of the two species) and then calculated the minimum 
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Myopathy 
Cardiomyopathy 
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MILS Maternally inherited Leigh syndrome 
PEO Progressive external ophthalmoplegia 
MERRF Myoclonus epilepsy with ragged 
red fibers 
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divergence time of humans by applying the rate of sequence 
evolution to the two most divergent human sequences. 
Such calculations led to the estimate that modern humans 
first appeared about 200,000 years ago. 
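
The arithmetic behind this kind of estimate can be sketched as follows. The numbers used are hypothetical and were chosen only to show the calculation; they are not the values from Wilson's study, and the function name is an assumption. The sketch also assumes that differences accumulate at a steady rate.

```python
def divergence_time(human_human_diff, human_chimp_diff, human_chimp_split_years):
    """Estimate when the two most divergent human sequences shared an ancestor.

    rate = human-chimpanzee differences / human-chimpanzee divergence time
    time = differences between the two most divergent humans / rate
    Both counts must come from the same stretch of sequence.
    """
    rate_per_year = human_chimp_diff / human_chimp_split_years
    return human_human_diff / rate_per_year


# Hypothetical inputs: 150 differences between human and chimpanzee and
# 5 differences between the two most divergent human sequences.
for split_years in (5_000_000, 7_000_000):
    print(split_years, round(divergence_time(5, 150, split_years)))
```

Running the calculation with both ends of the 5 to 7 million year range shows how uncertainty in the human-chimpanzee divergence time carries directly into the estimated age of the human sequences.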

The patterns of mitochondrial DNA variation suggest 
that modern humans evolved in Africa and subsequently 
migrated around the world, largely displacing but occa- 
sionally interbreeding with other hominid populations (see 
Chapter 22). The mtDNA of all humans living today is 
descended from a female or group of females living in East 
Africa 120,000 to 200,000 years ago. The carrier of this an- 
cestral mtDNA has been called our “mitochondrial Eve.” 
See Genetic Analysis 19.1 for practice interpreting data from 
another research project that analyzed mitochondrial DNA. 


Mitochondrial Mutations and Human Genetic Disease 
Human biology is highly dependent on the cellular 
energy derived from oxidative phosphorylation reactions 
in our mitochondria. It is therefore not surprising that 
mitochondrial mutations can result in human genetic 
diseases (Figure 19.7a). The phenotypes of mitochondrial 
diseases are often highly pleiotropic, a reflection of 
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Figure 19.7 Mutations in human mitochondrial genes leading to disease syndromes. (a) Muscle 


functioning, hearing, and vision all require high levels of energy produced by mitochondria.
(b) Pedigree showing maternal inheritance with incomplete penetrance of LHON.


GENETIC ANALYSIS 


PROBLEM Although North American bison (Bison bison) and domestic
cattle (Bos taurus and Bos indicus) descended from a common ancestor,
they do not readily interbreed. However, because they still share the same
chromosome number and structure, the production of fertile interspecific
hybrids is possible. Male bison have been known to breed with female cattle,
but not the converse. Twelve North American bison herds (numbered
1 through 12 at right) were examined for evidence of such interbreeding
by a comparison of their mtDNA sequences with those of several cattle
breeds and related species. A phylogenetic tree constructed from
the comparisons is presented here. The numbers represent confidence
values for the particular relationships (100 is the maximum).

BREAK IT DOWN: ... DNA inherited in mammals?

a. Explain why mtDNA but not nuclear DNA is used to detect bison- 

domestic cattle interspecific hybrids. 


b. Based on this phylogeny, identify which bison herds show evidence 
of interspecific breeding with domestic cattle. 


BREAK IT DOWN: Phylogenetic 
trees reveal relatedness and suggest 
common ancestry. 


[Figure: phylogenetic tree of mtDNA sequences. Labels recovered from the
cattle clade: B. indicus Danakil, B. indicus Ogaden, B. indicus Adwa,
B. taurus Longhorn, B. taurus Algarvia, B. taurus Shorthorn, B. taurus Jersey,
B. taurus Hereford, B. taurus Charolais, B. taurus Criollo Chiapas,
B. taurus Cheju Black, B. taurus Jutland, B. taurus Angus, B. taurus Holstein.]


Solution Strategies Solution Steps 


Evaluate 


1. Identify the topic this problem ad- 
dresses and the nature of the re- 
quested answer. 


2. Identify the critical information given 
in the problem. 


1. This problem presents a phylogenetic analysis of an mtDNA sequence in
domestic cattle and in bison. We must explain why mtDNA was used rather than
nuclear DNA, and then we must examine the phylogeny to identify bison herds
that do and do not have cattle hybridization in their lineage.

2. The phylogenetic tree depicts evolutionary relationships between cattle mtDNA 
and mtDNA samples from bison. 


Deduce 


3. Examine the pattern of major clades 
in the phylogenetic tree and the 
membership of each clade. 


4. Identify the kind of phylogenetic 
evidence (based on mtDNA) that 
would be consistent with interspecific 
hybridization and also the kind that 
would be inconsistent with it. 


TIP: In interspecies hybridization, bison mtDNA sequences would 
be more closely related to cattle sequences than they are to other 
bison sequences. 


3. The phylogeny has two major clades. The bottom clade contains eight North 
American bison herds (Bison bison 1 through 8) and two outside reference spe- 
cies, European bison and yak. The upper clade contains fourteen domestic cattle 
breeds (Bos taurus and Bos indicus) and four North American bison herds (Bison 
bison 9 through 12). 


4. If a clade consists only of domesticated breeds or only of bison, then the
animals in the clade are more closely related to one another than they are to
animals in other clades and do not have interspecific hybridization in their
lineage. If a clade contains bison and domesticated cattle breeds, then there
is a close relationship between the bison and the cattle in that clade.


Solve 


5. Explain why mtDNA but not nuclear
DNA sequences were used in this
phylogenetic analysis.


TIP: In mammals, all mitochondrial DNA is 
maternally inherited. 


6. Determine which bison are 
interspecies hybrids. 


TIP: Bison of hybrid origin will harbor mtDNA 
more closely related to that of cattle than of bison. 


For more practice, see Problem 26. 


Answer a 

5. We are told that female cattle interbreed with male bison, but not the reverse.
Since mtDNA is inherited maternally, the resulting hybrids would possess solely 
cattle mtDNA but would contain equal mixtures of cattle and bison nuclear 
DNA. 


Answer b 

6. Bison herds 9 to 12 are in the same clade as a number of breeds of domestic 
cattle, signifying that their mtDNA sequences are more closely related to domes- 
ticated cattle than to the wild bison and yak species. Thus these four herds have 
cattle mtDNA from interspecific hybridization in previous generations. 
the ubiquitous dependency of cells on mitochondrial
function. A hallmark of such diseases is their strictly
maternal transmission. Since homoplasmic null alleles
in mitochondrial genes would result in lethality,
mitochondrial mutations in humans are either partial
loss-of-function alleles (see Section 4.1) or null alleles
present in heteroplasmic individuals.

Leber hereditary optic neuropathy (LHON) is a mi- 
tochondrial genetic disease in which degeneration of the 
central optic nerve results in blindness, usually in late 
adolescence to early adulthood (Figure 19.7b). Like most 
diseases caused by mitochondrial mutations, the LHON 
syndrome is accompanied by pleiotropic defects, primar- 
ily a range of heart abnormalities. LHON can be caused 
by mutations in a number of different genes that encode 
proteins of the NADH dehydrogenase subunit involved in 
electron transport. In the pedigree shown in Figure 19.7b, 
affected individuals have a single base-pair change, result- 
ing in a missense (arginine to histidine) mutation in the 
subunit 4 gene, ND4. 

Close inspection of the pedigree in Figure 19.7b re- 
veals that, while all affected individuals have an affected 
mother, not all children of an affected mother exhibit 
LHON. If we assume strict maternal inheritance of the 
mitochondrial mutations, then the phenotype is not fully 
penetrant. There are three possible reasons for incom- 
plete penetrance: the effects of heteroplasmy, the effects 
of genetic interactions with nuclear genes, and the effect 
of environmental factors interacting with mitochon- 
drial gene mutations to produce a mutant phenotype. 


[Figure 19.8 panel labels: (a) homoplasmic segregation, 100% mutant;
(b) primordial germ cell containing wild-type and mutant mitochondria.]
A discussion of mitochondrial gene-environment
interactions appears in the Case Study at the end of
this chapter, and an example of mitochondrial-nuclear
interactions appears in Experimental Insight 19.1 on
page 669. Here we consider heteroplasmy as a cause for
incomplete penetrance.

Heteroplasmy can lead to incomplete penetrance of 
a human hereditary disease because, as discussed earlier, 
each cell contains multiple mitochondria and each mito- 
chondrion contains multiple copies of the mitochondrial 
genome. There is no fixed number of copies or organelle 
genomes in a cell. The numbers of organelles within a cell 
can influence expressivity, penetrance, and transmission 
of mutant alleles in various ways. The numbers of copies 
of mitochondrial genomes in human cells vary from hun- 
dreds to hundreds of thousands, depending on the cell 
type and physiological state. In cells with both wild-type 
and mutant mitochondrial genotypes, the wild-type allele 
can complement the mutant allele. 

In human pedigrees, heteroplasmic mothers can pro- 
duce wild-type homoplasmic progeny, mutant homoplas- 
mic offspring, or heteroplasmic offspring (Figure 19.8a). 
For mitochondrial transmission in mammals, the number 
of mitochondria present in the egg cell is what matters. 
Human oocytes typically have a small number (e.g., 10) of 
large mitochondria that are subsequently divided into many 
smaller mitochondria in the zygote. In humans, an egg 
cell contains up to 2000 mitochondrial genomes. In
heteroplasmic individuals, replicative segregation can lead to
variable penetrance, in which the ratio of mutant : wild-type
mitochondrial genomes varies significantly between
different progeny (Figure 19.8b).

Figure 19.8 Variable penetrance of mitochondrial mutations.
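
The sampling effect behind this variation can be illustrated with a short
simulation. The sketch below is a simplified model, not a description of
oogenesis: it assumes that each egg receives a fixed number of independently
drawn segregating units (`bottleneck_size`, set to 10 here to echo the small
number of oocyte mitochondria mentioned above) and that each unit is mutant
with probability equal to the mother's mutant fraction. The function name,
parameters, and seed are illustrative choices.

```python
import random


def offspring_mutant_fractions(mother_fraction, bottleneck_size=10,
                               n_offspring=8, seed=0):
    """Return the mutant fraction inherited by each of n_offspring eggs."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    fractions = []
    for _ in range(n_offspring):
        # Each segregating unit is mutant with probability mother_fraction.
        mutant_units = sum(rng.random() < mother_fraction
                           for _ in range(bottleneck_size))
        fractions.append(mutant_units / bottleneck_size)
    return fractions


# A heteroplasmic mother (50% mutant) gives eggs with a spread of values.
for fraction in offspring_mutant_fractions(0.5):
    print(f"{fraction:.0%} mutant")
```

In this model a homoplasmic mother (fraction 0 or 1) always yields
homoplasmic eggs, whereas a heteroplasmic mother yields a spread of values;
smaller bottleneck sizes widen that spread.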

Furthermore, replicative segregation of mitochondrial 
mutations over the lifetime of an individual can lead to vari- 
able ratios of mutant : wild-type mitochondrial genomes in 
different cells and tissues of the same heteroplasmic individ- 
ual; and this too results in variable phenotypic penetrance. 
Disease symptoms will develop only when vulnerable cells 
contain a high proportion of mutant mitochondria. For 
example, in the case of another mitochondrial disease, 
called MERRF (myoclonic epilepsy with ragged red fibers), 
an individual who displayed the mutant genotype in 85% of 
his mitochondrial DNA did not exhibit a phenotype defect, 
whereas a cousin with 96% mutant mitochondria displayed 
a severe phenotype. See Genetic Analysis 19.2 for practice 
in analyzing a pedigree for evidence of various forms of 
nuclear and mitochondrial inheritance. 
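
The same kind of random drift can be sketched within one individual,
together with a threshold for symptoms. In the toy model below, every genome
in a cell is copied once and one daughter cell receives a random half of the
copies. The 90% threshold is purely an assumed value chosen to lie between
the 85% and 96% figures quoted above, and the names `divide`, `lineage`, and
`exceeds_threshold` are illustrative.

```python
import random


def divide(mutant, total, rng):
    """Copy every genome once, then pass a random half to one daughter cell."""
    pool = [True] * (2 * mutant) + [False] * (2 * (total - mutant))
    return sum(rng.sample(pool, total))  # number of mutant genomes inherited


def lineage(mutant, total, divisions, seed=0):
    """Track the mutant fraction along one cell lineage."""
    rng = random.Random(seed)
    history = [mutant / total]
    for _ in range(divisions):
        mutant = divide(mutant, total, rng)
        history.append(mutant / total)
    return history


def exceeds_threshold(mutant_fraction, threshold=0.90):
    """Assumed rule: a cell is impaired only above the threshold."""
    return mutant_fraction > threshold


print(exceeds_threshold(0.85), exceeds_threshold(0.96))  # False True
```

Running `lineage(50, 100, 30)` with different seeds gives different
trajectories from the same starting point, which is the sense in which
cells and tissues of one heteroplasmic individual can diverge.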


Mating Type and Chloroplast Segregation 
in Chlamydomonas 


Chlamydomonas reinhardtii is a single-celled green alga
with a haploid nuclear genome that harbors a single, large 
chloroplast containing 50 to 100 genomes divided among 
5 to 15 nucleoids. Haploid cells of Chlamydomonas also 
typically have about 50 copies of the mitochondrial ge- 
nome distributed among a small number of mitochondria 
in the germ cells and a larger number of mitochondria at 
other stages of the life cycle. 

Matings between Chlamydomonas cells of different
mating types produce diploid algae that undergo meiosis
to produce haploid progeny. Mating compatibility
is determined by the genotype at the mt locus, and mt⁺
individuals mate only with mt⁻ individuals. Both mating
types appear to contribute equally to the cytoplasmic
content of the diploid zygote, but in approximately 95%
of matings, the chloroplast genome is contributed by
the mt⁺ mating type. In the remaining 5% of matings,
chloroplast inheritance is biparental. The first mutation
in a chloroplast gene discovered in Chlamydomonas was
isolated by Ruth Sager in 1954 and confers resistance to
the antibiotic streptomycin (str). Analogous to reciprocal
crosses between four o’clock flowers of different leaf
types, reciprocal crosses between streptomycin-resistant
and streptomycin-sensitive Chlamydomonas strains of
different mating types give different results; the
chloroplast genotype is contributed primarily by the mt⁺ parent
(Figure 19.9). Remarkably, though the chloroplast genome
is preferentially transmitted by the mt⁺ mating type,
mitochondria are preferentially transmitted by the mt⁻
mating type. The genetic mechanisms by which the different
mating types preferentially transmit the different
organellar genomes are presently unknown.
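
The reciprocal-cross logic can be written out as a small sketch. It covers
only the roughly 95% of matings with uniparental chloroplast inheritance,
writes the mating types as the strings "mt+" and "mt-", and uses an
illustrative function name (`tetrad`) and illustrative allele labels.

```python
def tetrad(mt_plus_chloroplast, mt_minus_chloroplast):
    """Progeny of one uniparental mating as (mating type, chloroplast) pairs."""
    # The nuclear mt locus segregates 2:2 at meiosis.
    # mt_minus_chloroplast is accepted but never used: in this model the
    # chloroplast genome of the mt- parent is not transmitted.
    return [(mating_type, mt_plus_chloroplast)
            for mating_type in ("mt+", "mt+", "mt-", "mt-")]


# Reciprocal crosses: the resistant allele enters through mt+ or through mt-.
cross_1 = tetrad("str-resistant", "str-sensitive")
cross_2 = tetrad("str-sensitive", "str-resistant")
print(cross_1)  # all four progeny are str-resistant
print(cross_2)  # all four progeny are str-sensitive
```

In both crosses the mating type is split 2:2 among the progeny, while the
chloroplast genotype is 4:0 and follows the mt⁺ parent.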

During the mating process in Chlamydomonas, the 
two cells of opposite mating type fuse, after which the 
chloroplast genome from the mt⁺ parent is selectively
Figure 19.9 Chloroplast segregation determined by mating type in
Chlamydomonas. In each reciprocal cross, the chloroplast inherited from the
mt⁺ parent is maintained in the zygote, and the chloroplast inherited from
the mt⁻ parent is selectively degraded. After meiosis, the segregation of the
mating type allele produces progeny in a ratio of 2:2, as is typical for
nuclear genes. The ratio of chloroplast genotypes is 4:0 because all progeny
receive only the chloroplast contributed by the mt⁺ parent.



maintained, while that from the mt⁻ parent is degraded.
The mechanism by which the mt⁻ cell’s chloroplast
genome is eliminated is not known, but it is likely to
involve degradation of that genome at some point in the
mating process. A similar process leads to the loss of the
mitochondrial genomes contributed by the mt⁺ gamete.
Perhaps the degradation of organelles or their genomes 
provides a possible source of organelle DNA that may be 
transferred between genomes—into the nuclear genome, 
for example. (We will return to this topic later in the chap- 
ter, when we discuss the evolution of the organelles and 
their genomes.) For the cases in which biparental inheri- 
tance occurs, the presence of the two types of chloroplast 


GENETIC ANALYSIS 


PROBLEM The pedigree shows transmission of a rare human hereditary disorder.

a. Determine the most likely mode of inheritance.

b. Identify any individuals in the pedigree whose phenotype is inconsistent
with the expected phenotype.

c. Justify your proposed mode of inheritance by explaining the inconsistencies.

BREAK IT DOWN: In humans, inheritance can be autosomal recessive or
autosomal dominant, X-linked recessive or X-linked dominant, or maternal.

[Pedigree figure: three generations (I, II, III).]


Solution Strategies Solution Steps


Evaluate


1. Identify the topic this problem addresses and the nature of the
requested answer.


2. Identify the critical information given in the problem.


1. This problem concerns the mode of inheritance of a hereditary abnormality
in a human pedigree. The answer requires proposing a mode of inheritance,
identifying family members whose phenotypes are inconsistent with the proposed
mode, and explaining those inconsistencies in a manner that justifies the
proposed mode.

2. The pedigree gives the phenotype of each family member in three
generations.


Deduce


3. Identify the possible modes of inheritance of the gene causing this
abnormality.


TIP: Human cells contain maternally inherited
mitochondria in addition to nuclear chromosomes.


4. Examine the pedigree to see whether the pattern is generally consistent
with autosomal recessive or X-linked recessive inheritance.


5. Examine the pedigree to see whether the pattern is generally consistent
with X-linked dominant or autosomal dominant inheritance.


6. Examine the pedigree to see whether the pattern is consistent with
maternal inheritance.


3. The possibilities are that the trait might be caused by the mutation of
either a nuclear gene or a mitochondrial gene. If the mutated gene is nuclear,
it might be either recessive or dominant and either autosomal or X-linked. If
the mutation is mitochondrial, the transmission pattern will be maternal
inheritance.

4. The pattern is inconsistent with X-linked recessive inheritance, in which
many more males than females have the recessive phenotype. Here, the ratio of
six females to four males is close to 1:1, so X-linked recessive inheritance is
highly unlikely. Autosomal inheritance is unlikely, since siblings in
generation III are either all affected or none affected within families.

5. In X-linked dominant inheritance, all daughters of males with the dominant
mutation are also expected to have the trait. II-5 does not transmit the trait
to any of his three daughters, thus making X-linked dominant inheritance highly
unlikely. Autosomal dominant inheritance is possible, where II-3 is
nonpenetrant; but there is only a 1/32 chance, (1/2)^5, that II-5 would have
five children who do not have the trait.

6. The pedigree pattern is consistent with maternal (mitochondrial)
inheritance. Affected individuals are all offspring of affected mothers (I-2,
II-2) or of female II-3 (who may harbor the mutant allele but does not exhibit
the phenotype).


Solve


7. Determine the mode of transmission that is consistent with the pedigree
data.


8. Explain the presence of the anomalous individuals whose phenotypes are
inconsistent with maternal inheritance.


TIP: Heteroplasmy may occur among the [...]


TIP: Proteins produced [...] produced by nuclear genes.


For more practice, see Problems 12, 14, 17, 18, 19, 20, and 22.


Answers a and b

7. Maternal inheritance best explains the observed segregation pattern, but
there is one inconsistency. Individual II-3 does not show the phenotype as
expected under strict application of the rules of maternal inheritance.


Answer c

8. Lack of penetrance of the phenotype (as in II-3) may result from
(1) variable penetrance owing to some individuals being heteroplasmic, since
some could have a greater proportion of mutant mitochondria than others;
(2) other genetic risk factors, such as alleles of nuclear genes (since both
males and females show variable penetrance, alleles of autosomal genes may be
influencing the penetrance of the mitochondrial mutation, although common
alleles of X chromosome genes cannot be ruled out); (3) environmental factors
that influence the penetrance of the phenotype.

genomes in the same organelle allows the genomes to
undergo recombination that may result in the segregation of
recombinant and parental chloroplast genomes.


Biparental Inheritance in Saccharomyces 
cerevisiae 


Saccharomyces cerevisiae is a single-celled yeast that can 
grow either aerobically (with oxygen) or anaerobically 
(without oxygen). Mitochondria are not able to produce 
energy (ATP) when oxygen is unavailable; so under anaer- 
obic growth conditions, yeast obtain their energy from fer- 
mentation, which does not require mitochondria. Under 
aerobic conditions, however, mitochondria-mediated 
aerobic respiration allows yeast to grow faster than they 
grow by fermentation. Thus mutations that eliminate mi- 
tochondrial function in yeast do not prevent growth, but 
they do cause the mutant yeast to grow at a slower pace 
than do wild-type yeast. This dual growth capacity makes 
Saccharomyces a versatile system for studying the genetics 
of mitochondrial biology. 

In the mid-1950s, Boris Ephrussi noted that when 
grown on media that allow fermentative growth, some mu- 
tant colonies of yeast were much smaller relative to wild- 
type yeast colonies. He named these mutants petite and 
referred to the wild-type colonies as grande. Biochemical 
analyses revealed that the petite mutants are deficient in 
mitochondrial cytochrome activity and for this reason are 
unable to carry out respiratory growth. Therefore petite 
mutants are able to grow only by fermentation, and they 
grow more slowly than wild-type yeast growing by respira- 
tion. When petite mutants are transferred to media that 
permit only respiratory growth, they are unable to grow, 
and the mutations are lethal. Therefore petite mutants can 
be classified as conditional lethal mutations. 

Recall that yeast can grow as haploid cells (see 
Chapter 3). Their mating involves the fusion of two cells 
of different mating types, called a and α, to produce a
diploid zygote. The diploid zygote can divide by mitosis 
for several generations, during which time its phenotype 
(petite or wild type) can be identified. When the zygote 
undergoes meiosis, four haploid progeny (ascospores) con- 
tained within an ascus are produced. Tetrad analysis can be 
performed on the ascospores to determine the segregation 
of alleles (see Section 5.7). Mutations in nuclear genes will 
segregate in a 2:2 ratio (mutant : wild type) when mutant 
lines are mated with wild type (Figure 19.10a). Both a and α
gametes contribute mitochondrial genomes to the zygote. 

Genetic analysis of petite mutants reveals that they 
fall into three distinct classes. One class, called nuclear, 
or segregational, petites (designated pet⁻), segregate 2:2
when crossed with the wild type (Figure 19.10b); pet⁻ mutations
are in nuclear genes. The existence of nuclear
petites demonstrates that the functioning of
mitochondria depends not only on their own genome but also
on genes contained in the nuclear genome. Both genomes
Figure 19.10 Transmission of petite phenotypes in Saccharomyces cerevisiae.
(a) Mutations in nuclear genes exhibit 2:2 segregation. (b) Progeny of petite
and wild-type phenotypes are produced in a 2:2 ratio, indicating that
segregational petite mutations are in nuclear genes. (c) Wild type crossed
with neutral petite: all wild-type progeny. Progeny do not exhibit the petite
phenotype, indicating that neutral petite mutants are not transmitted.
Examination of neutral petite mutants indicates that they lack most or all
mitochondrial DNA. (d) Wild type crossed with suppressive petite: all petite
progeny. Petite mitochondrial DNA dominates, and all progeny exhibit the
petite phenotype. Examination of suppressive petite mutants indicates that
they have deletions of only portions of their mitochondrial DNA.

encode genes whose products function in the organelle, as
we discuss in a later section.

The other two classes of petite mutations—neutral 
petites and suppressive petites—do not show Mendelian 
inheritance and are the result of mutations in the mito- 
chondrial genome. When neutral petites are crossed with 
wild-type yeast, the diploid zygote grows normally, and 
the tetrads contain only wild-type spores (Figure 19.10c). 
These are called “neutral” because the petite phenotype is 
lost after the initial mating with wild type. Examination of 
neutral petite mutants reveals that they lack virtually all 
mitochondrial DNA, and thus they obviously lack proper 
mitochondrial function. When neutral petites are mated 
to wild-type Saccharomyces, essentially all mitochondrial 
DNA is derived from the wild-type parent, resulting in 
phenotypically wild-type progeny. 

When suppressive petites are crossed with wild-type 
yeast, the diploid zygote has respiratory properties inter- 
mediate between those of the petite and wild type. If the 
diploid zygotes are grown mitotically for several divisions, 
the diploids tend to become petite in phenotype, and the 
tetrads contain all petite spores (Figure 19.10d). Thus the 
suppressive petite phenotype suppresses the wild-type 
phenotype, resulting in progeny that are all deficient in 
respiration. Analysis of the mitochondrial genome reveals 
that initially, suppressive petites have small deletions of 
mitochondrial DNA; but upon further growth, all copies 
of the mitochondrial DNA tend to become rearranged 
and duplicated. These gross defects in mitochondrial 
DNA lead to losses and disruptions of mitochondrial 
genes and to deficiencies in aerobic respiration. 

Why do the mitochondria inherited from the sup- 
pressive petite parent overwhelm those of the wild-type 
parent? Two non-mutually exclusive possibilities are that 
(1) suppressive petite mitochondria replicate faster than 
wild-type mitochondria, perhaps due to having additional 
copies of a replication origin, and (2) the suppressive petite 
and wild-type mitochondria fuse, and the genomic rear- 
rangements present in the suppressive petite mitochon- 
drial genome induce rearrangements in the mitochondrial 
genomes inherited from the wild-type parent. The latter 
hypothesis has gained support from the observation that 
mitochondria within a cell often interact and fuse into a 
continuous mitochondrial network. 
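
The three classes of petite mutants can be told apart by the tetrads they
give when crossed with wild type. The following sketch encodes only that
decision rule as described above; the function name and the phenotype labels
("petite", "wild type") are illustrative, and real tetrad data would also
need controls that this toy function ignores.

```python
def classify_petite(tetrad):
    """Classify a petite mutant from one tetrad of a petite x wild-type cross."""
    if len(tetrad) != 4:
        raise ValueError("a tetrad has exactly four ascospores")
    n_petite = sum(spore == "petite" for spore in tetrad)
    if n_petite == 2:
        return "nuclear (segregational) petite"  # Mendelian 2:2 segregation
    if n_petite == 0:
        return "neutral petite"                  # all wild-type spores
    if n_petite == 4:
        return "suppressive petite"              # all petite spores
    return "unclassified"


print(classify_petite(["petite", "petite", "wild type", "wild type"]))
print(classify_petite(["wild type"] * 4))
print(classify_petite(["petite"] * 4))
```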


Summary of Organelle Inheritance 


There are four primary modes of inheritance of organelle 
genes. Three of the modes are uniparental (the organelles
are contributed primarily by a single parent), as in (1)
the maternal inheritance of organelles in mammals and 
many flowering plants; (2) the paternal inheritance of 
organelles, which is seen in gymnosperms; and (3) selec- 
tive degradation or silencing of organelle DNA during 
mating, as in Chlamydomonas. The fourth mode of in- 
heritance is biparental; both parents contribute organelles 
and their genomes to the progeny, as in Saccharomyces. 


As we learned in Section 19.1, mitochondria and 
chloroplasts contain their own genomes, composed of 
genes that are unique to the organelles and are expressed 
and replicated by mechanisms independent of those 
working on nuclear genes. The discussions that follow 
explore the structure, replication, function, and evolution 
of mitochondrial and chloroplast genomes. 


19.3 Mitochondria Are the Energy 
Factories of Eukaryotic Cells 


Enzymatically driven phosphorylation that transfers phos- 
phates from adenosine triphosphate (ATP) to other mole- 
cules provides energy used by cells for many processes and 
functions. In most eukaryotes, mitochondria are the sites 
of energy production, where electron transport is coupled 
to oxidative phosphorylation to generate ATP. In many 
species, mitochondrial genes also participate in other met- 
abolic processes and biochemical reactions, including ion 
homeostasis and biosynthetic pathways. The protein com- 
plexes that produce ATP are composed of gene products 
encoded by both the mitochondrial and nuclear genomes. 
Thus, the synthesis and regulation of the protein com- 
plexes responsible for oxidative phosphorylation and other 
mitochondrial processes depend on coordination between 
the mitochondrial and nuclear genomes. 

The general structure of a mitochondrion can 
be described as two membranes surrounding a matrix 
(Figure 19.11). The enzyme complexes responsible for 
oxidative phosphorylation are found on the inner mem- 
brane. The mitochondrial matrix is the site of mito- 
chondrial genome transcription, translation, and DNA 
replication. The mitochondrial genome is responsible 
for only a fraction of the genes needed to carry out these 
processes, however, and most of the proteins active 
in mitochondrial DNA replication, transcription, and 
translation are encoded in the nuclear genome. 

Following their translation, nuclear-encoded mi- 
tochondrial proteins are transported into mitochondria. 
Examination of the mitochondrial genomes of different 
species reveals enormous diversity as to whether specific 
proteins are mitochondrial- or nuclear-encoded; only a 
few proteins are consistently encoded by the mitochon- 
drial genome. This suggests that genes have moved from 
the mitochondrial genome to the nuclear genome at differ- 
ent times during evolution. 


Mitochondrial Genome Structure and Gene 
Content 


Genetic mapping studies and direct observation of mi- 
tochondrial chromosomes by electron microscopy in- 
dicate the chromosomes often have a circular structure 
(Figure 19.12). There is evidence, however, that circular 
mitochondrial genomes can assume a linear form and 
that the mitochondrial genomes of certain species are 
Figure 19.11 Mitochondrial structure and function. Labeled structures:
outer membrane, intermembrane space, inner membrane, and matrix. Enzymes
responsible for oxidative phosphorylation reside on the inner membrane.
Reactions of the Krebs cycle occur in the matrix, as do several other
biosynthetic pathways. Ribosomal RNA and a few proteins (blue) are always
encoded by the mitochondrial genome, other products (purple) are always
encoded by the nuclear genome, and still others (orange) may be encoded by
either genome depending on the species.



primarily linear. In the vast majority of species, the
mitochondrial genome is a single molecule; but in a
few species, the genome consists of more than one
molecule. Thus, in some species, the mitochondrial genome
consists of one (Tetrahymena) or more (Amoebidium)
linear molecules that have terminal repeat sequences,
which are reminiscent of telomeres.

Unlike the DNA in the nucleus, mitochondrial DNA
is not packaged in chromatin composed of histones.
Rather, the genomes are anchored to the inner membrane
of the mitochondria, in a manner similar to that of
bacterial chromosomes. These and other features described
below give clues to the evolutionary origin of mitochondria,
as we discuss further in a later part of this chapter.

The gene content and size of mitochondrial genomes
vary substantially among eukaryotes (Figure 19.13a).
Known genome sizes range from a low of 6 kb in the
malarial parasite Plasmodium to hundreds or thousands of
kilobases in flowering plants. However, as with nuclear
genomes, the size in kilobases does not necessarily correlate
with the number of genes. For example, the Saccharomyces
mitochondrial genome is approximately five times as large
as the human mitochondrial genome, but it contains only
a few more genes. This is because much of the extra DNA,
including some introns, is noncoding. In contrast to their
nuclear genomes, mammalian mitochondrial genomes
are particularly compact and have no introns and little
noncoding DNA. Known gene numbers in mitochondrial
genomes vary from a low of 6 in Plasmodium to a high
of nearly 100 genes in certain jakobid flagellates such as
Reclinomonas americana (Figure 19.13b).

Figure 19.12 Genome structures of mitochondria. Panels show Tetrahymena
mtDNA, human mtDNA, Spizellomyces mtDNA, and Amoebidium mtDNA.

Figure 19.13 Gene content of mitochondrial genomes.

As we discuss in a later section, all mitochondrial
genomes are descended from a common bacterial ancestral
genome that likely possessed thousands of genes. The
differences between mitochondrial genomes in living
organisms reflect differential losses of genes from the ancestral
genome in the different lineages. Gene losses in parasites
such as Plasmodium, which obtains its energy from its
hosts, are often extreme, owing to loss of the genes
encoding proteins required for oxidative phosphorylation.


Mitochondrial Transcription and Translation 


The mitochondrial genome is transcribed by an RNA poly- 
merase similar to that found in bacteria (see Section 8.2). 
In some species, the mitochondrial RNA polymerase is 
encoded by a mitochondrial gene; in other species, it is 
encoded by a nuclear gene. Transcriptional regulation of 
mitochondrial gene expression also varies among species 
but in most cases has features reminiscent of bacterial 
operons. For example, transcription of the mammalian 
mitochondrial genome involves the production of just 
three polycistronic mRNA transcripts from only three 
promoters (Figure 19.14). All promoters are within the 
mitochondrial control region, and transcription is pro- 
moted in both directions, with the result that each strand 
of DNA is transcribed. Transcription of the two strands 
generates precursor RNA molecules encompassing the 
entire circumference of the mitochondrial genome that 
encode both RNAs and proteins. The rRNAs and mRNAs 
are flanked by tRNAs, which are cleaved from the precur- 
sor RNAs, thus releasing the rRNA and mRNA molecules. 
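
The way tRNA excision releases the flanked RNAs can be sketched as splitting
an ordered list of segments at every tRNA. The segment list below is a short
illustrative fragment rather than a complete map of any genome, and the
function name `process_precursor` and the (kind, name) tuple layout are
assumptions made for this example.

```python
def process_precursor(segments):
    """Split an ordered precursor at its tRNAs.

    segments: list of (kind, name) tuples in transcript order.
    Returns (excised tRNA names, list of released non-tRNA products).
    """
    trnas, products, current = [], [], []
    for kind, name in segments:
        if kind == "tRNA":
            trnas.append(name)
            if current:              # close the product that preceded this tRNA
                products.append(current)
                current = []
        else:
            current.append(name)     # rRNA or mRNA between two tRNAs
    if current:
        products.append(current)
    return trnas, products


precursor = [
    ("tRNA", "tRNA-Phe"), ("rRNA", "12S rRNA"),
    ("tRNA", "tRNA-Val"), ("rRNA", "16S rRNA"),
    ("tRNA", "tRNA-Leu"), ("mRNA", "ND1"),
    ("tRNA", "tRNA-Ile"),
]
print(process_precursor(precursor))
```

Segments that are not separated by a tRNA stay together in one product,
which is how this toy model would represent a transcript carrying more than
one coding region.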

Mitochondrial translation occurs on ribosomes 
that resemble bacterial ribosomes (see Section 9.2). The 
rRNAs utilized in mitochondria are always encoded by 
the mitochondrial genome, but the mitochondrial ribo- 
somal proteins may be encoded by either the mitochon- 
drial or nuclear genome. In Reclinomonas americana, 
Shine-Dalgarno sequences are present upstream of most 
protein-coding genes, but such sequences are not evident 
in the mitochondrial genes of most eukaryotes. 


Most mitochondrial genomes encode many fewer
than the 61 different tRNA genes that are theoretically re- 
quired for translation of all codons. Recall that the genetic 
code contains 64 codons, of which 61 encode amino acids 
during translation. Each codon can be uniquely recog- 
nized by a complementary anticodon sequence in tRNA, 
but third-base wobble and the redundancy of the genetic 
code permit genomes to carry fewer than 61 unique tRNA 
genes. Consequently, only 32 different tRNA anticodon 
sequences (i.e., 32 different tRNA genes) are required to 
recognize the 61 codons. 

The substantially lower number of unique tRNA 
genes in mitochondrial genomes compared to the number 
of codons is accommodated in different ways in the mito- 
chondria of different species. In mammalian mitochon- 
dria, the rules of third-base wobble are more lenient than 
they are for nuclear genes. Certain mammalian tRNAs 
can read codons with any of the four bases in the third 
position, a system that reduces the number of different 
tRNA genes needed in mammalian mitochondria to 22. 
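
One way to see where numbers such as 61, 32, and 22 can come from is to
tally codons and codon groups directly. The sketch below is a simplified
bookkeeping model, not a description of real anticodon chemistry. It assumes
that under standard wobble one tRNA can read the two pyrimidine-ending codons
(NNU, NNC) or the two purine-ending codons (NNA, NNG) of a codon box when
both specify the same amino acid, and that under relaxed wobble one tRNA can
read all four codons of a box that specifies a single amino acid. The
vertebrate mitochondrial table applies the reassignments of UGA and AUA and
treats AGA and AGG as stop codons (an assumption; the tally is the same if
they are read as serine). All names are illustrative.

```python
from itertools import product

BASES = "UCAG"
# Universal code, one letter per codon in the order UUU, UUC, UUA, UUG, UCU, ...
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
UNIVERSAL = {"".join(c): aa
             for c, aa in zip(product(BASES, repeat=3), AMINO)}
VERTEBRATE_MITO = dict(UNIVERSAL, UGA="W", AUA="M", AGA="*", AGG="*")


def count_sense_codons(code):
    """Number of codons that specify an amino acid rather than stop."""
    return sum(aa != "*" for aa in code.values())


def min_trnas(code, four_way_wobble=False):
    """Tally tRNAs needed under the simplified wobble model described above."""
    total = 0
    for first, second in product(BASES, repeat=2):
        box = [code[first + second + third] for third in BASES]
        if four_way_wobble and len(set(box)) == 1 and box[0] != "*":
            total += 1               # one tRNA reads the whole box
            continue
        for half in (box[:2], box[2:]):   # NNU/NNC pair, then NNA/NNG pair
            sense = [aa for aa in half if aa != "*"]
            if len(sense) == 2 and sense[0] == sense[1]:
                total += 1           # one tRNA reads both codons of the pair
            else:
                total += len(sense)  # each remaining sense codon needs its own
    return total


print(count_sense_codons(UNIVERSAL))                    # 61
print(min_trnas(UNIVERSAL))                             # 32
print(min_trnas(VERTEBRATE_MITO, four_way_wobble=True)) # 22
```

Under these assumptions the tallies agree with the figures quoted above.
Actual wobble pairing depends on modified bases in the anticodon, so the
model should be read as an accounting aid only.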

In some mammalian species, not all mitochondrial 
tRNAs are encoded in the mitochondrial genome; in- 
stead, some nuclear-encoded tRNAs are imported into 
mitochondria. In extreme cases, such as Plasmodium, all 
tRNAs have to be imported since none are encoded in the 
mitochondrial genome. In addition to mechanisms that re- 
duce the total number of different tRNA genes encoded in 
mitochondria, there are differences between the mitochon- 
drial genetic codes of certain animals, plants, and fungi 
(Table 19.1). In many species, the mitochondrial genetic 
code is the same as the universal code, thus supporting 
the hypothesis that most of the changes listed in Table 
19.1 occurred relatively late in the evolution of the major 
Table 19.1  Examples of Differences in Mitochondrial Genetic Codes

Codon      Universal  Mitochondrial
                      Vertebrate  Echinoderms  Saccharomyces  Chondrus     Land Plants  Ciliates
                                               (Yeast)        (Red Algae)

UGA        Stop       Trp         Trp          Trp            Trp          —            Trp
AUA        Ile        Met         —            Met            —            —            —
CUN        Leu        —           —            Thr            —            —            —
AGG, AGA   Arg        Ser/Stop    Ser          —              —            —            —
CGN        Arg        —           —            —              —            —            —

N, any of the four bases A, G, U, C; —, no change from the universal code.


branches of eukaryotes. Some of the same differences have 
apparently evolved independently in multiple mitochon- 
drial lineages, suggesting that certain changes may confer 
a selective advantage. It may be that the reduction in tRNA 
gene number in the mitochondrial genome is related to the 
relaxed evolution of the mitochondrial genetic code. 


19.4 Chloroplasts Are the Sites 
of Photosynthesis 


Chloroplasts—present in green plants, their algal rela- 
tives, and many other taxa that carry out photosynthesis— 
are only the most familiar of various organelles derived 


Figure 19.15 Chloroplast structure and function. Labeled structures: 
outer membrane, inner membrane, thylakoids, stroma. 


from a precursor organelle called a plastid. In the green 
tissues of plants, plastids differentiate into chloroplasts 
in response to light; but in nongreen tissues, plastids may 
differentiate into other types of specialized organelles. For 
example, tomatoes get their red color from pigments in a 
plastid derivative called a chromoplast. Regardless of type, 
all plastids and their derivatives possess a genome. 
Chloroplasts resemble mitochondria in being en- 
closed by a double-membrane system (Figure 19.15). 
However, chloroplasts also possess a third membrane 
system, the thylakoid membranes. These membranes re- 
side in the stroma, the region equivalent to the matrix of 
the mitochondrion. The protein complexes that carry out 
photosynthetic reactions are embedded in the thylakoid 


Chloroplast-encoded (green) and nuclear-encoded (orange) 
thylakoid membrane proteins responsible for converting 
light energy to chemical energy in Arabidopsis. 


membranes. As with mitochondria, most chloroplast pro- 
teins are encoded in the nuclear genome but are pro- 
duced and regulated through interactions between the 
two genomes (plastid and nuclear). 


Chloroplast Genome Structure and Gene 
Content 


Many structural features of chloroplast genomes are simi- 
lar to those of bacterial and mitochondrial genomes. For 
example, the chloroplast genome is anchored to the inner 
chloroplast membrane, and chloroplast genomes are not 
packaged in chromatin composed of histones. Like mi- 
tochondrial genomes, chloroplast genomes are generally 
found to be circular, on the basis of genetic and molecular 
mapping as well as direct observation with the electron 
microscope. However, there is evidence that linear chlo- 
roplast genomes may also occur. The similarity of chloro- 
plast genomes and bacterial genomes reflects the ancestral 
evolutionary relationship that we explore in Section 19.5. 
Compared to mitochondrial genomes, chloroplast ge- 
nomes are structurally less diverse. Chloroplast genomes 
range in size from 120 to 200 kb and usually encode 100 
to 250 genes; the precise gene content varies between 
species. The chloroplast genome of Marchantia polymorpha 
is typical of many (Figure 19.16). While chloroplast ribo- 
somal proteins may be encoded by either the chloroplast 
or nuclear genome, the rRNA is always encoded by the 
chloroplast genome, and the tRNA molecules are usually 
encoded by the chloroplast genome. Most of the remaining 
chloroplast genes with known functions encode proteins 
involved in photosynthesis. 

One of the photosynthetic genes in the chloroplast 
genome encodes the large subunit of 
ribulose-1,5-bisphosphate carboxylase/oxygenase, the enzyme 
responsible for the fixation of carbon from CO₂. The 
enzyme, often abbreviated RuBisCO, represents up to 50% 
of the protein content of green plants and is thus possibly 
the most abundant protein on the planet. RuBisCO is 
composed of two protein subunits, abbreviated rbcL and 
rbcS, for the large and small subunit, respectively. While 
rbcL is encoded in the chloroplast genome (Figure 19.16b), 
rbcS is encoded in the nuclear genome, providing another 
example of the extensive coordination between the two 
genomes, which in this case must cooperate to produce 
appropriate quantities of the two subunits. 


Figure 19.16 Chloroplast genome of Marchantia polymorpha, a common liverwort 
(cpDNA, 121,000 bp). ©1987 Elsevier 


Chloroplast Transcription and Translation 


Transcription and translation of chloroplast genes are 
similar to those of bacteria. Many chloroplast genes are 
arranged in operons and as a result are coordinately 
transcribed. The RNA polymerase resembles that found 
in bacteria and, as in bacteria, recognizes consensus 
sequences (similar to those of bacterial promoters) at -10 
and -35 of chloroplast gene promoters (see Section 8.2). 
Like bacterial mRNAs, chloroplast mRNAs are neither 
capped at their 5’ end nor polyadenylated at their 3’ 
end. However, some RNA processing occurs, such as the 
removal of introns from a few genes and RNA editing in 
most land plants (a process described in more detail later). 
The ribosomes of chloroplasts are also similar to 
those of bacteria. For example, ribosome function is dis- 
rupted by aminoglycoside antibiotics, which also inhibit 
bacterial ribosome function. From 30 to 35 different 
tRNAs are usually encoded by the chloroplast genome, 
and as a result all codons can be translated without the 
additional wobble found in mitochondria. The kinds of 
deviations from the universal genetic code that are seen 
in mitochondrial genes are not observed in chloroplasts. 


Editing of Chloroplast mRNA 


RNA editing is the process of altering the sequence of an 
RNA molecule after transcription from the DNA genome 
(see Section 8.4). RNA editing was first discovered in the 
mitochondria of trypanosomes, where insertion (or, less 
frequently, deletion) of U residues occurs in mitochondrial 
mRNAs. The mechanism by which this editing process 
occurs (described in Chapter 8) involves complementary 
guide RNAs that are encoded in the mitochondrial ge- 
nome. The guide RNAs provide a template on which the 
changes to the target mRNA are made; there, enzymes 
either add or delete U residues from the mRNA. 

RNA editing has also been noted in the mitochondria 
and chloroplasts of land plants, where the editing process 
results in C-to-U (or, less frequently, U-to-C) changes 
in organellar mRNAs. In contrast to the RNA editing to 
insert and delete bases, the RNA editing in the organelles 
of plants does not utilize a guide RNA. Rather, C-to-U 
editing is performed by an enzyme, C deaminase, which 
converts the C to a U, while U-to-C editing is presumably 
performed by the reverse reaction, the addition of an 
amine group to the U. Proper RNA editing in these cases 
requires the presence of specific sequences adjacent to the 
sites to be edited, suggesting that the adjacent sequences 
represent binding sites for trans-acting proteins. 

Not surprisingly, given that the mRNAs of several 
genes encoding proteins involved in photosynthesis are 
edited, genetic screens designed to identify mutants 
in which photosynthesis is compromised have identi- 
fied nuclear genes controlling chloroplast RNA editing. 
For example, mutations in the nuclear CCR4 gene of 
Arabidopsis result in a loss of C-to-U editing of one 
Figure 19.17 A model for C-to-U RNA editing. Annotation: ACG triplet 
in the position of the translational initiation site. 


nucleotide in the ndhD mRNA within chloroplasts; this 
editing normally generates a start codon, AUG, from the 
ACG encoded in the chloroplast genome (Figure 19.17). 
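
The effect of this single edit can be illustrated with a minimal Python sketch. The transcript fragment and the function name `edit_c_to_u` are hypothetical; only the ACG-to-AUG conversion is taken from the text.

```python
def edit_c_to_u(mrna: str, position: int) -> str:
    """Return mrna with the C at 0-based `position` converted to U."""
    if mrna[position] != "C":
        raise ValueError("C-to-U editing requires a C at the target position")
    return mrna[:position] + "U" + mrna[position + 1:]

# Hypothetical transcript fragment; only the ACG triplet comes from the text.
unedited = "GGAACGAAUUUA"
target = unedited.index("ACG") + 1  # the C in the middle of the ACG triplet

edited = edit_c_to_u(unedited, target)
print(unedited, "->", edited)
print("start codon present after editing:", "AUG" in edited)
```

In this toy fragment the unedited transcript has no AUG, so translation could not begin until the edit creates one, which is consistent with the loss of function seen when editing fails.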

CCR4 encodes a member of the pentatricopeptide re- 
peat (PPR) family of proteins. These proteins are thought 
to play diverse roles in RNA processing, including cleav- 
age of RNA precursor molecules. Surprisingly, the other 
four edited sites in ndhD RNA are edited correctly in ccr4 
mutants, leading to the idea that each site may be edited 
by a different trans-acting protein. The nuclear genomes 
of land plants encode large numbers of PPR genes, and 
there is a strong correlation between the number of 
nuclear-encoded PPR proteins and the extent of organel- 
lar RNA editing. It appears that each edited site in organ- 
ellar RNA is processed by a distinct PPR protein! Studies 
in plant mitochondria have also identified PPR proteins 
as important components of RNA processing; in so doing, 
these studies have illuminated the mechanism of cyto- 
plasmic male sterility, a phenotype used in plant breeding 
that is described in Experimental Insight 19.1. 


19.5 The Endosymbiosis Theory 
Explains Mitochondrial and Chloroplast 
Evolution 


Endosymbiosis is a symbiotic (interdependent, often mu- 
tually beneficial) relationship between organisms in which 
one organism inhabits the body of the other. Several lines 
of evidence indicate that the mitochondria and chloro- 
plasts inhabiting modern animal and plant cells are the 
Experimental Insight 19.1 


Cytoplasmic Male Sterility in Flowering Plants 


You probably do not think of sterility as a useful trait in a crop 
plant; however, male sterility in one parent plant provides an 
efficient mechanism for producing hybrid seed. This is pos- 
sible because the male sterile plant can act as the female par- 
ent in a cross with a second variety. In a phenomenon called 
hybrid vigor, plants that are the progeny of crosses between 
two different varieties often exhibit higher yield than do 
either of the parents. Here we describe how hybrid seed can 
be produced by taking advantage of genetic interactions 
between specific nuclear and mitochondrial genes. 

In plants, male sterility is a failure to produce viable pollen. 
Some cases, called cytoplasmic male sterility (CMS), are mater- 
nally inherited and are due to mutations in the mitochondrial 
genome. However, the phenotypic defects of these mito- 
chondrial mutations can often be suppressed by the presence 
of dominant alleles of nuclear genes, called Restorer of fertil- 
ity, or RF, genes. The interaction between typical CMS and 
RF genes provides an example of how genetic interactions 
between nuclear and mitochondrial genotypes can influence 
phenotypes. It can be outlined as follows: 


Female Parent  ×  Pollen Parent    Progeny Genotype   Progeny Phenotype

CMS rf/rf      ×  N rf/rf          CMS rf/rf          Male sterile
CMS rf/rf      ×  N Rf/Rf          CMS Rf/rf          Male fertile


CMS = male sterile cytoplasm; N = wild-type cytoplasm; Rf = dominant 
nuclear RF allele; rf = recessive nuclear RF allele. 


In this system, CMS cytoplasm in an rf/rf background makes 
a plant male sterile, but a dominant RF allele, Rf, is sufficient to restore 
fertility. Many different CMS mutants are known, and they ex- 
hibit exclusive relationships with particular nuclear RF genes, 
thus indicating several distinct nuclear-mitochondrial genome 
interactions. The RF loci may act either sporophytically, in which 
case all pollen produced from Rf/rf plants is fertile, or gameto- 
phytically, in which case only half of the pollen produced by a 
heterozygote is viable. Since most plants produce a vast excess 
of pollen, these latter plants are considered male fertile. 
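
The two crosses tabulated above can be expressed as a short Python sketch. It assumes sporophytic action of the restorer and strictly maternal inheritance of the cytoplasm; the function names and the tuple representation of genotypes are hypothetical conveniences.

```python
from itertools import product

def is_male_fertile(cytoplasm: str, genotype: tuple) -> bool:
    """CMS cytoplasm causes sterility unless a dominant Rf allele is present."""
    return cytoplasm != "CMS" or "Rf" in genotype

def cross(female, male):
    """Return the set of progeny (cytoplasm, genotype) classes.

    Each parent is (cytoplasm, (allele, allele)). Progeny inherit their
    cytoplasm from the female parent and one nuclear allele from each parent.
    """
    f_cyto, f_geno = female
    _, m_geno = male
    return {(f_cyto, tuple(sorted(pair))) for pair in product(f_geno, m_geno)}

cms_line = ("CMS", ("rf", "rf"))
for pollen_parent in [("N", ("rf", "rf")), ("N", ("Rf", "Rf"))]:
    for cyto, geno in cross(cms_line, pollen_parent):
        status = "male fertile" if is_male_fertile(cyto, geno) else "male sterile"
        print(cyto, "/".join(geno), status)
```

The output reproduces both rows of the table: CMS rf/rf progeny are male sterile, and CMS Rf/rf progeny are male fertile.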

CMS mitochondrial genes (MG in the figure) usually 
have novel open reading frames (ORFs) that combine 
sequences of unknown origin with mitochondrial gene-coding 
sequences. Expression of the novel ORFs is driven by adjacent 
mitochondrial promoter sequences. Since most plants 
harboring CMS-causing ORFs have a full complement of normal 
mitochondrial genes, the CMS ORFs can be considered 
gain-of-function mutations. 

Several RF genes encode proteins of the pentatricopeptide 
repeat (PPR) family. The functions of characterized PPR pro- 
teins include RNA processing, such as cleavage of RNA precur- 
sors and RNA editing. This discovery is consistent with the 
effects of RF genes on CMS genes, since in the presence of a 
restorer allele, transcripts of CMS ORFs fail to accumulate. One 
current hypothesis is that PPR proteins encoded by Rf alleles 
process transcripts produced by the CMS genes, thus 
restoring wild-type function to the affected mitochondrial genes 
(see Figure 19.17). 
CMS-RF systems have been harnessed to facilitate the 
production of hybrid seeds. The following double-cross hybrid 
scheme in maize utilizes four breeding lines as parents. 


♂ Inbred A    ×    Inbred B ♀               ♂ Inbred C    ×    Inbred D ♀
rf/rf              rf/rf                     Rf/Rf              rf/rf
normal             CMS (male sterile)        normal             CMS (male sterile)

          |                                            |

Single-cross progeny                         Single-cross progeny
rf/rf                                        Rf/rf
CMS (male sterile)                           CMS
♀ (A × B)                    ×               ♂ (C × D)

                             |

                  Double-cross hybrid seed
                    (planted by farmer)

                  1/2 Rf/rf CMS
                  1/2 rf/rf CMS (male sterile)


The hybrid seed is 1/2 male fertile and 1/2 male sterile. When plants of both 
genotypes are planted together, pollen from the male fertile plants 
pollinates both kinds. 


To produce each new generation of seeds for planting, 
breeders combine CMS and RF alleles so as to prevent female 
parents from self-fertilizing and to ensure that male parents have 
fertile pollen. In the first generation, two pairs of inbred parents 
are crossed, A x B and C x D. Both A and C have normal cyto- 
plasm but differ at the RF locus: A is homozygous recessive (rf/rf), 
and C is homozygous dominant (Rf/Rf). In contrast, lines B and D 
are CMS and rf/rf. The progeny produced by A x B are CMS rf/rf, 
male sterile, and can be used as the female parents in the subse- 
quent cross. The progeny produced by C x D are CMS Rf/rf, male 
fertile, and can be used as the male parents. The seeds that ulti- 
mately result have genomes derived from four different inbred 
lines and develop into larger, hardier plants due to hybrid vigor. 
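
The 1/2 : 1/2 composition of the double-cross seed follows from simple Mendelian segregation, as the Python sketch below illustrates. The function name `progeny_ratios` and the tuple representation of genotypes are hypothetical; all seed carry CMS cytoplasm because the female parent (A × B) is CMS.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def progeny_ratios(female_genotype, male_genotype):
    """Expected genotype fractions from one allele of each parent."""
    counts = Counter(
        tuple(sorted(pair)) for pair in product(female_genotype, male_genotype)
    )
    total = sum(counts.values())
    return {genotype: Fraction(n, total) for genotype, n in counts.items()}

single_cross_ab = ("rf", "rf")  # CMS, male sterile: used as the female parent
single_cross_cd = ("Rf", "rf")  # CMS, male fertile: used as the pollen parent

for genotype, fraction in progeny_ratios(single_cross_ab, single_cross_cd).items():
    phenotype = "male fertile" if "Rf" in genotype else "male sterile"
    print("/".join(genotype), fraction, phenotype)
```

Half of the seed is Rf/rf (male fertile) and half is rf/rf (male sterile), so a field planted with the mixture still contains enough pollen donors.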
descendants of formerly free-living bacteria that took part 
in ancient infections of eukaryotic cells. These ancient in- 
vaders established endosymbiotic relationships with their 
hosts and have evolved along with their hosts to produce 
the diversity we observe in organelles today. In this dis- 
cussion we explore the principal lines of evidence sup- 
porting the endosymbiosis theory of mitochondria and 
chloroplast evolution, including the following evidence: 


The double-membrane system found in both organ- 
elles is derived from a similar membrane system found 
in bacteria. 


The organelles are similar in size to extant bacteria. 


Organelle DNA is packaged in a manner similar to the 
packaging of chromosomes in bacteria and dissimilar 
to that of DNA in the nuclear genome. 


The transcriptional and translational machinery of the 
organelles closely resembles that of bacteria. 


The protein-coding sequences of organelle genes are 
more like those of bacteria than like either the nuclear 
genes of eukaryotes or the sequences of archaea. 


Separate Evolution of Mitochondria and 
Chloroplasts 


The available genetic evidence indicates that mitochondria 
are monophyletic; that is, all mitochondria are descendants 
from a single common ancestor. Coupled with evidence 
that mitochondria bear strong similarities to bacteria, this 
finding suggests that the point of origin of all mitochondria 
was a single endosymbiotic event (Figure 19.18). 

Based on the fossil record, the minimum age of the 
eukaryotes is approximately 1.5 to 2 billion years. One 
hypothesis concerning the origin of eukaryotes is that 
they evolved from an anaerobic ancestor that acquired 
an aerobic endosymbiont (the mitochondrial ancestor). 
This event was perhaps linked with the global rise in at- 
mospheric oxygen that began about 2 billion years ago 
and that could have provided a selective environment for 
aerobic organisms. Based on similarity in gene sequences, 
the closest extant relatives of mitochondria are free-living 
α-proteobacteria. Extant α-proteobacteria have genomes 
of 4 to 9 Mb of DNA encoding 4000 to 9000 genes, so it 
appears that extensive gene loss has characterized the 
evolution of mitochondrial genomes. 

Chloroplasts are also monophyletic, having descended 
from a single endosymbiotic event that occurred, accord- 
ing to the fossil record, at least 1.2 billion years ago (see 
Figure 19.18). Based on similarity of gene sequences, the 
closest extant relatives of chloroplasts are free-living cya- 
nobacteria. Existing cyanobacteria have genomes of 1.6 
to 9.0 Mb of DNA encoding 1900 to 7400 genes, implying 
extensive gene loss in the evolution of the chloroplast ge- 
nome as well. Phylogenetic evidence also suggests multiple 
secondary symbioses (discussed at the end of this sec- 
tion) in which some eukaryotes acquired a photosynthetic 


eukaryotic symbiont (see Figure 19.18). These events 
resulted in the horizontal transmission of chloroplasts 
among unrelated eukaryotic lineages. 

Two fundamental questions arise when we consider 
the genomes of the organelles. First, given that mitochon- 
drial and chloroplast genomes contain from 6 to 100 and 
from 20 to 200 genes, respectively, what happened to all 
the other genes of the ancestral symbiont? Second, given 
that the organelles contain many more organellar pro- 
teins than genes, what is the origin of the nuclear genes 
that encode so many organellar proteins? Are those nu- 
clear genes derived from the ancestral symbiont genome, 
or did they evolve in the host genome? A possible answer 
was provided by the discovery that DNA is transferred 
from organelle genomes to nuclear genomes; this led to 
the hypothesis that genes have been relocated from the 
ancestral endosymbiont genome to the nuclear genome 
during evolution. 


Continual DNA Transfer from Organelles 


The nuclear genomes of eukaryotes bear evidence of 
both ancient and recent DNA transfer between the or- 
ganellar and nuclear genomes (Figure 19.19). Ancient 
transfer events can be detected by comparative genomics 
of mitochondrial genomes and by comparing eukaryotic 
nuclear genomes with bacterial genomes. Sequencing of 
eukaryotic genomes has also revealed evidence of recent 
transfers. Transferred sequences that remain highly similar 
to their organellar counterparts must have been transferred recently. 

Ancient gene transfers can be identified in com- 
parisons between nuclear genomes of eukaryotes and the 
genomes of extant α-proteobacteria and cyanobacteria. 
Nuclear genes that are most similar to the genes of the liv- 
ing bacterial species are likely to have been derived from 
the bacterial endosymbiont. Ancient transfers have been 
detected by comparing the Arabidopsis nuclear genome 
and genomes of three cyanobacteria, leading to the identi- 
fication of approximately 4300 Arabidopsis nuclear genes 
with a cyanobacterial origin. Thus, more than 10% of the 
Arabidopsis nuclear genome represents an acquisition 
of genetic information originally residing in the genome 
of the chloroplast (Figure 19.20). Similarly, comparisons 
between several eukaryotic nuclear genomes and those 
of α-proteobacteria detected at least 630 nuclear genes 
derived from the α-proteobacterial endosymbiont that 
gave rise to the mitochondrion. Thus, concomitant with 
the reduction in the organellar genomes is an increase in 
gene content in the nuclear genome. The importance of 
this enormous amount of additional genetic information 
in the evolution of the eukaryotic lineage is difficult to 
overestimate (see Figures 19.19 and 19.20). 
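
The "more than 10%" figure can be checked with a line of arithmetic, sketched here in Python. The count of 4300 genes comes from the text; the total of roughly 27,000 protein-coding genes in the Arabidopsis nuclear genome is an assumption introduced for this sketch and is not stated in the chapter.

```python
cyanobacterial_genes = 4300  # from the text

# ASSUMPTION: approximate number of protein-coding genes in the Arabidopsis
# nuclear genome; this value is not given in the chapter.
assumed_total_genes = 27_000

fraction = cyanobacterial_genes / assumed_total_genes
print(f"{fraction:.1%} of nuclear genes have a cyanobacterial origin")
```

Under that assumption the proportion is roughly one gene in six, comfortably above the 10% threshold quoted above.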

One surprise discovered through the analysis of eu- 
karyotic genome sequences is that recent transfers of 
mitochondrial and chloroplast sequences seem to be 
included in all nuclear genomes. Mitochondrial DNA 
Figure 19.18 The evolutionary history of the mitochondrion and the chloroplast. 


sequences of recent origin found in the nucleus have been 
termed nuclear mitochondrial sequences (NUMTS), 
while nuclear sequences recently derived from plastid 
genomes are called nuclear plastid sequences (NUPTS). 
Organellar DNA sequence has been found in the nu- 
clear genome of every organism examined. NUMTS 
and NUPTS are common in many plant species; the 
Arabidopsis genome contains 17 NUPTS, totaling 11 kb, 


and 14 NUMTS, one of which is 620 kb and represents 
almost two entire mitochondrial genomes. The human 
genome contains hundreds of NUMTS, ranging from 106 
to 14,654 bp long (the latter being 90% of the length of the 
mitochondrial genome). 
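
The size comparisons in this paragraph can be verified with a brief Python sketch. The insert lengths come from the text; the organelle genome sizes (16,569 bp for the human mitochondrial genome and about 367 kb for the Arabidopsis mitochondrial genome) are assumptions introduced here and are not given in the chapter.

```python
# Insert sizes from the text.
largest_human_numt_bp = 14_654
largest_arabidopsis_numt_kb = 620

# ASSUMPTIONS: organelle genome sizes are not stated in the chapter.
human_mtdna_bp = 16_569
arabidopsis_mtdna_kb = 367

human_fraction = largest_human_numt_bp / human_mtdna_bp
arabidopsis_copies = largest_arabidopsis_numt_kb / arabidopsis_mtdna_kb

print(f"largest human NUMT covers {human_fraction:.0%} of the mitochondrial genome")
print(f"largest Arabidopsis NUMT equals {arabidopsis_copies:.1f} genome lengths")
```

With these assumed genome sizes, the results agree with the rounded statements "90% of the length" and "almost two entire mitochondrial genomes."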

Three conclusions have been drawn from the study 
of NUMTS and NUPTS. First, given the level of sequence 
similarity between NUMTS or NUPTS and the respective 




CHAPTER 19 Organelle Inheritance and the Evolution of Organelle Genomes 


Figure 19.19 Transfer of endosymbiont genes to the nuclear genome and destinations of 
encoded protein products. Transfer of genetic material from organelles to 
nucleus and between organelles continues in extant species (red and green 
dashed arrows). Proteins encoded by genes originally derived from endosymbiont 
genomes can be appropriated for other functions in the host cell. 


organelle genome sequences, most are thought to repre- 
sent evolutionarily recent transfers of organelle DNA to 
the nuclear genome. Second, entire organelle genomes 
likely were transferred to the nuclear genome multiple 
times in evolutionary history. Third, the process is ongo- 
ing; DNA continues to move between the organelles and 
to the nucleus. While the rate of transfer is not known in 
most organisms, experiments to directly measure the rate 
of DNA transfer from chloroplast to nuclear genome in 
plants revealed a new integration of chloroplast DNA in 
the nuclear genome at a rate of 1 in 16,000 plants. This 
surprisingly high rate of DNA transfer between the organ- 
ellar and nuclear genomes can account for the large num- 
bers of evolutionarily recent insertions of organellar DNA 
(NUMTS and NUPTS) found in the nuclear genome of 
most organisms. While the rate of transfer has not been 
directly measured in humans, it is likely that it is high 
enough for NUMTS polymorphisms to be present in the 
human population. 
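
To see why a rate of 1 in 16,000 counts as high, consider the following Python sketch. It treats each plant as an independent trial with the measured probability of a new integration, which is a simplifying assumption; the population sizes and the function name `prob_at_least_one` are hypothetical.

```python
TRANSFER_RATE = 1 / 16_000  # new integrations per plant, from the text

def expected_transfers(n_plants: int) -> float:
    """Expected number of new chloroplast-to-nucleus integrations."""
    return n_plants * TRANSFER_RATE

def prob_at_least_one(n_plants: int) -> float:
    """Probability of at least one new integration among n_plants,
    assuming plants are independent (an assumption of this sketch)."""
    return 1 - (1 - TRANSFER_RATE) ** n_plants

for n in (1_000, 16_000, 1_000_000):  # hypothetical population sizes
    print(n, round(expected_transfers(n), 2), round(prob_at_least_one(n), 3))
```

Even a modest population of 16,000 plants is expected to contain about one new integration per generation, and in a population of a million plants new integrations are essentially certain.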

Although organelle genes are readily transferred into 
the nuclear genome, several events must occur for the 
transferred genes to be functional. Recall from Chapters 
14 and 15 that the details of gene regulation differ be- 
tween bacteria and eukaryotes. Since gene regulation 
in the organelles resembles that in bacteria, transferred 
genes must acquire sequences for proper transcriptional 
regulation in the nucleus. Researchers using an experi- 
mental system similar to the one for monitoring DNA 
transfer from chloroplast to nuclear genome in plants 
have demonstrated that transferred chloroplast genes can 
become functional nuclear genes at a frequency observ- 
able in the laboratory. In addition, as described in more 
detail later, the protein encoded by the transferred gene 


may be transported back to the organelle from which the 
gene was derived; or, alternatively, the protein may be 
directed to another cellular compartment. For the protein 
to be transported back to the organelle, an amino termi- 
nal signal sequence must be attached to it. Since signal 
sequences need only to have certain general structural 
features in order to function properly, the acquisition of 
functional signal sequences likely occurs at an appreciable 
frequency. 


Encoding of Organellar Proteins 


Organelles contain many more proteins than they encode 
in their genomes; this is an indication that most organel- 
lar proteins are encoded in the nuclear genome. For ex- 
ample, the yeast mitochondrion contains approximately 
400 proteins, but only 16 proteins are encoded in its 
mitochondrial genome. The nuclear-encoded organellar 
proteins are translated in the cytoplasm and then im- 
ported into the organelles. These organellar proteins are 
targeted to their final location by signal sequences of 15 to 
25 amino acids at the amino terminal end of the proteins. 
Different signal sequences label proteins for transport to 
different organelles and other locations, such as the outer 
membrane, intermembrane space, inner membrane, ma- 
trix, and stroma and thylakoid membrane systems. 

When the endosymbiotic theory of the origin of 
mitochondria and chloroplasts was first proposed, its 
framers predicted that proteins were always targeted to 
the cell compartment from which the genes encoding 
them were originally derived. In other words, if a protein 
was encoded by a nuclear gene that had originally been 
derived from the endosymbiont that gave rise to the 
Figure 19.20 Evolution of genes derived from the 
cyanobacteria-like endosymbiont. 


mitochondrion, the protein would be targeted back to 
the mitochondrion. Contrary to expectations, however, 
the relationships between the endosymbiont origins of 
genes and the final destination of gene products are com- 
plex and difficult to predict. For example, in Arabidopsis, 
less than half the proteins identified as coming from 
the cyanobacterial endosymbiont are found to be tar- 
geted to the chloroplast (see Figure 19.20). Conversely, a 
number of proteins targeted to the chloroplast were not 
acquired from the cyanobacterial symbiont, but rather 
are descended from the original eukaryotic host genome. 
Similar observations have been made concerning the mi- 
tochondrion. Thus the proteins encoded by nuclear genes 
originally derived from endosymbiont genomes may be 
targeted to any location in the cell. 

While the diversity in the direction of protein trans- 
port was initially unexpected, perhaps consideration 
of the early stages of endosymbioses should have led 


scientists to expect it. When an endosymbiotic relation- 
ship was initially established, the genome of the ancestral 
mitochondrion would have been similar in size to that 
of its bacterial ancestors. If the rate of DNA transfer was 
similar to that measured today, the nuclear genome must 
have experienced a bombardment of DNA from the en- 
dosymbiont. Before the evolution of the mitochondrial 
protein-import machinery, proteins produced by genes 
transferred to the nuclear genome had to remain in the 
cytoplasm or be transported to the plasma membrane. 
Reduction in the endosymbiont genome could occur only 
after the evolution of systems able to import proteins into 
the endosymbiont. Such systems are composed of pro- 
teins encoded by genes originally derived from both the 
nuclear and endosymbiont genomes. 


The Origin of the Eukaryotic Lineage 


The tree of life is often depicted as having three major 
branches—the Bacteria, the Archaea, and the Eukarya— 
based on comparison of sequences of the rRNA genes 
(see Chapter 1). The extensive gene flow from bacterial 
endosymbionts to the nucleus, however, has resulted in 
the presence of significant numbers of “bacterial” genes in 
the nuclear genomes of eukaryotes. Given this situation, 
a simple tripartite view of life, in which three branches 
diverge from a single common ancestor, is overly simplis- 
tic. A fraction of the nuclear genome of every eukaryote is 
derived from bacterial endosymbionts, but where were all 
the remaining genes derived from? In other words, what 
was the original host of the α-proteobacterium that gave 
rise to the eukaryotes? 

Two models have been proposed to answer this ques- 
tion. In one model, the original host is a cell described 
as having a nucleus but no mitochondria and as 
subsequently acquiring an α-proteobacterium as an 
endosymbiont. In this model, “eukaryotic” cells (cells having 
nuclei) existed before the endosymbiotic event, suggest- 
ing that such organisms lacking mitochondria might still 
exist. In the second model, the original host is a bacterial 
cell that acquires an α-proteobacterium as an 
endosymbiont; and subsequently, this host-endosymbiont 
system evolves other eukaryotic features, such as a nuclear 
membrane. If the latter model is correct, no intermediate 
eukaryotes lacking mitochondria should be found. 

Two recent discoveries have contributed new fuel 
to this discussion. First, eukaryotic organisms that were 
originally thought to lack mitochondria, such as Giardia 
intestinalis (which causes diarrhea when it infects the 
human intestine), are now known to have mitochondria. 
In the case of Giardia, the mitochondria are reduced to 
double-membrane-bound structures called mitosomes. 
Mitosomes lack a genome, but proteins requiring an an- 
aerobic environment to function are imported into them 
(see Figure 19.18). The nuclear genome of Giardia har- 
bors genes of mitochondrial origin; this finding indicates 
that all portions of the mitochondrial genome were either 
transferred to the nucleus or lost. The extreme reduction 
of the mitochondrion to nothing but an anaerobic com- 
partment allowing the cell to carry out specific reactions 
is likely a consequence of Giardia’s parasitic lifestyle, 
where all of its energy is derived from a host organism. 
This finding means that all known existing eukaryotes 
harbor mitochondria or mitochondria-derived organelles. 

The second discovery concerns the nature of the 
genes in the nuclear genomes of eukaryotic organisms. 
Comparison of the complete genome sequences of the 
eukaryote Saccharomyces cerevisiae with two bacteria 
(Escherichia coli and Synechocystis 6803) and an archaeon 
(Methanococcus jannaschii) revealed two general func- 
tional and evolutionary categories into which the yeast 
genes could be divided. One category of genes, called 
informational genes, encodes protein products that per- 
form informational processes in the cell such as DNA rep- 
lication, packaging of chromosomes, transcription, and 
translation. The informational genes of yeast resemble 
those found in Methanococcus, and this resemblance in- 
cludes a similarity between the histones of the yeast and 
the histone-like chromatin proteins present in Archaea 
(see Sections 8.3 and 9.2). The second category of genes, 
called operational genes, encodes proteins involved in 
cellular metabolic processes, such as amino acid bio- 
synthesis, biosynthesis of cofactors, fatty acid and phos- 
pholipid biosynthesis, intermediary metabolism, energy 
metabolism, nucleotide biosynthesis, and some regulatory 
functions. In contrast to their informational genes, most 
yeast operational genes resemble those of Bacteria. 

One scenario consistent with the apparent origins of 
informational and operational genes in yeast is that the 
original host cell of the α-proteobacterial endosymbiont 
was related to an archaeal cell (Figure 19.21). The original 
host genome would have contained both informational 
and operational genes, as would the α-proteobacterial 
endosymbiont. Over time, while both genomes retained 
their own informational genes, many endosymbiont oper- 
ational genes were transferred to the nuclear genome and 
often replaced their host functional equivalents. Unlike 
the cases of the mitochondria and chloroplasts, where 
the endosymbionts can be traced to specific lineages 
of Bacteria, the putative archaeal host is unknown and 
may have been unrelated to any specific lineage of extant 
Archaea. 


Secondary and Tertiary Endosymbioses 


The melding together of genomes did not happen only 
during the endosymbioses that formed mitochondria and 
chloroplasts. Secondary and even tertiary endosymbi- 
otic events have occurred between different lineages 
of eukaryotes, resulting in the dispersal of plastids into 
eukaryotic lineages that are distantly related (see Figure 
19.18). In secondary and tertiary endosymbioses, typically, 
a non-photosynthetic eukaryote acquires a red or green
algal endosymbiont. What happens to the nuclear ge- 
nome of the secondary endosymbiont when one eukary- 
ote envelops another eukaryote? Genes of the nuclear 
genome of the eukaryotic endosymbiont (the alga), whose 
products were targeted to the plastid, are translocated to 
the host nucleus in a process analogous to the movement of
genes from the organelle genomes to the primary endo- 
symbiont host nuclear genome. Thus the nuclear genome 
of the algal endosymbiont, termed the nucleomorph, 
undergoes reduction to the extent that it encodes only 
some genes for products targeted to the plastid as well 
as some genes required for the maintenance of the 
nucleomorph genome. The plastid is serviced by three 
different genomes (nuclear, nucleomorph, and plastid), 
and the nuclear genome of photosynthetic secondary 
endosymbionts is a mixture of four genomes (mitochon- 
drial, chloroplast, and two nuclear genomes). Because sec- 
ondary and tertiary endosymbioses have occurred many 
times during the evolution of eukaryotes, the mixing and 
coevolution of genomes has been instrumental in shaping 
the evolution of several lineages of life. 

The mixing and melding of genomes can sometimes 
result in biological anomalies. For example, the discovery 
of a reduced chloroplast (or apicoplast) in Plasmodium 
falciparum, the malarial parasite, came as quite a surprise 
because this is clearly not a photosynthetic organism. 
Plasmodium resides within the phylum Apicomplexa,
which would make it a descendant of an ancient second- 
ary endosymbiosis involving a host eukaryote and an 
endosymbiotic chloroplast-containing red alga (see Figure 
19.18). Is there a reason that Plasmodium, with its para- 
sitic lifestyle, might have retained the apicoplast and its 
accompanying genome, albeit without any genes encod- 
ing proteins involved in photosynthesis? 

One hypothesis explaining retention of the apicoplast 
in Plasmodium is based on differences in translation of 
organellar-encoded compared to nuclear-encoded genes. 
The initiator tRNA used in mitochondrial translation is a 
formylmethionyl-tRNA (tRNA^fMet), the same as that used in
bacteria. This special tRNA cannot be imported from the 
cytoplasm, since cytosolic translation in eukaryotes uses an 
initiator methionyl-tRNA that is not formylated. During the 
evolutionary history of Plasmodium, the gene encoding the 
enzyme that adds a formyl group to the methionyl-tRNA 
has been lost from the mitochondrial genome. It is thought 
that the tRNA^fMet used in mitochondria might be imported
from the apicoplast, since the only methionyl-tRNA formyl
transferase gene in Plasmodium is in the nuclear genome,
and the protein product of this gene is transported to
the apicoplast. According to this hypothesis, the apicoplast
may be maintained for the sole purpose of synthesizing
tRNA^fMet to be imported into the mitochondrion, a quirk
of the evolutionary history of Plasmodium.


Ototoxic Deafness: A Mitochondrial Gene-Environment Interaction 


Phenotypic penetrance can be affected by both genetic and 
environmental factors. In the case of genetic interactions, 
the phenotypic effects of a mutation are influenced by alleles 
at other loci. The gene products of other loci are thought 
either to exacerbate or compensate for the mutational de- 
fect, thereby altering the expressivity or penetrance of the 
phenotype. In the case of environmental interactions, certain 
conditions either mitigate or enhance the phenotypic effects, 
in essence making the mutation a conditional allele. Some 
mutations, like the one described here, are subject to both 
these kinds of interaction. In this particular example, the locus 
of the key mutation is a mitochondrial gene. 

A rare complication of the use of aminoglycoside anti- 
biotics, such as streptomycin, gentamicin, and kanamycin, is 
irreversible loss of hearing, termed ototoxic deafness. Several 
observations point to a genetic susceptibility to ototoxic deaf- 
ness. Due to pervasive use of aminoglycosides in China, it was 
reported that in a district of Shanghai, nearly 25% of all deaf 
individuals can trace their loss of hearing to the use of amino- 
glycosides. Nearly one-fourth of these patients also had rela- 
tives suffering from ototoxic deafness, suggesting a genetic 
susceptibility. In all 22 cases where genetic transmission of the 
susceptibility could be traced, inheritance was maternal, a sign 
of a mitochondrially inherited trait (Figure 19.22a). A similar 
situation was observed for 26 families in Japan. Furthermore, 


a large Arab-Israeli pedigree with maternally inherited con- 
genital (not ototoxic) deafness can be traced back through five 
generations to a common female ancestor. In this case, the 
mitochondrial mutation is thought to be homoplasmic, since 
family members are either severely deaf or have normal hear- 
ing. However, the phenotype is not completely penetrant; this 
finding suggests that another mutation, likely to be an autoso- 
mal recessive nuclear mutation, contributes to the manifesta- 
tion of the condition. 
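
Maternal transmission of this kind can be checked mechanically in a pedigree by following mothers only. The sketch below is a minimal illustration in Python, not an analysis of the families described here: the pedigree, the individual labels, and the function names maternal_line and shared_maternal_ancestors are all hypothetical.

```python
def maternal_line(person, mother_of):
    """Return the chain of maternal ancestors of `person`, nearest first.

    `mother_of` maps each individual to his or her mother; founders
    (individuals with no recorded mother) are simply absent from the map.
    """
    line = []
    while person in mother_of:
        person = mother_of[person]
        line.append(person)
    return line


def shared_maternal_ancestors(people, mother_of):
    """Return the women reachable from every listed person through mothers only.

    Each person is counted as part of his or her own line, so a mother
    and her child are reported as sharing that mother.
    """
    lines = [set(maternal_line(p, mother_of)) | {p} for p in people]
    return set.intersection(*lines)


# Hypothetical three-generation pedigree (labels invented for illustration).
example_mother_of = {
    "II-1": "I-1",
    "II-2": "I-1",
    "III-1": "II-1",
    "III-2": "II-2",
}

print(maternal_line("III-1", example_mother_of))
print(shared_maternal_ancestors(["III-1", "III-2"], example_mother_of))
```

If every affected individual in a pedigree shares a maternal ancestor and no affected father transmits the trait, the pattern is consistent with mitochondrial inheritance, although it does not by itself rule out other explanations.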

In studies on bacteria, aminoglycosides stabilize mis- 
matched aminoacyl-tRNAs in the ribosome during transla- 
tion; this finding explains their antibiotic effects. The pres- 
ence of aminoglycosides causes a reduction in the fidelity of 
translation, leading to defective proteins. Aminoglycosides 
also have been shown to interact directly both with ribo- 
somal proteins and with the 16S rRNA of the 70S ribosome; 
and aminoglycoside-resistant bacteria have been shown to 
have point mutations in their 16S rRNA gene. Since the nor- 
mal target of aminoglycosides is the bacterial ribosome, the 
likely target of aminoglycoside ototoxicity in humans is the 
evolutionarily related mitochondrial ribosomes, and perhaps 
specifically the 12S rRNA that is homologous to the 16S rRNA 
of bacteria. 

Sequencing of the mitochondrial 12S rRNA gene in indi- 
viduals with congenital deafness in the Arab-Israeli family and 
Figure 19.22 Genetic and environmental interactions in ototoxic deafness.


in other individuals with ototoxic deafness revealed that they 
shared a single A-to-G mutation in their 12S rRNA genes. The 
mutation lies at the foot of a stem loop conserved in bacteria, 
plants, and mammals. Studies on bacterial ribosomes have 
shown that this region of the 16S rRNA forms part of the ami- 
noacyl site where mRNAs are decoded. Furthermore, amino- 
glycosides bind to this domain of the 16S rRNA, and bacterial 
mutants resistant to aminoglycosides map to this region of 
the 16S rRNA gene. 

Thus, the cause of the aminoglycoside-induced deafness 
is a mutation in the mitochondrial 12S rRNA gene, but three 
intriguing questions remain. First, why is deafness the pri- 
mary, and perhaps only, phenotypic defect? A characteristic 
of many mitochondrial diseases is pleiotropy due to a general 
loss of oxidative phosphorylation activity. However, in these 
cases of maternally inherited deafness or susceptibility to 
aminoglycosides, no obvious pleiotropic phenotypes are as- 
sociated with the deafness. Is the cochlea especially suscep- 
tible to a loss of mitochondrial function? Are the cochlear mi- 
tochondria especially sensitive to aminoglycosides? Second, 
what is the nature of the autosomal recessive mutation that 
acts to enhance the effect of the 12S rRNA mutation in the 


Arab-Israeli family? Could it be a nuclear-encoded ribosomal 
protein gene that interacts with the mitochondrial 12S rRNA? 
And third, if our mitochondrial ribosomes are evolutionarily 
related to bacterial ribosomes, why are humans able to utilize 
aminoglycosides as antibiotics in the first place? 

Clues to the answer of the third question have come 
from comparative studies of mitochondrial ribosome func- 
tion. The mutation causing deafness creates an extension of 
base pairing by one base in the stem loop of the mitochon- 
drial 12S rRNA, in effect making its structure more closely 
resemble the structure of the aminoglycoside-binding site 
of the bacterial 16S rRNA (Figure 19.22b-c). Thus, in the
2 or so billion years since the separation of bacteria and 
mitochondria, the structure of the mitochondrial ribosome 
has changed just enough so that aminoglycosides do not 
normally interfere with the fidelity of translation in mito- 
chondria; but mutations that result in a more bacteria-like 
ribosome structure bring back the ancient sensitivity to 
aminoglycosides. It is worth noting that—at least in this 
sense—translation in chloroplasts, which have diverged 
from bacteria for about 1.2 billion years, remains sensitive to 
aminoglycosides. 
19.1 Organelle Inheritance Transmits Genes
Carried on Organelle Chromosomes 
Mitochondria and chloroplasts possess their own genomes,
each encoding a small number of genes. The products of 
these genomes function within the respective organelle. 
Because many copies of organellar DNA occur in each cell, 
multiple genotypes may coexist in a single cell. 

Cells or organisms in which all genomic copies of an organelle 
gene have an identical sequence are said to be homoplasmic 
for that gene, whereas cells or organisms possessing multiple 
alleles for an organelle gene are called heteroplasmic. 
Replication of organelle genomes and organelle division are 
not directly coupled with the nuclear cell cycle. 

Replicative segregation of organelles can result in homoplas- 
mic cells being derived from heteroplasmic cells. 

The proportion of mutant alleles in heteroplasmic cells in- 
fluences the penetrance and expressivity of phenotypes. 
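
Replicative segregation can be illustrated with a small simulation. The sketch below is a simplified model under stated assumptions, not a description of real organelle dynamics: each cell holds a fixed number of organelle genome copies, the copies double, and one daughter receives a random half. The function names, the copy number, and the seed are hypothetical choices.

```python
import random


def divide(mutant, copies, rng):
    """Simulate one division: genomes double, then a daughter draws `copies` at random.

    Returns the number of mutant genomes in the daughter cell.
    """
    pool = ["mutant"] * (2 * mutant) + ["wild"] * (2 * (copies - mutant))
    daughter = rng.sample(pool, copies)
    return daughter.count("mutant")


def generations_to_homoplasmy(mutant, copies, seed=0, max_generations=100000):
    """Follow a single cell lineage until it becomes homoplasmic.

    Returns (generations elapsed, final number of mutant copies). The final
    count is 0 (homoplasmic wild type) or `copies` (homoplasmic mutant)
    unless the generation cap is reached first.
    """
    rng = random.Random(seed)
    generations = 0
    while 0 < mutant < copies and generations < max_generations:
        mutant = divide(mutant, copies, rng)
        generations += 1
    return generations, mutant


# A heteroplasmic cell with 5 mutant copies out of 10 drifts to homoplasmy.
print(generations_to_homoplasmy(5, 10, seed=1))
```

Because sampling is random, different seeds send the lineage to either homoplasmic state, which mirrors how heteroplasmic cells can give rise to homoplasmic descendants of both kinds.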


19.2 Modes of Organelle Inheritance Depend 
on the Organism 




The transmission genetics of organelle genomes is often 
determined by the relative amounts of cytoplasm contrib- 
uted by the parental gametes. 

Organelles are maternally inherited in mammals and many 
plant species, whereas in fungal species, mitochondria are 
often biparentally inherited. In some species, organelle 
inheritance is determined by alleles of a nuclear gene. 


19.3 Mitochondria Are the Energy Factories 
of Eukaryotic Cells 




Mitochondria are the sites of energy production; the enzymes 
of oxidative phosphorylation are on the inner membrane. 


Mitochondrial mutations often have pleiotropic effects that 
reflect the role of mitochondria in energy production. 


KEYWORDS 


19.4 Chloroplasts Are the Sites of Photosynthesis 


Chloroplasts are the sites of photosynthesis, conducted by 
enzymatic reactions responsible for carbon fixation in the 
stroma and by photosystem complexes that convert light to 
chemical energy in the thylakoid membranes. 

Only a small fraction of the proteins present in a mitochon- 
drion or chloroplast are encoded in the genome of the re- 
spective organelle; instead, most of the proteins are encoded 
in the nuclear genome and post-translationally imported 
into the organelles. 


19.5 The Endosymbiosis Theory Explains 
Mitochondrial and Chloroplast Evolution 


Both the mitochondrion and the chloroplast are evolutionarily
derived from ancient endosymbioses in which a bacterium
(α-proteobacteria and cyanobacteria, respectively)
was incorporated into a eukaryotic cell.

The circular structure (in most organisms) and transcrip- 
tional and translational expression of mitochondrial and 
chloroplast genomes reflect their evolutionary origins as 
bacterial endosymbionts of eukaryotic cells. 

Many of the genes present in the ancestral endosymbiont 
have been transferred to the nuclear genome of the host 

cell and have contributed extensively to eukaryotic nuclear 
genome content. 

The process of DNA transfer from organelle genomes to the 
nuclear genome is ongoing, and recent transfers of organelle 
DNA into the nucleus can be detected in most, if not all, 
organisms. 

Genes transferred from the ancient endosymbiont genome 
to the host nuclear genome encode proteins that may be 
targeted to any compartment of the eukaryotic cell. 
Eukaryotic informational genes are related to archaeal
genes, thus suggesting that eukaryotes might be descended 
from an archaea-like cell that acquired a bacterial 
endosymbiont. 
PROBLEMS


Chapter Concepts 


1. Reciprocal crosses of experimental animals or plants sometimes
give different results in the F1. What are two possible
genetic explanations? How would you distinguish between 
these two possibilities (i.e., what crosses would you per- 
form, and what would the results tell you)? 


2. How are some of the characteristics of the organelles (the
mitochondria and chloroplasts) explained by their origin as 
ancient bacterial endosymbionts? 


3. The human mitochondrial genome encodes only 22 tRNAs,
but at least 32 tRNAs are needed for cytoplasmic trans- 
lation. How are all codons in mitochondrial transcripts 
accommodated by only 22 tRNAs? The Plasmodium 
mitochondrial genome does not encode any tRNAs; how 
are genes of the Plasmodium mitochondrial genome 
translated? 


4. What is the evidence that transfer of DNA from the organelles
to the nucleus continues to occur?


5. Draw a graph depicting the relative amounts of nuclear
DNA present in the different stages of the cell cycle
(G1, S, G2, M). On the same graph, plot the amount
of mitochondrial DNA present at each stage of the
cell cycle.


6. What are the differences between the universal code
and that found in the mitochondria of some species? 


Application and Integration 


12. You are a genetic counselor, and several members of the
family whose pedigree for an inherited disorder is depicted 
in Genetic Analysis 19.2 consult with you about the prob- 
ability that their progeny may be afflicted. What advice 
would you give individuals III-1, II-2, III-4, II-6, III-8,
and III-9? 


13. A mutation in Arabidopsis immutans results in the
necrosis (death) of tissues in a mosaic configuration. 
Examination of the mitochondrial DNA detects deletions 
of various regions of the mitochondrial genome in the tis- 
sues that are necrotic. When immutans plants are crossed 
with wild-type plants, the F1 are wild type, and the F2 are
wild type and immutans in a 3:1 ratio. Explain the inheri- 
tance of the immutans mutation and a possible origin of 
the mitochondrial DNA deletions. 


14. What type or types of inheritance are consistent with the
following pedigree? 




For answers to selected even-numbered problems, see Appendix: Answers. 


Given that some changes (UGA = stop → Trp)
have occurred multiple independent times in evolu- 
tion, can you think of any selective advantage to the 
mitochondrial code? 


7. What is the evidence that the ancient mitochondrial
and chloroplast endosymbionts are related to the
α-proteobacteria and cyanobacteria, respectively?


8. Outline the steps required for a gene originally present in
the endosymbiont genome to be transferred to the nuclear 
genome and be expressed, and for its product to be tar- 
geted back to the organelle of origin. 


9. Consider the phylogenetic tree presented in Figure 19.18.
How were the origins of secondary endosymbiosis in the 
brown algae determined? 


10. Most large protein complexes in mitochondria and
chloroplasts are composed both of proteins encoded
in the organelle genome and proteins encoded in the
nuclear genome. What complexities does this introduce
for gene regulation (i.e., for ensuring that the appropri- 
ate relative numbers of the proteins in a complex are 
produced)? 


11. What insights have analyses of human mitochondrial DNA
provided into our recent evolutionary past? 




15. You have isolated (1) a streptomycin-resistant mutant
(str^R) of Chlamydomonas that maps to the chloroplast
genome and (2) a hygromycin-resistant mutant (hyg^R) of
Chlamydomonas that maps to the mitochondrial genome. 
What types of progeny do you expect from the following 
reciprocal crosses? 


mt+ str^R hyg^S × mt− str^S hyg^R
mt+ str^S hyg^R × mt− str^R hyg^S


16. You have isolated two petite mutants, pet1 and pet2, in
Saccharomyces cerevisiae. When pet1 is mated with wild- 
type yeast, the haploid products following meiosis segre- 
gate 2:2 (wild type : petite). In contrast, when pet2 is mated 
with wild type, all haploid products following meiosis are 
wild type. To what class of petite mutations does each of 
these petite mutants belong? What types of progeny do you 
expect from a pet1 × pet2 mating?


17. Consider this human pedigree for a vision defect.


[Pedigree figure]

What is the most probable mode of inheritance of the dis- 
ease? Identify any discrepancies between the pedigree and 
your proposed mode of transmission, and provide possible 


explanations for these exceptions. 


18. A 50-year-old man has been diagnosed with MELAS syndrome
(see Figure 19.7). His wife is phenotypically normal,
and there is no history of MELAS syndrome in either of 
their families. The couple is concerned about whether their 
children will develop the disease. As a genetic counselor, 
what will you tell them? Would your answer change if it 
were the mother who exhibited disease symptoms rather 
than the father? 


19. The first person in a family to exhibit Leber hereditary
optic neuropathy (LHON) was II-3 in the pedigree shown 
below, and all of her children also exhibited the disease. 
Provide two explanations as to why II-3’s mother (I-1) did 
not exhibit symptoms of LHON. 


[Pedigree figure]


20. The following pedigree shows a family in which several
individuals exhibit symptoms of the mitochondrial dis- 
ease MERRF. Two siblings (II-2 and II-5) approach you to 
inquire about whether their children will also be afflicted 
with MERRF. What do you tell them? 


[Pedigree figure]


21. A 9-bp deletion in the mitochondrial genome between the
gene for cytochrome oxidase subunit II and the gene for 
tRNA is a common polymorphism among Polynesians 
and also in a population of Taiwanese natives. The fre- 
quency of the polymorphism varies between populations: 
the highest frequency is seen in the Maoris of New Zealand 
(98%), lower levels are seen in eastern Polynesia (80%) and 
western Polynesia (89%), and the lowest level is seen in the 
Taiwanese population. What do these frequencies tell us 
about the settlement of the Pacific by the ancestors of the 
present-day Polynesians? 


22. What is the most likely mode of inheritance for the trait
depicted in the following human pedigree? 


[Pedigree figure]


23. In 1918, the Russian Tsar Nicholas II was deposed, and he and
his family were reportedly executed and buried in a shallow 
grave. During this chaotic time, rumors abounded that the 
youngest daughter, Anastasia, had escaped. In 1920, a woman 
in Germany claimed to be Anastasia. In 1979, remains were 
recovered for the tsar, his wife (the Tsarina Alexandra), and 
three of their children, but not Anastasia. How would you 
evaluate the claim of the woman in Germany? 


24. The dodo bird (Raphus cucullatus) lived on the Mauritius
Islands until the arrival of European sailors, who quickly 
hunted the large, placid, flightless bird to extinction. Rapid 
morphological evolution such as often accompanies island 
isolation had caused the bird’s huge size and obscured 

its physical resemblance to any near relatives. However, 
sequencing of mitochondrial DNA from dodo bones reveals 
that they were pigeons, closely related to the Nicobar pigeon 
from other islands in the Indian Ocean. Why was mitochon- 
drial DNA suited to the study of this extinct species? 


25. Cytoplasmic male sterility (CMS) in plants has been exploited
to produce hybrid seeds (see Experimental Insight
19.1). Specific CMS alleles in the mitochondrial genome 
can be suppressed by specific dominant alleles in the 
nuclear genome, called Restorer of fertility alleles, Rf.
Consider the following cross: 


♀ CMS1 Rf1/Rf1 rf2/rf2 × ♂ CMS2 rf1/rf1 Rf2/Rf2


What genotypes and phenotypes do you expect in the F1? If
some of the F1 plants are male fertile, what genotypes and
phenotypes do you expect in the F2?


26. Wolves and coyotes can interbreed in captivity; and now,
because of changes in their habitat distribution, they may 
have the opportunity to interbreed in the wild. To examine 
this possibility, mitochondrial DNA from wolf and coyote 
populations throughout North America—including habi- 
tats where the two species both reside—was analyzed, and 
a phylogenetic tree was constructed from the resulting data 
(see Section 1.4 for details on how this is accomplished). 
Sequence from a jackal was used as an outgroup and a sequence
from a domestic dog was included, demonstrating
wolves as the origin of domestic dogs. 


[Phylogenetic tree figure; tip labels include Coyote 1, Coyote 5, Coyote 6, Dog, and Jackal]


What do you conclude about potential interspecific hy- 
bridization between wolves and coyotes on the basis of this 
phylogenetic tree? 


27. Considering the phylogenetic assignment of Plasmodium
falciparum, the malarial parasite, to the phylum 
Apicomplexa (see Figure 19.18), what might you speculate 
as to whether the parasite is susceptible to aminoglycoside 
antibiotics? 


28. Elysia chlorotica is a sea slug that acquires chloroplasts by
consuming an algal food source, Vaucheria litorea. The 
ingested chloroplasts are sequestered in the sea slug’s di- 
gestive epithelium, where they actively photosynthesize for 
months after ingestion. In the algae, chloroplast metabo- 
lism depends on the algal nuclear genome for over 90% of 
the required proteins. Thus it is suspected that the sea slug 
actively maintains ingested chloroplasts, supplying them 
with photosynthetic proteins encoded in the sea slug ge- 
nome. How would you determine whether the sea slug has 
acquired photosynthetic genes by horizontal gene transfer 
from its algal food source? 


Developmental Genetics 


Multicellularity has evolved multiple times within the eukaryotes, as exem- 
plified by Volvox, a chlorophyte green alga and member of a multicellular 
lineage independent of land plants and animals. In Volvox, the outer cells 
are somatic while the germ cells will be derived from the inner cells. 


The development of a multicellular organism from
a single fertilized egg cell is one of the wonders of
evolution. The fertilized egg undergoes an initial mitotic 
division to produce two genetically identical daughter 
cells. Those two cells divide to produce four identical cells, 
which divide to produce eight cells, and so on. Yet, while all 
cells in the growing embryo continue to carry the same ge- 
netic information, many of them acquire different identities 
as the embryo develops different body parts, organs, and 
tissues. This development is a genetically programmed 
process, occurring in the same way in all members of a species. 


CHAPTER OUTLINE 


20.1 Development Is the Building 
of a Multicellular Organism 


20.2 Drosophila Development Is a Paradigm
for Animal Development

20.3 Cellular Interactions Specify Cell Fate

20.4 “Evolution Behaves Like a Tinkerer”

20.5 Plants Represent an Independent Experiment
in Multicellular Evolution


ESSENTIAL IDEAS 


Genes encoding transcription factors or signaling
molecules direct the formation of specialized cell
types.


Drosophila embryos are subdivided into segments 
with unique identities by the sequential action of 
batteries of transcription factors. 


Hox genes specify the identity of body segments 
of Drosophila and are largely conserved 
throughout metazoans. 


Cells signal to either induce or inhibit 
neighboring cells from adopting particular 
developmental pathways. 


Morphological evolution can be the result of 
changes in gene expression patterns of a common 
genetic toolkit. 


Plant developmental genetics shares similarities 
with that of animals despite multicellularity 
evolving independently. 
Different species exhibit both similarities and
differences in development, the former because of 
shared evolutionary ancestry and the latter because 
of species-specific adaptations. 

Geneticists rely on defects in development to 
reveal the mechanisms of normal development. As 
early as 1790, the German scientist and philosopher 
Johann Wolfgang von Goethe recognized the 
potential of this approach: 


From our acquaintance with...abnormal 
metamorphosis, we are enabled to unveil the 
secrets that normal metamorphosis conceals 
from us, and to see distinctly what, from the reg- 
ular course of development, we can only infer. 


Even so, the connections between developmental 
abnormalities, gene mutations, and the mechanisms 
that control normal development could not be 
understood in any detail until scientists began to 
apply the basic principles of genetics to the study
of development. This process began around 1900,
when the young embryologist Thomas Hunt Morgan
decided to shift his research to focus on the nascent
field of genetics, using the fruit fly Drosophila as
his experimental organism. While Morgan never
returned to the study of embryology, his students 
and his students’ students blazed new trails by 
exploiting Drosophila genetics to illuminate many 
of the secrets of development in all metazoans 
(multicellular animals) and in plants as well. 

In this chapter, we discuss the genetic processes 
that control development in complex multicellular 
organisms and the experimental approaches that led 
to their discovery. 


20.1 Development Is the Building 
of a Multicellular Organism 


An animal begins its life as a single cell, the zygote, from 
which all the cell types, each characterized by a specific 
gene expression pattern, of the adult animal ultimately are 
derived. The key to understanding the molecular genetic 
basis of development is to understand how different pat- 
terns of gene expression are established and maintained 
as cells differentiate and specialize. 


In 1915, Calvin Bridges (a student of Thomas Hunt 
Morgan) identified a Drosophila mutation in which the 
small hind wings, the halteres, developed into structures 
resembling the forewings (Figure 20.1a). Mutations in 
which an apparently normal organ or body part develops 
in the wrong place are called homeotic mutations (from 
the Greek homeos, meaning “the same” or “similar”), and 
they have been central to the progress geneticists have 
made in understanding how complex organisms develop 
and evolve. Ed Lewis (a student of Morgan’s student 
Alfred Sturtevant) later identified the bithorax complex 
of genes as being responsible for the homeotic mutation 
observed by Bridges. As we discuss in this chapter, 
mutations in bithorax genes change the developmental 
program of a portion of the fruit-fly body, resulting in the 
transformation of the halteres into a second set of fore- 
wings. Another example is the dominant Antennapedia 
mutation, in which relatively normal fly legs develop in 
the positions that should be occupied by the antennae 
(Figure 20.1b). To understand the cascades of events 
responsible for such developments, we must first exam- 
ine the phenomenon of cell differentiation and pattern 
formation. 


Figure 20.1 Inappropriate positions of organs and body
structures in homeotic mutants. (a) In a bithorax mutation,
halteres seen in wild-type Drosophila (left) develop instead
into a second set of wings (right), in the position normally
occupied by halteres. (b) In an Antennapedia mutation,
appendages that normally develop into antennae in wild-type
Drosophila (left) develop instead into legs (right).
Cell Differentiation


In an animal, fertilization of a haploid egg cell by a haploid
sperm cell forms a single-celled diploid zygote, which
undergoes several mitotic divisions to form a small cluster 
of embryonic cells that are genetically identical. These 
embryonic cells are totipotent, which means they have 
the potential to differentiate into any tissue or cell type 
the animal can produce. In vertebrates, totipotent cells 
of early embryos are called embryonic stem cells. In to- 
tipotent cells, all genes have the potential to be expressed 
given the appropriate cues. As development proceeds, 
however, cells become differentiated, taking on differ- 
ent morphologies and undertaking different physiological 
activities. 

Differentiation is characterized by changes in pat- 
terns of gene expression that progressively limit which 
genes continue to be expressed by each cell type. At a cer- 
tain stage in development, cells retain the potential to give 
rise to many different types of descendants, but not to all 
types—at this stage, the cells are said to be pluripotent. 
As development progresses further, however, most cells 
ultimately become specialized: These fully differentiated 
and specialized cells express only a subset of genes in the 
genome, and each cell type has its own characteristic pat- 
tern of gene expression. Thus development is a progres- 
sive process during which totipotent cells differentiate 
into specialized cell types through a series of genetically 
controlled steps that place ever more restrictive limits on 
their developmental potential. 

While most cells of adult animals are fully differentiated 
and locked into a specific cell fate, there are some excep- 
tions. In our bodies, various types of pluripotent stem cells— 
such as muscle, epidermal, epithelial, and hematopoietic 
(blood) cells—retain the capacity to develop into a range of 
further-specialized cells to replenish cells that are lost. 


Pattern Formation 


How do genetically identical cells acquire different fates? 
Two mechanisms have been identified: Cells can inherit 
some definitive molecule that specifies cell fate, or the fate 
of cells can be determined by their interaction with neigh- 
boring cells through the action of signaling molecules. 
Inheritance of a fate-determining molecule depends on 
the identity of progenitor cells, whereas development 
through the influence of neighboring cells depends on the 
identity of those neighbors. 

The term pattern formation describes the intricately 
interacting events that organize differentiating cells in the 
developing embryo to establish the three body-plan axes of 
the mature organism: anterior–posterior, dorsal–ventral, and
left-right (Figure 20.2). Cells have various ways of “knowing” 
their locations with regard to these axes. The combination of 
internal and external signals that a cell perceives during de- 
velopment provides information on the cell’s location within 
an organism and its appropriate course of differentiation. 


Figure 20.2 The three embryonic axes of a zebrafish. Labels in
the figure: dorsal, ventral, anterior, and posterior.


To understand the role that the positional informa- 
tion represented by these signals plays in development, 
consider the French flag, which has a simple pattern of 
three vertical stripes in the order blue, white, and red, 
along a single (anterior-posterior) axis (Figure 20.3a).
While French flags may come in various sizes, the pro- 
portions of the stripes within each flag remain generally 
constant, dividing the flag into thirds. Imagine the entire 
flag to consist of cells descended from a single parent 
cell. How do daughter cells know whether they are to dif- 
ferentiate as blue, white, or red? The cells could interpret 
their position by one or more of various mechanisms, but 
the simplest to envision is based on the concentration 
gradient of a molecule that is highly concentrated at one 
end of the embryonic flag and much less concentrated at 
the opposite end. The position of each cell on the flag’s 
anterior-posterior axis is defined by the concentration of
this molecule, in which threshold values define boundar- 
ies between discrete fates: Above a certain concentration, 
the result is blue cell identity; below this threshold con- 
centration, white cells develop; and below an even lower 
threshold, red cells develop. Substances whose presence 
in different concentrations directs developmental fates 
are referred to as morphogens. If activation or repression 
of gene expression is dependent upon threshold con- 
centrations of a morphogen (e.g., concentrations above 
which a gene is active and below which a gene is inactive), 
discrete boundaries of gene expression can be established. 
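
The French flag reasoning above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the linear gradient, the threshold values of 2/3 and 1/3, and the function names are hypothetical choices, not measured quantities. Because each cell reads its concentration at a relative position, the three stripes stay in proportion whatever the size of the "flag."

```python
def morphogen_level(position):
    """Morphogen concentration at a relative position (0 = source end, 1 = far end).

    Assumption: a simple linear gradient falling from 1.0 to 0.0.
    """
    return 1.0 - position


def cell_fate(concentration, high_threshold=2 / 3, low_threshold=1 / 3):
    """Translate a morphogen concentration into one of three discrete fates."""
    if concentration >= high_threshold:
        return "blue"
    if concentration >= low_threshold:
        return "white"
    return "red"


def flag_pattern(n_cells):
    """Fates of n_cells along the axis, read at each cell's midpoint position."""
    fates = []
    for i in range(n_cells):
        position = (i + 0.5) / n_cells  # relative position, independent of flag size
        fates.append(cell_fate(morphogen_level(position)))
    return fates


if __name__ == "__main__":
    for size in (6, 12, 30):
        print(size, flag_pattern(size))
```

Changing either threshold shifts a stripe boundary without altering the gradient itself, which is the sense in which threshold responses convert a smooth gradient into discrete domains.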

Once a cell has acquired a specific identity, it may 
induce its neighbors to acquire a certain fate; this process 
is termed induction. A classic case of induction was first 
noted more than a century ago, when transplantation 
of cells from one region of a developing frog embryo to 
another region of a second embryo induced the surround- 
ing cells to form a second body axis (Figure 20.3b). The 
region from which the transplanted cells were derived 
was called the organizer because the cells of that region 
possess the ability to organize cells in the surrounding 
tissue. Alternatively, a cell that acquires a specific fate 
may produce an inhibitory substance that prevents its 
neighbors from acquiring a certain fate, and this process 
is called inhibition (Figure 20.3c). Inhibition can be used
to produce patterns of regularly spaced cells of a particular
fate within a field of cells that would otherwise all
differentiate in the same manner, such as in the example
of Drosophila shown in Figure 20.3c. Other examples
of tissues with regular spacing include many epidermal
features, such as bristles, feathers, hairs, and scales.

Figure 20.3 Mechanisms of differentiation.
(a) Positional information: All cells have the potential to differentiate as red, white, or blue. The differentiation of each cell is determined by the concentration of a morphogen along a gradient.
(b) Induction: One cell (green) produces a molecule that causes neighboring cells to differentiate with a particular fate (blue). Example: Moving the organizer cells from one frog embryo to another induces the development of a second body axis.
(c) Inhibition: One cell (red) produces an inhibitor that prevents neighboring cells from differentiating with a particular fate. Example: Drosophila cells expressing achaete (brown) become ectoderm and inhibit neighboring cells from doing the same.

The developmental histories of cells can affect 
how the cells respond to cues from their neighbors. For 
example, for a cell to be able to respond to an induc- 
tive or inhibitory signal from neighboring cells, it must 
express the appropriate receptor. In addition, cells able 
to respond to a signal may behave differently depending 
on what other factors are present in the cell. When a cell 
divides, the daughter cells usually inherit the same set of 
transcription factors and chromatin states that existed 
in the cell they were derived from (the importance of 
chromatin states is discussed in Section 20.2). However, 
occasional asymmetric cell divisions in which the two 
daughter cells inherit different cellular constituents and 
acquire different fates underlie developmental patterning 
events in some species. 

Positional information, induction, inhibition, and 
asymmetric cell divisions are common processes directing 
cell differentiation and pattern formation in multicellular 
organisms. When employed sequentially and reiteratively 
during embryogenesis, these processes enable a single- 
celled zygote to develop into a complex organism having
a multitude of cell types. Each cell division in the embryo
brings about changes in the relative positional relation- 
ships between the cells, so new opportunities for cell- 
cell communication are constantly created. In keeping 
with the importance of positional information, induction, 
and inhibition in development, most genes identified 
as having prominent roles in developmental processes 
encode proteins that act as either transcription factors or 
signaling molecules. 


20.2 Drosophila Development Is a 
Paradigm for Animal Development 


Discoveries about the developmental processes of Drosophila 
have made it ontogenetically one of the best-understood 
animals on the planet. These insights have in turn profoundly 
influenced how geneticists perceive the development and 
evolution of all other animals, ourselves included. For their 
work in unraveling some of the mechanisms underlying pat- 
tern formation in Drosophila, Edward B. Lewis, Christiane 
Nüsslein-Volhard, and Eric Wieschaus were awarded the
Nobel Prize in Physiology or Medicine in 1995. 

One of the reasons that Drosophila is an ideal genetic
experimental organism is its short, 9-day life cycle
(Figure 20.4a). Embryogenesis spans the first 24 hours of
Drosophila development, commencing with the deposition
of a fertilized egg that immediately begins a rapid series
of genetically controlled changes (Figure 20.4b). After
embryogenesis, development progresses through three
distinct larval stages, called instars. Each instar stage is
marked by progressive development of tissues and structures
that will form the adult fly. Following the third instar
stage, the larva forms a pupa in which metamorphosis will
take place. At the conclusion of pupation a fully formed
adult fruit fly emerges, ready to begin the cycle anew.

Figure 20.4 Overview of Drosophila development. (a) Drosophila life cycle. (b) Embryogenesis. (c) Segmentation pattern.

The Drosophila egg has conspicuous anterior-posterior
and dorsal-ventral polarities that are acquired
during its production in the female fly. In contrast 
to early development in many other species, early 
embryonic development in Drosophila proceeds by nu- 
clear division without division of cytoplasm. Rather than 
forming blastomeres, as in mammalian development, 
this process forms a syncytium, a multinucleate cell in 
which the nuclei are not separated by cell membranes 
(see Figure 20.4b). The fertilized egg undergoes nine 
mitotic nuclear divisions, after which the nuclei migrate 
to the periphery of the embryo. At this time, about 10 
pole cells, from which the germ line will be derived, are 
set aside at the posterior end of the embryo. The so- 
matic cells undergo another four rounds of mitotic divi- 
sions at the periphery, forming a syncytial blastoderm
containing about 6000 nuclei. By about 3 hours after egg
laying, cellularization of the syncytium occurs by the as- 
sembly of cell membranes that separate nuclei into indi- 
vidual cells, thus forming a cellular blastoderm. 

During the syncytial blastoderm and cellularization 
stages, cells become progressively restricted in their 
developmental potential. This can be demonstrated 
experimentally by transplanting cellular blastoderm 
cells from one embryo into another. Blastoderm cells 
implanted into an equivalent region of a host embryo 
are incorporated normally into host structures, but 
those transplanted into different regions will develop 
autonomously into tissues reflecting the original position 
of the cells in the donor embryo. Thus, at the cellular 
blastoderm stage, cells have already become committed to 
differentiate into particular tissues. 

Drosophila is typical of insects in the segmentation 
pattern of its adult body. Eight abdominal and three 
thoracic segments are easily distinguished (Figure 20.4c). 
The head consists of at least three distinct developmental 
segments. The segments of the insect body are first vis- 
ible during embryogenesis, where they are indicated by 
the pattern of denticles (small hooks for gripping during 
larval movement) on the ventral epidermis. The body plan 
established during embryogenesis determines the organi- 
zation of tissues and organs in the adult fly. 


The Developmental Toolkit of Drosophila


Large-scale genetic screens (see Section 16.1) were commenced
by Christiane Nüsslein-Volhard, Eric Wieschaus,
and others in the late 1970s and early 1980s to identify and 
describe the function of genes directing pattern formation 
in Drosophila embryos. It is estimated that mutations 
in about 5000 of the 14,000 genes in Drosophila will 
result in a lethal phenotype. Most mutations resulting in 
lethality affect genes that have essential cellular functions, 
and these genes are sometimes described as housekeep- 
ing genes. However, several hundred genes producing 
lethal phenotypes are involved directly in developmental 
programs of pattern formation during embryogenesis. 
Nüsslein-Volhard and Wieschaus faced a significant
challenge when designing genetic screens for mutations 
in pattern formation because flies in which segmental
pattern formation is severely disrupted rarely survive
beyond the larval stage. Their solution was to focus 
on embryos and larvae. They reasoned that mutations 
affecting embryonic pattern formation would not be 
lethal until larval formation, leaving a short window of 
time for observation of the effects of such mutations. 
From the types of spatial defect exhibited by the mutant 
phenotypes, mutants were grouped into four gene classes, 
with a fifth class identified earlier by Ed Lewis: 


1. Coordinate genes: Defects affect an entire pole of
the larva (Figure 20.5a).

2. Gap genes: Mutants are missing large, contiguous
groups of segments (Figure 20.5b).

3. Pair-rule genes: Mutants are missing parts of adjacent
segment pairs, in two alternating patterns (Figure 20.5c).

4. Segment polarity genes: Defects affect patterning
within each of the 14 segments (Figure 20.5d).

5. Homeotic genes: Defects affect the identity of one or
more segments.


Figure 20.5 Mutations causing defects in pattern formation in Drosophila: (a) coordinate gene, (b) gap gene, (c) pair-rule gene, (d) segment polarity gene. A fifth class of
mutations, homeotic gene mutations, is represented in Figure 20.10.


These five gene classes are expressed sequentially dur- 
ing embryogenesis: The coordinate genes act first, followed 
by gap genes, pair-rule genes, segment polarity genes, and 
finally homeotic genes. The cascade of gene expression 
subdivides the embryo in successive steps, first into broad 
regions and then into progressively smaller domains, and 
each of the 14 resulting segments acquires a specific iden- 
tity. The patterns of mRNA and protein expression of each 
gene correspond, both in space and in time, to its mutant 
phenotype (see Figure 20.5). For example, expression of 
the gap gene knirps spans a contiguous embryonic domain 
that is destined to become abdominal segments. These 
abdominal segments are missing in knirps mutants, as is 
evident in the early larva (see Figure 20.5b). 

Expression of the pair-rule genes follows that of gap 
genes and produces 14 stripes in the embryo. Curiously, 
the stripes of gene expression of pair-rule genes do not 
correspond to the segments of the adult insect, but rather 
straddle the boundaries between segments, thus occupy- 
ing the posterior part of one segment and the anterior 
part of its neighbor. The domains of gene expression con- 
trolled by the pair-rule genes are therefore called paraseg- 
ments. Expression of the segment polarity genes occurs in 
14 polar stripes (i.e., each stripe has anterior and posterior 
“poles”), one for each segment of the embryo. The homeo- 
tic genes are the last to be expressed and affect broad 
domains of contiguous parasegments along the anterior-posterior
axis. The anterior expression boundaries of the
homeotic genes correspond to parasegment boundaries 
defined by the pair-rule genes. Thus, the sequential activa- 
tion of different classes of genes during early development 
is reflected in the sequential subdivision of the organism, 
from a single-celled zygote into a segmented embryo. 

When the expression pattern of a gene in a wild-type 
embryo corresponds precisely to the cell fates that are dis- 
rupted when the gene is mutated, the activity of the gene 
is said to be cell autonomous. A gene whose action is cell 
autonomous affects only the cells in which the gene is tran- 
scribed and expressed. Four of the five classes of genes act 
largely cell autonomously, an observation consistent with 
the identity of these genes as transcription factors. The ex- 
ception is the segment polarity class of genes, which often 
encode signaling molecules that can act non-autonomously, 
that is, in cells other than those in which the gene is expressed. In the
following sections, we examine how the embryo is succes- 
sively subdivided by the activity of these sets of genes. 


Maternal Effects on Pattern Formation 


In animals, the mother often supplies critical gene products
to the egg that subsequently direct embryo development.
The genes encoding these products are called maternal effect
genes. Note that maternal effects are different from maternal
inheritance (introduced in Chapter 19), in that maternal effects
entail the maternal deposition of protein or mRNA in the egg
cell, whereas maternal inheritance refers to maternal transmission
of genetic material (e.g., organelle genomes).

How can the maternal effect genes that influence 
development be identified in mutant screens, given that for 
these genes, the embryonic phenotype is determined by the 
genotype of the mother rather than that of the embryo? An 
answer becomes apparent when we compare the inheri- 
tance patterns observed with maternal effect genes against 
those observed with zygotic genes, genes that are active 
only in the zygote or embryo. For zygotic genes, the geno- 
type of the embryo determines the phenotype. The following 
cross illustrates this principle for an autosomal recessive 
mutation (m). 


Inheritance Pattern with Zygotic Genes and Inheritance
Pattern with Maternal Effect Genes


Zygotic Genes

Parents          Offspring        Phenotype
m/+ × m/+        m/+, +/+         Normal (3)
                 m/m              Mutant (1)


With maternal effect genes, where the genotype of the 
mother determines the phenotype of the zygote, the same 
cross as above, involving an autosomal recessive mutation 
(m), would give the following outcomes: 


Maternal Effect Genes

Parents (female × male)          Offspring          Phenotype
m/+ × m/+                        m/m, m/+, +/+      All normal
m/+ × m/m                        m/m, m/+           All normal
m/m × +/+ or m/+ or m/m          m/m, m/+           All mutant
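
The contrast between the two inheritance patterns can be expressed as a short Python sketch. It assumes a single autosomal gene with a recessive mutant allele m and equal transmission of both parental alleles; the function names and the tuple representation of genotypes are illustrative choices only.

```python
from collections import Counter
from itertools import product

MUTANT = ("m", "m")


def cross(mother, father):
    """Return the four equally likely offspring genotypes of a one-gene cross.

    Genotypes are 2-tuples of alleles, e.g. ("m", "+"). Each offspring genotype
    is sorted so that ("+", "m") and ("m", "+") count as the same genotype.
    """
    return [tuple(sorted(pair, reverse=True)) for pair in product(mother, father)]


def zygotic_phenotypes(mother, father):
    """Zygotic gene: each offspring's own genotype sets its phenotype."""
    return Counter(
        "mutant" if child == MUTANT else "normal" for child in cross(mother, father)
    )


def maternal_effect_phenotypes(mother, father):
    """Maternal effect gene: the mother's genotype sets every offspring's phenotype."""
    mother_is_mutant = tuple(sorted(mother, reverse=True)) == MUTANT
    phenotype = "mutant" if mother_is_mutant else "normal"
    return Counter({phenotype: len(cross(mother, father))})


if __name__ == "__main__":
    het, hom, wild = ("m", "+"), ("m", "m"), ("+", "+")
    print("zygotic, m/+ x m/+:", dict(zygotic_phenotypes(het, het)))
    print("maternal effect, m/+ x m/+:", dict(maternal_effect_phenotypes(het, het)))
    print("maternal effect, m/m x +/+:", dict(maternal_effect_phenotypes(hom, wild)))
```

Note that the father's genotype never enters the maternal effect calculation, which is why m/m offspring of an m/+ mother are phenotypically normal.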


These divergent patterns allow discrimination be- 
tween maternal effect genes and zygotic genes. Crosses 
can be performed to determine whether the genes are 
active maternally, zygotically, or both. When such crosses 
were performed to test the five classes of mutants de- 
scribed above, the coordinate genes were found to be 
maternally active; their expression in the mother rather 
than in the embryo provides positional information to the 
egg. Most gap genes are active zygotically, but at least one, 
hunchback, also exhibits maternal activity. All pair-rule, 
segment polarity, and homeotic genes act strictly zygoti- 
cally. These findings make sense given the developmental 
stage at which the different classes of gene are active and 
the observation that zygotic gene expression commences 
only in the syncytial blastoderm stage of embryogenesis. 


Coordinate Gene Patterning of the
Anterior-Posterior Axis


The genetic control of development is essentially a process
of regulating gene expression in three-dimensional
space over time. It is not surprising, then, that most of the
early-acting genes establishing the anterior-posterior axis
of Drosophila encode transcription factors. The interaction
of transcription factors with cis-acting regulatory elements
of target genes provides spatial control of gene expression.
This spatial control is coordinated over time by
continual inputs from neighboring cells. In this section, we
describe examples of the spatial and temporal regulation of
gene expression that results in subdivision of a developing
Drosophila embryo into its characteristic segments.

The coordinate gene bicoid plays a major role in the 
establishment of the anterior-posterior axis in Drosophila.
Loss-of-function bicoid alleles result in a loss of anterior 
portions of the embryo; the anterior portions are replaced 
instead by a mirror-image duplication of posterior re- 
gions (Figure 20.6a). Bicoid mRNA is anchored to the an- 
terior region of the egg during oogenesis in the mother 
(Figure 20.6b). After translation, the resulting protein 
(Bicoid) diffuses from its site of synthesis at the anterior pole 
of the embryo throughout the syncytial embryo, owing to the 
absence of cell membranes to impede protein diffusion. The 
diffusion results in a gradient of Bicoid in which the highest 
concentration is at the anterior end and very little Bicoid is 
detected beyond the middle of the embryo. 


Figure 20.6 Maternal bicoid patterning of the embryo
along the anterior-posterior axis.
(a) Wild-type embryo. Loss of bicoid activity results in loss of anterior segments and duplication of posterior abdominal segments (A7, A8, anal plate [ap]). Injecting bicoid mRNA into an ectopic position (red) of a bicoid embryo results in a mirror-image duplication of anterior thoracic segments (T1) flanking the site of injection.
(b) bicoid mRNA (blue); after translation and diffusion, Bicoid protein (brown).


Cytoplasmic transplantation experiments elegantly 
demonstrate that Bicoid specifies anterior identity. 
Anterior cytoplasm extracted from a wild-type embryo 
and then injected into a bicoid mutant embryo causes 
anterior structures to develop at the site of injection (see 
Figure 20.6a). When the bicoid gene was cloned, similar 
experiments were carried out with purified bicoid mRNA, 
which produced the same result. These findings indicate 
that the concentration gradient of Bicoid provides positional
information along the anterior-posterior axis of the
embryo, presumably by differentially regulating several 
genes that respond to different concentrations of Bicoid. 
Among the known zygotic genes whose transcription is 
directly regulated by Bicoid is the gap gene hunchback. 

Surprisingly, examination of the distribution 
of hunchback mRNA revealed that hunchback is also 
maternally expressed and that its maternal expression is 
uniform throughout the egg (Figure 20.7a). The hunch- 
back protein (Hunchback), on the other hand, is found 
only at the anterior end of the early embryo, implying 
that posterior hunchback mRNA is not translated. This 
seeming contradiction was explained by the discovery of 
another maternally expressed coordinate gene, nanos. The 
posterior end of the embryo is patterned by nanos, whose 
protein forms a gradient with the highest concentration 
at the posterior end. Rather than encoding a transcription 
factor, nanos encodes a protein that represses translation 
of hunchback mRNA. Thus, Hunchback is restricted to 
the anterior end of the embryo by posterior translational 
repression of maternal hunchback mRNA. In addition, 
zygotic hunchback expression in the anterior end is tran- 
scriptionally activated by anteriorly localized Bicoid. 

Patterning of the posterior end of the embryo is 
governed by similar interactions. In addition to acting as a 
transcription factor, Bicoid acts as a translational repres- 
sor of the maternally supplied caudal mRNA, which is 
uniformly distributed throughout the egg. Translational 
repression of caudal mRNA by the anterior gradient of 
Bicoid results in a posterior gradient of caudal protein 
(Caudal). The end result is an embryo with graded 
distributions of three transcription factors: Bicoid and 
Hunchback, in which the highest concentration is at the 
anterior end; and Caudal, in which the highest concentra- 
tion is at the posterior end. The relative concentrations of 
these three proteins provide positional information along 
the length of the embryo, which is interpreted by the sub- 
sequently acting gap genes. 
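
As a rough illustration of how translational repression turns uniformly distributed maternal mRNA into opposing protein gradients, consider the following Python sketch. The exponential gradient shapes, the repression formula, and all parameter values are assumptions made for illustration; zygotic hunchback transcription activated by Bicoid is deliberately left out.

```python
import math

DECAY_LENGTH = 0.2     # hypothetical gradient length scale (fraction of egg length)
HALF_REPRESSION = 0.1  # hypothetical repressor level that halves translation


def bicoid(x):
    """Anterior gradient: highest at x = 0 (anterior pole)."""
    return math.exp(-x / DECAY_LENGTH)


def nanos(x):
    """Posterior gradient: highest at x = 1 (posterior pole)."""
    return math.exp(-(1.0 - x) / DECAY_LENGTH)


def translated(mrna, repressor):
    """Protein made from an mRNA whose translation is blocked by a repressor."""
    return mrna / (1.0 + repressor / HALF_REPRESSION)


def maternal_hunchback(x, mrna=1.0):
    """Uniform maternal hunchback mRNA, translationally repressed by Nanos."""
    return translated(mrna, nanos(x))


def caudal(x, mrna=1.0):
    """Uniform maternal caudal mRNA, translationally repressed by Bicoid."""
    return translated(mrna, bicoid(x))


if __name__ == "__main__":
    for x in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(
            f"x={x:.2f}  Bicoid={bicoid(x):.2f}  "
            f"Hunchback={maternal_hunchback(x):.2f}  Caudal={caudal(x):.2f}"
        )
```

In this sketch the mRNA level is the same at every position; only the local repressor concentration differs, so the protein gradients mirror the repressor gradients in reverse.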


Domains of Gap Gene Expression 


The broad gradients of maternally supplied coordinate
gene products are transformed into domains of gap gene
expression with discrete boundaries. This occurs through
a combination of cooperative binding of transcription
factors (similar to the activation of the lambda repressor
described in Chapter 14) and cross-regulatory
interactions among the gap genes themselves. To begin,
let’s consider further how the gradual concentration gradient
of Bicoid is translated into the more discrete pattern of
hunchback mRNA expression.

Figure 20.7 Gap gene expression patterns are activated by coordinate genes.

As noted earlier, zygotic expression of the gap gene 
hunchback is confined to the anterior region of the em- 
bryo. Unlike Bicoid, which exhibits a gradual concentra- 
tion gradient, the concentration of hunchback mRNA 
declines precipitously at a particular point along the 
anterior-posterior axis. Transcription of hunchback is activated
by the binding of Bicoid to cis-regulatory elements
5’ to the hunchback coding region (Figure 20.7b). In this 
location, there are multiple cis-acting sites to which Bicoid 
can bind, and these sites are bound in a cooperative man- 
ner, meaning that the binding of one Bicoid molecule to 
one site facilitates the binding of a second Bicoid molecule 
to a second nearby site, and so on. Mutation of the Bicoid 
binding sites alters the responsiveness of hunchback ex- 
pression to Bicoid, and removal of all binding sites abol- 
ishes hunchback expression in the embryo (Figure 20.7c). 

A threshold level of Bicoid must be present in order 
for hunchback expression to be activated. Consequently, 
hunchback expression occurs on one side of a threshold 
concentration with no expression on the other, and a 
sharp boundary is produced. In this manner, the gradual 
anterior concentration gradient of Bicoid is translated 
into a distinct anterior region of hunchback mRNA ex- 
pression, which, after translation, produces a sharp gradi- 
ent of Hunchback (see Figure 20.7a). 
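
One common way to see why cooperative binding sharpens a boundary is to model activation with a Hill function, where a larger exponent stands in for more cooperatively bound sites. The sketch below rests on that simplification; the Hill form, the exponential Bicoid gradient, and every parameter value are assumptions chosen for illustration rather than measurements.

```python
import math


def bicoid(x, decay_length=0.2):
    """Hypothetical exponential Bicoid gradient (x = 0 is the anterior pole)."""
    return math.exp(-x / decay_length)


def hunchback_activation(bicoid_level, threshold=0.25, n_sites=1):
    """Fractional hunchback activation modeled as a Hill function.

    n_sites is the Hill exponent, used here as a stand-in for the number of
    cooperatively bound Bicoid sites; threshold is the Bicoid level giving
    half-maximal activation.
    """
    ratio = (bicoid_level / threshold) ** n_sites
    return ratio / (1.0 + ratio)


def activation_drop(n_sites, anterior_x=0.2, posterior_x=0.35):
    """Fall in activation between two positions flanking the threshold."""
    front = hunchback_activation(bicoid(anterior_x), n_sites=n_sites)
    back = hunchback_activation(bicoid(posterior_x), n_sites=n_sites)
    return front - back


if __name__ == "__main__":
    for n in (1, 2, 5):
        print(f"n_sites={n}: drop across boundary = {activation_drop(n):.2f}")
```

With the same Bicoid gradient, a higher exponent produces a much larger drop in activation over the same short stretch of the axis, which is the qualitative behavior the text attributes to cooperative binding.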


The gradient of Hunchback protein is critical
for the regulation of other gap genes, such as Krüppel
(Figure 20.8), which is repressed by high levels of
Hunchback but activated in the central region of the embryo
where Bicoid levels are moderate. These interactions
establish the anterior margin of Krüppel expression toward
the posterior end of the Hunchback protein gradient. The
posterior margin of Krüppel expression appears to be determined
through negative regulation by other gap genes,
knirps and giant. Similar regulatory interactions between
other gap genes help establish the rest of the partially overlapping
patterns of gap gene expression that subdivide the
developing embryo into discrete domains.


Regulation of Pair-Rule Genes 


From the domains of gap gene expression emerge 14 
narrower stripes of gene expression that represent the first 
manifestation of segmentation of the anterior-posterior
body plan. Analysis of the regulation of the pair-rule gene 
even-skipped (eve) revealed that each stripe is established 
by independent enhancer modules of cis-acting regulatory 
sequences of eve. Each enhancer module from a pair-rule 
gene responds to specific combinations of gap genes 
(Figure 20.9a). Thus, the formation of stripes of gene expres- 
sion is the result of combinatorial control of gene expression 
through multiple cis-acting regulatory elements of the pair- 
rule genes. This situation is conceptually similar to the regu- 
lation of the gap genes, as described earlier for hunchback. 


Figure 20.8 Cross-regulatory interactions among gap
genes define their expression patterns. Panel labels along the anterior-to-posterior axis: Hunchback protein, hunchback, Krüppel, knirps, giant. Krüppel is repressed by knirps; Krüppel is repressed by giant.


Stripe 2 of eve provides an example of modularity 
in gene regulation. Gene expression within stripe 2 is 
controlled by a cis-regulatory element—the stripe 2 
enhancer module—located about 1700 bp to 1000 bp up- 
stream of the transcription initiation site of eve (see Figure 
20.9a). When this regulatory element is isolated and used 
to drive a reporter gene (see Section 16.4) in transgenic 
Drosophila embryos, expression is observed only in stripe 
2, indicating that these regulatory sequences are suffi- 
cient for stripe 2 expression. Detailed sequence analysis 
of this module identified binding sites for the gap proteins
Hunchback, Krüppel, and Giant, as well as binding
sites for Bicoid. Mutational analysis of different combinations
of binding sites demonstrates that both Hunchback
and Bicoid act as activators of even-skipped stripe 2 gene
expression, while both Giant and Krüppel act as repressors.

Figure 20.9 Stripes of gene expression, established by …

Stripe 2 lies entirely within the hunchback expression
domain of the embryo and is flanked on the anterior
side by the giant expression domain and on the posterior
side by the Krüppel expression domain (Figure 20.9b). It
contains an intermediate level of Bicoid remaining from
the maternally established gradient. Thus the position of
eve stripe 2 along the anterior-posterior axis is a zone with
a high concentration of Hunchback, low concentrations
of Giant and Krüppel, and an intermediate concentration
of Bicoid. Only in parasegment 3, which is the location of
stripe 2, are both positive regulators present and both negative
regulators absent (Figure 20.9c). This combination of
gap and coordinate protein concentrations does not occur
anywhere else along the axis of the embryo and uniquely
defines the eve stripe 2 position. The integration of positive
and negative regulators results in the precise limiting of
even-skipped stripe 2 to a region only a few cells in width
along the anterior-posterior axis. Similar combinatorial
mechanisms are thought to control the expression patterns
of all of the pair-rule and segment polarity genes.
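
The stripe 2 logic described above (both activators present, both repressors absent) can be written as a small Boolean sketch in Python. The domain boundaries below are hypothetical fractions of egg length chosen only to make the example work, and treating each protein as simply present or absent ignores the graded concentrations that the real enhancer reads.

```python
# Hypothetical expression domains as (anterior_limit, posterior_limit) fractions
# of egg length, with 0.0 = anterior pole. Values are illustrative only.
DOMAINS = {
    "Bicoid": (0.00, 0.50),
    "Hunchback": (0.00, 0.50),
    "Giant": (0.00, 0.35),    # anterior giant domain
    "Kruppel": (0.42, 0.60),  # Krüppel domain
}
ACTIVATORS = ("Bicoid", "Hunchback")
REPRESSORS = ("Giant", "Kruppel")


def present(protein, x):
    """True if position x lies inside the protein's expression domain."""
    anterior_limit, posterior_limit = DOMAINS[protein]
    return anterior_limit <= x <= posterior_limit


def eve_stripe2_on(x):
    """Stripe 2 enhancer output: all activators bound and no repressor bound."""
    activated = all(present(p, x) for p in ACTIVATORS)
    repressed = any(present(p, x) for p in REPRESSORS)
    return activated and not repressed


def stripe_positions(n_points=101):
    """Positions along the axis where the stripe 2 enhancer is active."""
    positions = [i / (n_points - 1) for i in range(n_points)]
    return [x for x in positions if eve_stripe2_on(x)]


if __name__ == "__main__":
    print(stripe_positions())
```

Only the narrow window between the Giant and Krüppel domains satisfies all four conditions, so the enhancer is active in a single thin stripe even though each individual regulator occupies a broad region.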

The discovery that in multicellular organisms the con- 
trol of gene expression is modular provided important insight 
into the evolution of organisms. Modularity of gene regula- 
tion allows changes in specific domains of expression with- 
out catastrophic disruption of global expression patterns. 


Specification of Parasegments by Hox Genes 


Having explored the mechanisms by which gap and pair-rule 
genes subdivide the Drosophila embryo into 14 segments, we 
can now consider how each segment acquires a unique iden- 
tity through the action of the homeotic genes. Once again, 
the key discoveries were made through the study of muta- 
tions, pioneered by Edward B. Lewis starting in the 1950s. 
As we saw at the beginning of the chapter, a remarkable 
aspect of homeotic mutant phenotypes is the development 
of relatively normal structures in inappropriate positions.
Another general feature of homeotic mutations is that they
cause identity transformations of serially repeated structures. 
Legs, for example, are appendages that are normally limited 
to the three thoracic segments in Drosophila, whereas an- 
tennae are appendages that normally develop only on the 
third cephalic (head) segment. In the case of Antennapedia 
mutants, however, a leg appears in a segment ordinarily 
reserved for an antenna (see Figure 20.1), suggesting that 
Antennapedia normally specifies the identity of one or more 
of the thoracic segments. Analyses of homeotic genes in 
Drosophila demonstrate that in fact they act in combination 
to specify the identity of each of the 14 body segments. 

The homeotic genes of animals are also remarkable 
for being clustered in gene complexes. In Drosophila there 
are two homeotic clusters on the third chromosome: the 
Antennapedia complex, consisting of five genes, and the 
bithorax complex, consisting of three genes. In other or- 
ganisms, the homeotic genes are usually in a single cluster. 
Amazingly, the order of the genes within the complexes 
reflects the positions along the anterior-posterior axis that
are influenced by each gene (Figure 20.10). 

The cloning of the homeotic genes revealed another
surprise: All eight genes encode closely related proteins, suggesting
that all members of the complex were derived from a
common ancestor through a series of gene duplications. All
of the genes share a conserved sequence of DNA of 180 nucleotides
that was dubbed the homeobox, which encodes a
60-amino acid protein domain, termed the homeodomain,
with a helix-turn-helix motif. Such motifs had previously
been recognized in bacterial and phage transcription factors,
such as the Lac repressor and the lambda repressor proteins.
They function to bind cis-regulatory DNA sequences of target
genes. Since the homeobox genes of the Antennapedia
and bithorax complexes share both molecular and functional
similarity as well as having a common evolutionary
origin, they are known collectively as Hox genes.

Figure 20.10 Hox genes of the Antennapedia and bithorax complexes.

The patterns of Hox gene expression correlate with 
the regions affected in the corresponding mutants. Each 
of the Hox genes has a well-defined anterior boundary of 
expression but in most cases a more diffuse boundary on 
the posterior, resulting in overlapping domains of Hox 
gene expression. The anterior boundaries of Hox gene 
expression do not correspond to segmental boundaries 
but rather to boundaries of segment polarity gene expres- 
sion. Thus, Hox gene expression is out of register with the 
groups of cells that give rise to segments in the adult fly 
and instead marks the boundaries of parasegments. 

Because of the parasegmental pattern of Hox gene 
expression, mutations of those genes affect cellular 
identity in a parasegmental manner. Each parasegment of 
the embryo expresses a unique combination of Hox gene 
products, giving each parasegment a specific identity. The 
activation of Hox genes is controlled by the earlier-acting 
gap and pair-rule genes in a combinatorial manner similar 
to that described for the activation of pair-rule genes by 
the gap and coordinate genes. In the absence of all Hox 
gene activity, segments are formed, but they all differenti- 
ate into a “default” state that resembles a head segment. 
This outcome indicates that Hox genes are not required 
for the formation of the segments but rather for the speci- 
fication of their identity. 


The Antennapedia Complex The Antennapedia complex 
consists of five Hox genes—labial, Deformed, Sex combs 
reduced, proboscipedia (Pb), and Antennapedia—that 
act in combination to specify the cephalic and thoracic 
parasegments (see Figure 20.10c). The original Antennapedia 
mutant (see Figure 20.1) was dominant and was found 
to be the result of a gain-of-function allele (see Section 
4.1). The Antennapedia gene is normally expressed only 
in parasegments 4 and 5 (see Figure 20.10c), which give 
rise to thoracic segments that each produce a pair of legs. 
In flies carrying the dominant Antennapedia mutation, 
however, Antennapedia is expressed ectopically—meaning 
it is expressed at an inappropriate time or place or both. 
One of the normal roles of Antennapedia expression in 
the thoracic segments is to promote the differentiation of 
thoracic appendages into legs. When expressed ectopically 
in the third head segment, Antennapedia inappropriately 
promotes differentiation of head appendages (antennae) into 
legs instead. 


The bithorax Complex In contrast to Antennapedia 
mutations that affect anterior body segments, mutations 
in the three genes of the bithorax complex—Ultrabithorax, 
abdominal-A, and Abdominal-B—affect more-posterior 
segments (Figure 20.11a). The bithorax complex genes are
expressed in overlapping sets of thoracic and abdominal
parasegments and act in combination to specify the identity
of those parasegments. How do only three genes specify
the identity of nine segments, one thoracic and eight
abdominal? The three genes vary not only in their spatial
patterns of expression but also in expression levels between
segments. Each has a sharp anterior border of expression
and a more diffuse posterior boundary of expression. Thus,
each segment exhibits a unique qualitative and quantitative
pattern of Hox gene expression.

Figure 20.11 Cross-regulatory interactions between
bithorax complex genes, specifying thoracic and abdominal
segment fates.

Loss of Ultrabithorax activity results in parasegments 
5 and 6 having a combination of Hox gene products 
resembling that normally found in parasegment 4. This 
causes transformations of the identity of thoracic segment 
T3 and abdominal segment A1 into thoracic segment 
T2 (Figure 20.11b). Loss of the entire bithorax complex 
causes most abdominal segments to develop as T2, so 
each has legs as appendages (Figure 20.11c). This obser- 
vation suggests that expression of Antennapedia, which 
promotes leg identity in appendages, extends posteriorly 
in such mutants and that genes of the bithorax complex 
normally repress posterior expression of Antennapedia. 
Such cross-regulatory interactions between Hox genes,
whereby more posteriorly expressed Hox genes repress
the expression of Hox genes normally expressed in
more-anterior positions, are a common although not universal
feature in the regulation of Hox genes (Figure 20.11d-e).
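
The cross-regulatory logic described above can be illustrated with a toy model. The sketch below is not taken from the text's figures: it assumes each Hox gene is switched on from a single anterior parasegment boundary backward, and that the most posteriorly acting gene present sets the identity of a parasegment. The boundary numbers for Ultrabithorax and Antennapedia follow the parasegments named in this section; those for abdominal-A and Abdominal-B, and the function names, are illustrative assumptions.

```python
# Toy model of posterior genes repressing more-anterior Hox genes.
# Anterior boundaries (parasegment numbers) are simplifying assumptions.
ANTERIOR_BOUNDARY = {"Antp": 4, "Ubx": 5, "abd-A": 7, "Abd-B": 10}
GENE_ORDER = ["Antp", "Ubx", "abd-A", "Abd-B"]  # anterior -> posterior


def dominant_gene(parasegment, lost=()):
    """Return the gene that sets identity in one parasegment, or None.

    A gene is available if it is not mutated away and the parasegment lies
    at or behind its anterior boundary. The most posterior available gene
    wins, standing in for repression of the more-anterior genes.
    """
    active = [g for g in GENE_ORDER
              if g not in lost and ANTERIOR_BOUNDARY[g] <= parasegment]
    return active[-1] if active else None


def identity_map(lost=(), parasegments=range(4, 15)):
    """Map each parasegment to the gene that sets its identity."""
    return {ps: dominant_gene(ps, lost) for ps in parasegments}


if __name__ == "__main__":
    cases = [("wild type", ()),
             ("Ubx loss of function", ("Ubx",)),
             ("bithorax complex loss", ("Ubx", "abd-A", "Abd-B"))]
    for label, lost in cases:
        print(label, identity_map(lost))
```

In this sketch, removing Ultrabithorax gives parasegments 5 and 6 the same Antennapedia-based identity as parasegment 4, and removing all three bithorax complex genes lets that identity spread through every more-posterior parasegment, mirroring the transformations toward T2 described above.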

As you have probably noticed, there is no single Hox 
gene called bithorax; so what became of the original bithorax 
(bx) mutation that was isolated by Calvin Bridges? When Ed
Lewis recognized that mutations such as bithorax could
provide valuable insights into the genetic mechanisms of 
development, he began collecting mutations with similar but 
distinct phenotypic defects, some of which he called post- 
bithorax (pbx), Contrabithorax, Ultrabithorax, and bitho- 
raxoid (bxd). Each of these mutations mapped to a different 
position in the same chromosomal region, so that they were 
separable by recombination events, and double-mutant 
combinations could be constructed. At the time Lewis per- 
formed these studies, molecular cloning was unknown, and 
he assumed that each mutant he identified represented a 
different gene. When the bithorax complex was eventually 
cloned in 1983, however, many of the mutant phenotypes 
were found to result from mutations in different enhancer 
modules controlling the expression of a single coding region 
that is now called the Ultrabithorax gene (Figure 20.12a).
Mutations of the regulatory elements can be either recessive,
if in an enhancer module that acts to positively regulate
gene expression, or dominant, if in a silencer module
that acts to negatively regulate gene expression. While null
loss-of-function alleles of Ultrabithorax result in embryo
lethality, disruption of single enhancer modules results in
milder defects. For example, recessive Ultrabithorax^bithorax
mutations (bx) result in the transformation of the anterior
part of T3 into T2, causing the anterior portion of the haltere
to develop as a wing (Figure 20.12b). Conversely, recessive
Ultrabithorax^postbithorax mutations (pbx) result in the
transformation of the posterior region of T3 into T2 identity, and
the posterior portion of the haltere develops as a wing. Only
in the Ultrabithorax^bithorax Ultrabithorax^postbithorax double
mutant is the identity of the entire T3 segment transformed
into a T2 identity, causing a four-winged fly to develop (see
Figure 20.1).

GENETIC ANALYSIS

PROBLEM Why do loss-of-function mutations in bithorax complex genes result in
homeotic transformations of parasegments into identities that correspond to
more-anterior parasegments, whereas gain-of-function mutations (see Section 4.1)
tend to result in identities corresponding to more-posterior parasegments?

BREAK IT DOWN: In a homeotic transformation, a normal body part is replaced by
another body part normally found in another region of the body.

Solution Strategies and Solution Steps

Evaluate

1. Strategy: Identify the topic this problem addresses and the nature of the
   required answer.
   Step: The subject of this question is the effect of mutations in the bithorax
   complex on segment pattern formation. The answer requires descriptions of why
   loss-of-function mutations lead to segments that resemble more-anterior
   segments, whereas gain-of-function mutations lead to the formation of segments
   that resemble more-posterior segments.

2. Strategy: Identify the critical information given in the problem.
   Step: The question suggests there is a key difference between the effects of
   loss-of-function mutations and gain-of-function mutations of the bithorax
   complex.

Deduce

3. Strategy: Review the general patterns of expression and segmental pattern
   formation resulting from the normal expression of homeotic genes.
   TIP: Use Hox genes as an example of a set of developmental genes.
   Step: Homeotic genes, such as the Hox genes, specify segment identity in a
   combinatorial manner through overlapping expression domains in parasegments.
   Each gene has a well-defined anterior boundary but a more diffuse posterior
   boundary. Cross-regulatory interactions refine Hox gene expression domains, so
   that more-posterior genes repress more anteriorly expressed genes.

4. Strategy: Review the general pattern of expression and the normal segmental
   pattern formation of bithorax genes.
   Step: The bithorax complex consists of three genes, Ubx, abd-A, and Abd-B. Ubx
   is expressed in the anterior abdominal segments and posterior thoracic
   segments, abd-A is expressed in the middle abdominal segments, and Abd-B is
   expressed in the posterior abdominal segments. Segment identity is specified by
   the combination of Hox gene products and their levels of expression.

Solve

5. Strategy: Explain why loss-of-function mutations of bithorax genes lead
   parasegments to take on a more-anterior identity.
   TIP: Consider the cross-regulatory interactions of the Hox genes.
   Step: The loss of function of a posterior gene leads to both the absence of
   expression of the mutant gene and posterior expansion in the expression domains
   of more-anterior genes. For example, the posterior gene Abd-B acts to repress
   abd-A in the most-posterior segments. Loss-of-function mutations in Abd-B result
   in a posterior expansion of abd-A expression into more-posterior abdominal
   segments. The result is that both middle and posterior abdominal segments
   acquire an identity that is similar to that of the middle abdominal segments,
   which is a homeotic transformation to more-anterior identity.

6. Strategy: Explain why gain-of-function mutations of bithorax genes lead
   parasegments to take on a more-posterior identity.
   TIP: Gain-of-function Antennapedia mutations cause legs (a posterior structure)
   to develop in the position normally occupied by antennae (an anterior
   structure).
   Step: Gain-of-function mutations cause gene expression at inappropriate times
   and locations. Gain-of-function alleles often, but not always, result in Hox
   gene expression in a more-anterior domain than in wild-type animals, thus
   resulting in homeotic transformations to a more-posterior identity.

For more practice, see Problems 6, 7, 22, and 26.

The cis-regulatory elements of Ultrabithorax span
over 120 kb (see Figure 20.12a), and their modularity
allows the evolution of changes in gene expression without
catastrophic disruption of Ultrabithorax function, such as
those caused by nonsense mutations within the coding
region. Thus, Ultrabithorax^bithorax Ultrabithorax^postbithorax
double mutants survive to adulthood because the
remainder of the cis-regulatory elements controlling
Ultrabithorax expression are intact. Genetic Analysis 20.1
asks you to evaluate cross-regulatory interactions among
Hox genes.

Downstream Targets of Hox Genes


Given that combinatorial action of the Hox genes speci- 
fies parasegment identity and that Hox genes encode tran- 
scription factors, it follows that the downstream target 
genes activated by the Hox genes must differ between 
segments. These Hox target genes have been called 
realizator genes, and their expression contributes to 
the characteristic morphology of each segment. As an 
example, let’s consider the formation of appendages on 
each segment. 

Wild-type flies have antennae on the most-anterior 
head segment and have mandibles and maxillary and 
labial sense organs on other head segments. The three 
thoracic segments have legs; T2 and T3 also have wings 
and halteres, respectively. The eight abdominal segments 
lack appendages. Loss of all Hox activity is lethal to the 
embryo and causes all segments to resemble a head 
segment having antennae as appendages. This outcome 
indicates that all segments have the potential to form an 
appendage, and that expression of Hox genes can either 
specify the appendage identity or repress its formation. 

The formation of an appendage is dependent upon 
a gene called Distal-less. In wild-type Drosophila, Distal- 
less is expressed in the head and thoracic segments but 
not in any abdominal segments. This pattern suggests 
that the abdominal segment identity genes, Ultrabithorax, 
abdominal-A, and Abdominal-B, negatively regulate 
Distal-less expression in the abdominal segments. Loss of 
function of all bithorax complex genes results in ectopic 
Distal-less expression in all abdominal segments, along 
with a concomitant development of appendages (legs) 
on all abdominal segments. Conversely, if Ultrabithorax 
is ectopically expressed at high levels throughout the 
embryo, Distal-less is not activated in any segment and 
no appendages are formed. Thus, action of specific bitho- 
rax complex Hox proteins on Distal-less cis-regulatory 
sequences represses Distal-less gene expression in the 
abdominal segments. The identity of the appendages is 
determined by the combinatorial activity of the Hox genes 
in conjunction with Distal-less. For example, the identity 
of the T1 leg is specified by Distal-less and Sex combs 
reduced, whereas the identity of the T2 leg is specified by 
Distal-less and Antennapedia. 
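
The regulatory logic of this paragraph can be written as a small Boolean sketch. It is an illustration under stated assumptions, not a model from the text: each segment is reduced to the set of Hox genes active in it, any bithorax complex gene is treated as sufficient to repress Distal-less, and the function names are hypothetical. Segment T3 is deliberately left out, because there Ultrabithorax at normal levels coexists with leg formation, which an on/off model cannot capture.

```python
# Boolean sketch of Distal-less (Dll) control by Hox genes.
# Gene abbreviations: Scr = Sex combs reduced, Antp = Antennapedia,
# Ubx = Ultrabithorax. Levels of expression are ignored (an assumption).
BITHORAX_COMPLEX = {"Ubx", "abd-A", "Abd-B"}


def distal_less_on(hox_genes):
    """Dll is expressed unless a bithorax complex protein is present."""
    return not (set(hox_genes) & BITHORAX_COMPLEX)


def appendage(hox_genes):
    """Return the appendage formed by a segment with these active Hox genes."""
    hox = set(hox_genes)
    if not distal_less_on(hox):
        return None            # Dll repressed: no appendage forms
    if not hox:
        return "antenna"       # no Hox activity: head-like default state
    if "Scr" in hox:
        return "T1 leg"        # Dll together with Sex combs reduced
    if "Antp" in hox:
        return "T2 leg"        # Dll together with Antennapedia
    return "unspecified appendage"


if __name__ == "__main__":
    print("T1:", appendage({"Scr"}))
    print("T2:", appendage({"Antp"}))
    print("abdominal segment:", appendage({"Ubx", "abd-A"}))
    print("abdominal segment, bithorax complex lost:", appendage({"Antp"}))
    print("ectopic Ubx in T2:", appendage({"Antp", "Ubx"}))
```

The last two cases correspond to the experiments described above: loss of the bithorax complex genes allows Distal-less expression and leg formation in the abdomen, whereas ectopic Ultrabithorax blocks appendage formation.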


Hox Genes in Metazoans 


Soon after the discovery of Hox gene clusters in 
Drosophila, researchers began to inquire whether Hox 
genes are a peculiarity of Drosophila development, or 
whether they are found in a broader range of species. 
Many developmental biologists did not expect to find 
Hox genes in other animals, since there was no reason to 
expect that other animals would use the same genes to 
direct very different developmental programs. However, 
cross-hybridization studies using Drosophila Hox
sequences as molecular probes revealed Hox gene sequences
in the genomes of all animals, including insects,
spiders, molluscs, and vertebrates (such as humans). This 
revelation suggested a common developmental mecha- 
nism among animals. 

Subsequent experiments showed not only that most 
animals have clusters of Hox genes but also that they 
are arranged in a manner similar to that in Drosophila 
(Figure 20.13). Each cluster consists of genes corre- 
sponding to those in the bithorax and Antennapedia 
clusters of Drosophila, with some minor deletions and 
duplications. For example, as in Drosophila, the mouse 
Hox genes are expressed in an anterior-to-posterior pat- 
tern that corresponds to the chromosomal position of 
the genes within the Hox clusters. This pattern suggests 
that Hox genes also specify identity along the anterior-posterior
axis of the mouse and, by extension, of mammals
in general.

The conservation of Hox gene clusters among an- 
imals indicates that a common ancestor possessed a 
Hox gene cluster specifying pattern formation along its 
anterior-posterior axis. This cluster was duplicated during
the evolution of the vertebrate genome, which has
four copies. The conservation of the Hox complexes for 
more than 500 million years suggests that the spatial co- 
linearity of Hox genes along the chromosome with their 
expression along the body axis is essential for optimal 
functionality. 

Mouse embryos with loss-of-function alleles of Hox
genes, constructed using gene-targeting techniques de- 
scribed in Chapter 17, exhibit defects in the identity of 
serially repeated structures. For example, loss of Hox func- 
tion results in a homeotic transformation of the lumbar 
and sacral vertebrae, which do not normally bear ribs, into 
structures resembling more-anterior thoracic vertebrae that 
do carry ribs (see Figure 16.1). These and additional Hox 
gene mutations suggest Hox genes direct the development 
of body plans in chordates as well as in annelids, arthropods, 
molluscs, nematodes, and other animals. 

Studies of Hox complexes in other metazoans reveal 
that gene duplication took place before the divergence 
of bilaterian animals (animals that have bilateral symme- 
try). Thus, all bilaterian animals have essentially the same 
homeotic gene toolkit to pattern their anterior-posterior
axis. This homology indicates that the differences between 
animals reflect how the toolkit is employed rather than 
differences in the component parts. Indeed, large-scale 
sequencing of cnidarian (jellyfish, sea anemone) genomes 
suggests that other components of the genetic toolkit are 
also largely shared by all metazoans. Given that all animals 
share fundamental developmental patterning processes 
and genes, much of what we learn from the study of model 
animals such as Drosophila, Caenorhabditis elegans, and 
mice can be extended to other members of the animal 
kingdom, including ourselves. 



Figure 20.13 Occurrence and arrangement of Hox complexes in metazoans. Hox genes have not
been detected in choanoflagellates, single-celled organisms that represent the sister clade to
metazoans, but they are present in all metazoans. In the vertebrate lineage (exemplified by the mouse),
the entire complex has been duplicated twice, resulting in four Hox complexes. Such events have
produced duplicated genes that were later co-opted to new developmental functions.
Taxa shown: choanoflagellates (none detected), sponges, cnidarians, and the bilaterians fruit fly,
onychophoran, nematode, priapulid, polychaete, leeches, nemertean, flatworms, gastropod,
brachiopod, amphioxus, and sea urchin. ©1999 Macmillan Publishers Ltd


Stabilization of Cellular Memory by Chromatin 
Architecture 


The preceding sections describe how the basic body plan 
of Drosophila is established in early embryogenesis by the 
action of coordinate, gap, and segmentation genes and 
through spatially restricted patterns of Hox gene expres- 
sion that specify segmental identity. The patterns of Hox 
gene expression are then faithfully propagated through- 
out the remainder of embryonic development. The pro- 
teins that activate Hox gene expression have an ephemeral 
pattern of expression; it disappears soon after Hox expres- 
sion patterns are initiated. Thus, one challenge cells face 
during embryonic development is for specific lineages to 
maintain their identity as they proliferate. 

Genetic screens for homeotic genes revealed that mu- 
tations at loci other than those encoding the Hox genes 
can also produce homeotic mutant phenotypes. In general, 
mutations at these other loci fall into two classes. The first 
class, exemplified by trithorax mutations, produces pheno- 
types reminiscent of multiple Hox loss-of-function muta- 
tions. In contrast, phenotypes of mutants of the second 
class, exemplified by Polycomb mutations, often resemble 
multiple gain-of-function alleles of Hox genes. At the molecular
level, expression of multiple Hox genes is found to
be ectopic in Polycomb mutants and reduced in trithorax
mutants. While Hox gene expression is established nor- 
mally in both Polycomb and trithorax mutants, the ex- 
pression either fails to be maintained (trithorax mutants) 
or is later activated in inappropriate locations (Polycomb 
mutants). Thus, rather than “remembering” what type 
of tissue they are destined to form, mutant trithorax and 
Polycomb cell lineages appear to “forget” their identity. 
Both trithorax and Polycomb encode proteins that act 
in large protein complexes whose function is to modulate 
chromatin structure. Components of the complexes are 
encoded by genes known, respectively, as the trithorax 
group (trxG) genes and the Polycomb group (PcG) genes. 
Both the trxG and PcG protein complexes are recruited 
to specific DNA sequences, and each complex possesses 
a distinct type of histone-3-methyltransferase activity 
(see Section 15.2) in which the activity of the trxG 
complex is opposite to the activity of the PcG complex. 
The PcG complexes repress target gene expression by 
recruiting histone-modifying protein complexes capable 
of histone deacetylation. In contrast, trxG complexes
recruit protein complexes that acetylate histones, leading
to maintenance of active gene expression. These two
types of modification are associated with transcriptionally
inactive heterochromatin and transcriptionally active
euchromatin, respectively (see Chapter 15). It is believed
that trxG and PcG complexes are recruited to the cis- 
acting regulatory sequences of Hox genes to “lock” the 
chromatin into a particular form, allowing maintenance 
of either active or silent states of gene expression. In this 
way, these proteins provide a type of epigenetic cellular 
memory that is propagated through cell divisions occur- 
ring long after the initial activators of Hox gene expres- 
sion patterns have disappeared. 

Study of trithorax and Polycomb mutants has helped 
clarify that the establishment of euchromatic or hetero- 
chromatic chromatin at specific developmental genes is 
a primary mechanism by which the potential fates of 
cells become restricted as development proceeds from 
totipotent zygote to differentiated cell types. The relative 
rigidity or plasticity of these different chromatin states 
is directly responsible for a cell’s ability to express some 
genes and not express others, thus influencing the devel- 
opmental potential of particular cell types. 


20.3 Cellular Interactions 
Specify Cell Fate 


The adult Caenorhabditis elegans contains only about 1000 cells,
and its development provides a model of organogenesis. For
example, the development of the C. elegans vulva
shows how inductive and inhibitory
signals between cells direct the differentiation of distinct
developmental fates in a group of pluripotent cells. John
Sulston, Sydney Brenner, and Robert Horvitz shared the 
Nobel Prize in Physiology or Medicine in 2002 for their re- 
search on the genetic regulation of organ development and 
programmed cell death in C. elegans. 


Inductive Signaling between Cells 


Caenorhabditis elegans is a hermaphrodite nematode
worm whose vulva, the external genital structure, forms a portal
to the uterus through which eggs are laid. Early in their
development, hermaphroditic worms produce sperm, 
which they store for later use. Eggs are subsequently pro- 
duced in the gonads, fertilized with the stored sperm, and 
then extruded through the vulva. The vulva forms during 
the last larval stage, from six precursor cells called vulval 
precursor cells (VPCs); see Figure 20.14a-b. Three of 
these larval cells give rise to structures of the vulva itself: 
One is called the primary (1°) cell, and the other two are 
called secondary (2°) cells of the vulva. The other three 
cells differentiate into hypodermis and are called tertiary 
(3°) cells. The VPC closest to a specific gonadal cell called 
the anchor cell differentiates as a 1° cell and forms the 
central part of the vulva. The two cells flanking the 1° cell 
differentiate as 2° cells and form the peripheral regions of 
the vulva. The 1° and 2° fates can be easily distinguished 
by their distinct cell-division patterns.

Figure 20.14 Inductive signaling during vulval development
in C. elegans. (a) Six cells, P3.p to P8.p, have the potential to
develop into vulva; lin-3 is expressed in the anchor cell. (b) The
three vulval precursor cells closest to the anchor cell, P5.p to P7.p,
form the vulva; the other cells develop into hypodermis. One cell
has 1° identity and forms the central part; two flanking cells adopt
2° fate and form peripheral parts. (c) Loss of the anchor cell results
in loss of vulval development; all cells adopt hypodermal fate.


Initially, each of the six VPCs has the potential to dif- 
ferentiate along any of the pathways—1°, 2°, or 3°. This 
flexible cell-fate potential is demonstrated by laser-ablation
experiments that destroy the anchor cell or one or more
VPCs (Figure 20.14c). If the anchor cell is destroyed, no 
vulva will form, because all six VPCs differentiate with a 
3° fate and become hypodermis. This suggests that the an- 
chor cell must be present to induce VPCs to differentiate 
with 1° or 2° fates and thus form the vulva. Alternatively, 
if the VPC closest to the anchor cell is ablated, one of the 
cells that would normally differentiate with a 2° fate in- 
stead develops with a 1° fate and the two cells flanking this 
new 1° cell differentiate as 2° cells, suggesting that any of 
the VPCs can differentiate with a 1° or 2° fate. 
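
The outcomes of these ablation experiments can be summarized in a short sketch. It rests on assumptions that go beyond the text: the six VPCs are treated as a fixed row, the anchor cell is placed over P6.p, the surviving cell nearest the anchor cell takes the 1° fate (ties are broken arbitrarily toward the anterior cell), and its immediate surviving neighbors take the 2° fate. The function name and parameters are hypothetical.

```python
# Sketch of VPC fate assignment with optional laser ablation.
VPCS = ["P3.p", "P4.p", "P5.p", "P6.p", "P7.p", "P8.p"]


def assign_fates(ablated=(), anchor_cell=True, anchor_index=3):
    """Return a dict mapping each surviving VPC to its fate.

    anchor_index is the position in VPCS lying under the anchor cell
    (3 means P6.p, an assumption based on the figure panel titles).
    """
    survivors = [(i, name) for i, name in enumerate(VPCS)
                 if name not in ablated]
    fates = {name: "tertiary" for _, name in survivors}
    if not anchor_cell or not survivors:
        return fates  # no inductive signal: every cell becomes hypodermis
    # The surviving cell closest to the anchor cell adopts the primary fate.
    k = min(range(len(survivors)),
            key=lambda j: abs(survivors[j][0] - anchor_index))
    fates[survivors[k][1]] = "primary"
    # Its immediate surviving neighbors adopt the secondary fate.
    for j in (k - 1, k + 1):
        if 0 <= j < len(survivors):
            fates[survivors[j][1]] = "secondary"
    return fates


if __name__ == "__main__":
    print("intact:", assign_fates())
    print("anchor cell ablated:", assign_fates(anchor_cell=False))
    print("P6.p ablated:", assign_fates(ablated=("P6.p",)))
```

Because fate in this sketch depends only on position relative to the anchor cell, any VPC can end up with any of the three fates, which is the point the ablation experiments make.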

What limits the number of VPCs destined to form the 
vulva to three? Given the loss of both the 1° and 2° fates 
when the anchor cell is removed, researchers hypothesized 
that the anchor cell might provide an inductive signal to 
induce vulval cell differentiation (Figure 20.14d). If this in- 
ductive signal is disseminated in a gradient, the cell closest 
to the anchor cell could acquire a different fate than cells 
that are more distant. 

As predicted by the inductive interaction model, 
mutations that eliminate either the inductive signal or the 
ability of cells to respond to the inductive signal result in 
a loss of vulval development, and all VPCs differentiate 
as hypodermis (Figure 20.15a). This mutant phenotype 
is called the vulva-less phenotype. In contrast, mutations 
that disseminate the inductive signal to all VPCs cause all 
VPCs to differentiate into vulval cells, producing a multi- 
vulva phenotype. Multi-vulva mutants lay eggs similarly 
to normal worms; however, the fertilized eggs of vulva- 
less worms cannot be laid and instead develop and hatch 
inside the mother’s uterus. Progeny developing in the 
uterus eventually consume their mother from the inside 
and then hatch out of the carcass. 

Recessive loss-of-function alleles at several loci produce
a vulva-less phenotype. These genes encode proteins
that either act in the production of the inductive signal
from the anchor cell or facilitate cell response to the
inductive signal (Figure 20.15b). For example, the lin-3
gene encodes a small, secreted protein expressed only in 
the anchor cell and acting as the inductive signaling mol- 
ecule (see Figure 20.14a and d). Mutations that result in a 
loss of active LIN-3 protein result in the loss of the inductive
signal from the anchor cell. In contrast, the let-23 and
let-60 genes are expressed in the VPCs and act as the receptor
(LET-23) for the lin-3-encoded signal and as a signal
transduction molecule (LET-60) that communicates the 
signal from the plasma membrane to the nucleus, where 
changes in gene expression are induced. The absence of a 
receptor for LIN-3, or the inability to transmit receipt of 
the signal, blocks the normal developmental fate of VPCs. 

Figure 20.15 Genetic analysis of vulval development in
C. elegans.

Epistatic analysis of developmental pathways, conducted
by studying multiple mutant combinations, is used
to identify groups of genes that interact to control a particular
cellular process or pathway and to establish an
order-of-function map for the genes in the pathway (see
Section 4.3). Genetic analysis of developmental pathways
can be more complicated than analysis of biochemical
pathways because often there is no way of assaying intermediate
steps in the developmental pathway. The analysis
of double mutants and the availability of gain-of-function 
alleles can be crucial in these endeavors, as the studies 
of vulva-less and multi-vulva mutants in C. elegans show 
(Figure 20.16). In the case of recessive loss-of-function 
alleles of lin-3, let-23, and let-60, all single mutants have 
the same phenotype, suggesting all these genes might act 
in the same pathway. However, all double-mutant loss-of- 
function combinations also exhibit a vulva-less phenotype 
(Figure 20.16b), which complicates the effort to discover 
the order of genes in the pathway. 

As shown in Figure 20.15, genetic screens of C. elegans
identified dominant multi-vulva mutations in which all
VPCs differentiated as 1° or 2° cells. Two of the dominant
mutations mapped to the same positions as let-23 and let-60,
suggesting that they might be gain-of-function alleles
of these genes, and both dominant mutant alleles proved
to be epistatic to recessive loss-of-function alleles of lin-3
(i.e., the double mutants have a multi-vulva phenotype
like the let-23 and let-60 gain-of-function single mutants),
as outlined in Figure 20.16e-f. The double-mutant phenotype
indicates that the gain-of-function alleles of either
let-23 or let-60 do not require the function of lin-3 to exert
their phenotypic effects, thus placing both let-23 and
let-60 downstream of lin-3.

Figure 20.16 Analysis of double-mutant phenotypes to find order of genes in developmental
pathways. (a) In wild-type worms, the vulva developmental pathway is active only in the presence
of the signal (LIN-3). (b) In lin-3 mutants, no signal is present, and worms develop with a vulva-less
phenotype. (c) and (d) In either let-23 or let-60 gain-of-function alleles, the pathway is constitutively
active, and worms develop with a multi-vulva phenotype. (e) and (f) Gain-of-function alleles of let-23
and let-60 are epistatic to loss-of-function lin-3 alleles. The pathway is constitutively active regardless
of whether the lin-3 signal is present. (g) and (h) Gain-of-function alleles of let-60 are epistatic to
loss-of-function alleles of let-23. Conversely, loss-of-function alleles of let-60 are epistatic to
gain-of-function alleles of let-23. This places let-60 downstream of let-23.

Panel  Genotype                                                    Phenotype
(a)    Wild type                                                   Normal
(b)    lin-3 loss-of-function (or let-23 or let-60 loss-of-function)  Vulva-less
(c)    let-23 gain-of-function                                     Multi-vulva
(d)    let-60 gain-of-function                                     Multi-vulva
(e)    lin-3 loss-of-function + let-23 gain-of-function            Multi-vulva
(f)    lin-3 loss-of-function + let-60 gain-of-function            Multi-vulva
(g)    let-23 loss-of-function + let-60 gain-of-function           Multi-vulva
(h)    let-60 loss-of-function + let-23 gain-of-function           Vulva-less

Similar analysis enables the ordering of the let-23
and let-60 genes in the pathway (see Figure 20.16g-h).
Dominant let-60 alleles are epistatic to recessive let-23 alleles,
indicating that let-60 can function in the absence of
functional let-23, a finding that places let-60 downstream
of let-23. This conclusion is supported by the converse
experiment, where recessive let-60 alleles are epistatic to
dominant let-23 alleles, which indicates that let-23 requires
the function of let-60 to exert a phenotypic effect.


The genetic pathway was determined before the nature
of the proteins had been analyzed. Now that we know the
molecular identities of LIN-3 (signal), LET-23 (receptor),
and LET-60 (signal transduction molecule), these epistatic
relationships make sense. For example, dominant gain-of-function
mutations of let-60 result in constitutive activity
of this protein, allowing it to transduce a signal independent
of the state of the LET-23 receptor. Likewise, gain-of-function
alleles of let-23 act as if they are receiving a signal
all the time, whether or not lin-3 is functional, and thus activate
the downstream signal-transduction cascade, which
in turn depends on having a functional allele of let-60.
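
The epistatic relationships above follow directly from a linear pathway in which LIN-3 acts on LET-23 and LET-23 acts on LET-60. The sketch below encodes that order and predicts the phenotype of any allele combination. It is a minimal illustration, assuming that each gene is simply wild type ("wt"), loss-of-function ("lf"), or gain-of-function ("gf"), that only VPCs near the anchor cell receive the signal, and that a multi-vulva phenotype arises whenever the pathway is active in cells that receive no signal. The function names are hypothetical.

```python
# Linear pathway sketch: LIN-3 (signal) -> LET-23 (receptor) -> LET-60.
# Alleles: "wt" = wild type, "lf" = loss of function, "gf" = gain of function.

def pathway_active(signal, let23, let60):
    """Return True if the pathway output is on in one VPC."""
    # A gf receptor is on without signal; a wt receptor needs the signal.
    receptor_on = let23 == "gf" or (let23 == "wt" and signal)
    # A gf transducer is on regardless; a wt transducer needs the receptor.
    return let60 == "gf" or (let60 == "wt" and receptor_on)


def phenotype(lin3="wt", let23="wt", let60="wt"):
    """Predict the vulval phenotype for a combination of alleles."""
    near_anchor = pathway_active(lin3 == "wt", let23, let60)
    far_from_anchor = pathway_active(False, let23, let60)
    if far_from_anchor:
        return "multi-vulva"   # pathway on even where no signal arrives
    return "normal" if near_anchor else "vulva-less"


if __name__ == "__main__":
    genotypes = [
        {},                                   # wild type
        {"lin3": "lf"},
        {"let23": "gf"},
        {"let60": "gf"},
        {"lin3": "lf", "let23": "gf"},
        {"lin3": "lf", "let60": "gf"},
        {"let23": "lf", "let60": "gf"},
        {"let60": "lf", "let23": "gf"},
    ]
    for g in genotypes:
        print(g or "wild type", "->", phenotype(**g))
```

Reordering the two checks in `pathway_active` would change the predictions for the let-23 and let-60 double mutants, which is why those double mutants are informative about gene order while the loss-of-function double mutants are not.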

Lateral Inhibition


Given that they are both induced by the lin-3-encoded
signal, how are the 1° and 2° fates specified? One possibil- 
ity is a differential response of the VPCs to a graded lin-3 
signal, where the highest concentration of signal produces 
a 1° fate and a lower concentration of signal produces 
2° cells. However, when the cell that would normally be 
a 1° cell is ablated, a cell that would normally have been a 
2° cell differentiates into a 1° cell instead. It is thus un- 
likely that the absolute concentration of signal perceived 
is solely responsible for directing cell fate. 

A possible explanation is that after reception of 
the lin-3 signal, a second signal is sent from the 1° cell 
that inhibits the neighboring cells from becoming 1° 
cells (Figure 20.17a). This process is termed lateral 
inhibition, where an initial asymmetry is reinforced 
by signaling between adjacent cells (Figure 20.17b). All
VPCs initially have the potential to express a lateral 
signal, encoded by the lag-2 gene, and to express the re- 
ceptor for the LAG-2 signal, encoded by the lin-12 gene. 
The lag-2 gene is activated in response to the LIN-3 
signal, so it is expressed at higher levels in the 1° cell. 
Reception of LAG-2 results in down-regulation of the 
lag-2 gene in the receiving cells and up-regulation of the 
gene for its receptor, LIN-12 (Figure 20.17c). This creates 
a feedback loop that reinforces the initial asymmetry 
between the 1° and 2° cells. Continued feedback between 
the signal and its perception amplifies the differences 
between the two cells, causing them to acquire distinct 
developmental fates. 
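
The amplification of a small initial asymmetry can be illustrated with a minimal numerical sketch. It is not a measured model of the VPCs: the steep repression curve, its threshold of 0.5, its steepness of 4, and the names `repression` and `lateral_inhibition` are all assumptions made for illustration, and the receptor level is reported as an output rather than fed back into reception.

```python
# Toy model of lateral inhibition between two neighbouring cells.
# Assumption: lag-2 expression in a cell falls steeply as the LAG-2 signal
# received from its neighbour rises (a Hill-type repression curve).

def repression(received: float, threshold: float = 0.5, steepness: int = 4) -> float:
    """lag-2 expression level (0 to 1) given the amount of LAG-2 received."""
    return 1.0 / (1.0 + (received / threshold) ** steepness)


def lateral_inhibition(lag2_cell1: float, lag2_cell2: float, rounds: int = 20):
    """Iterate mutual repression and return (lag-2, lin-12) levels for each cell."""
    for _ in range(rounds):
        # Both cells update together: each sets its lag-2 level from the
        # LAG-2 signal it receives from the other cell.
        lag2_cell1, lag2_cell2 = repression(lag2_cell2), repression(lag2_cell1)
    # Reception up-regulates the receptor gene lin-12, modelled here as the
    # complement of the cell's own lag-2 level.
    lin12_cell1 = 1.0 - lag2_cell1
    lin12_cell2 = 1.0 - lag2_cell2
    return (lag2_cell1, lin12_cell1), (lag2_cell2, lin12_cell2)


if __name__ == "__main__":
    # Cell 1 starts with slightly more lag-2 because it receives more LIN-3.
    cell1, cell2 = lateral_inhibition(0.55, 0.45)
    print("cell 1 (lag-2, lin-12):", cell1)
    print("cell 2 (lag-2, lin-12):", cell2)
```

Starting from a 0.55 versus 0.45 difference, the sketch drives one cell toward high lag-2 and low lin-12 (the signaling, 1°-like state) and the other toward the opposite state, while two perfectly equal cells remain equal. That is the sense in which feedback reinforces an initial asymmetry rather than creating one.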


Cell Death during Development 


One of the striking observations made when Sulston, 
Brenner, and Horvitz tracked the fate of every cell dur- 
ing C. elegans development is that many cells are fated to 
die. Of the 1090 cells produced during the development 
of a hermaphrodite worm, 131 cells undergo a process 
called programmed cell death, or apoptosis (introduced in 
Sections 3.1 and 12.5). 

Because the fate of every cell in C. elegans development 
is known, researchers have been able to identify mutants in 
which a cell fails to undergo apoptosis. Genetic analyses 
of such mutants have elucidated a genetic pathway that 
leads to cell death in response to a signaling molecule. This 
pathway is largely conserved across the animal kingdom 
(in humans, as well) and is a natural and important pro- 
cess that helps sculpt the development of tissues as well as 
maintain tissues in adult organisms. Indeed, it is estimated 
that 10¹¹ cells are programmed to die every day in an adult
human, many of them in epithelial tissues such as skin and 
intestine. While loss-of-function mutants for genes in the 
apoptosis pathway are viable in C. elegans, loss-of-function 
mutations in homologous genes in mice result in embryo 
death, indicating that cell death is an essential part of life 
in mammals. 
Figure 20.17 Lateral inhibition in C. elegans vulval
differentiation. 


20.4 “Evolution Behaves Like a Tinkerer” 


One of the major surprises emerging from genome se- 
quence analysis of animals is that, within a factor of about 
2, most animal genomes have very similar numbers of
genes. The range is from about 12,000 to about 25,000.
Thus relatively simple animals such as Drosophila have 
a genome containing about 14,000 genes, whereas the 
human genome contains about 25,000 genes. Even organ- 
isms such as jellyfish and sea anemones possess genomes 
with gene numbers largely similar to those of vertebrates. 

Given this consistency of gene number, what is the 
biological explanation of how the presumed “complexity” 
of vertebrates is produced from a genetic toolkit that is 
similar to the one possessed by comparatively “simple” an- 
imals? The answer seems to lie in the relative complexity 
of gene regulation rather than the invention of new genes 
for additional developmental processes. This proposal sug- 
gests that existing genes are recruited for new roles by 
means of changes in their regulation, both in space and 
time. Biologist François Jacob summed up this view of evolution
when he said, “Evolution behaves like a tinkerer....
[It] does not produce novelties from scratch. It works on 
what already exists, either transforming a system to give it 
new functions or combining several systems to produce a 
more elaborate one.” 

A common theme in the evolutionary history of all 
genes, and particularly those influencing development, is 
the co-option of genes and genetic modules to direct the 
patterning or growth of novel organs. In this section, we 
consider an example of the co-option of genes by evolu- 
tionary “tinkering” to form newly evolved structures: dig- 
its (fingers and toes) on tetrapod limb appendages such as 
hands and feet. The study of the evolution of development 
is often referred to as evo-devo. 


Evolution through Co-option 


Limb positioning in tetrapods (four-legged vertebrates) 
results in large measure from the expression of Hox genes 
that direct the anterior–posterior organization of the
body. Work on chickens and mice demonstrates that expression
of Hox genes along the anterior–posterior body
axis defines the position at which a limb will develop. 
The anterior limit of the expression domains of two Hox 
genes, Hoxc8 and Hoxc6, demarcates the position of the 
forelimb, and the posterior limit of expression marks the 
position of the hindlimb (Figure 20.18a). The expression 
of these two genes specifies the thoracic region of ver- 
tebrates, which is characterized by the formation of ribs 
from the vertebral column. 

Once limb positions are specified, cells of the mes- 
enchyme (loosely connected sub-ectodermal cells) send 
a signal to the overlying ectodermal cells. This signal 
promotes changes within a narrow band of cells that then 
forms the apical ectodermal ridge (AER), whose primary 
function is to direct limb-bud outgrowth by responding 
to signals produced in a group of mesenchymal cells to- 
ward the posterior side of the limb bud called the zone 
of polarizing activity (ZPA; Figure 20.18b). The ZPA 
acts as an organizer that promotes digit formation at the 
distal ends of limb buds (that is, the ends farther from 
the center of the body) through the production of a morphogen,
a small secreted signaling protein called Sonic
hedgehog (Shh). The Sonic hedgehog (Shh) gene is or- 
thologous to the Drosophila segment polarity gene hedge- 
hog. Sonic hedgehog is expressed principally in the neural 
tube, where it helps organize the brain, eyes, and other 
structures through patterning of a group of cells known 
as the floor plate, and in developing limbs, where it directs 
the development of digits. The Case Study in this chapter 
discusses the consequences of different Shh mutations on 
mammal development and morphology. 

All extant tetrapods are characterized by five or 
fewer digits in each set, and each digit in the set has a 
unique identity. Tetrapod digits arise along the anterior–posterior
axis of the limb bud. If you allow your arms to
hang straight down, you will see that your thumb (digit 1) 
is in the anterior position on your hand, while your pinky 
(digit 5) is in the posterior position. Sonic hedgehog ex- 
pressed in the ZPA plays an important role in initiating 
digit formation, and loss-of-function alleles of Shh result 
in a loss of digits 2-5; only digit 1 forms independently of 
Shh function. A second role of Shh in limb patterning is 
in the specification of digit identity. Experiments where a 
second ZPA is transplanted to an anterior position result 
in a mirror-image duplication of digits, suggesting that 
the ZPA instructs those digits closer to the ZPA to differ- 
entiate with posterior identity (see Figure 20.18b). 

The Hox genes that play a conserved role in patterning
the anterior–posterior axis in animals were considered candidates
to be the genes acting downstream of Shh to specify
the patterning events in digits. In mice (and by inference
humans), five Hox genes are expressed in the limb bud at
the time and place where the digits are developing: Hoxd9,
Hoxd10, Hoxd11, Hoxd12, and Hoxd13 (Figure 20.18c).
These genes are also expressed in the posteriormost regions
of the mouse embryo, where they contribute to patterning
along the anterior–posterior body axis, and later in the
developing nervous system. Despite the difference in position
of hindlimb and forelimb along the body axis, the same five
Hox genes are expressed in the developing digits of each
limb. Their expression in the limb bud follows a precise
temporal and spatial pattern and is dependent on Shh
activity. The first gene to be expressed is Hoxd9, followed
by Hoxd10, then Hoxd11, and so on through Hoxd13.
Spatially, all genes share the same posterior boundary, but
the anterior boundary of expression is different for each
gene. Consequently, the five Hoxd genes subdivide the limb
bud into five zones, each specified by a different combination
of Hoxd gene expression. Analogous to patterning
along the anterior–posterior axis, ectopic expression of
different Hoxd genes within the developing limb bud results
in transformations of digit identity. A similar combinatorial
code of Hox gene expression also appears to specify the
proximal–distal patterning of the limb buds themselves
(e.g., upper arm, forearm, hand, digits).
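
The way nested expression domains yield five distinct zones can be sketched in a few lines. The sketch assumes, for illustration only, that the genes are nested in the order Hoxd9 through Hoxd13, with Hoxd9 reaching farthest toward the anterior and Hoxd13 confined to the posteriormost zone. The zone numbering and the name `hoxd_code` are hypothetical.

```python
# Nested Hoxd expression in the limb bud: every gene shares the posterior
# boundary, but each has its own anterior boundary.
# Assumption: zone 1 is the most anterior zone and zone 5 the most posterior,
# and the anterior boundaries are ordered Hoxd9 (most anterior) to Hoxd13.

HOXD_GENES = ["Hoxd9", "Hoxd10", "Hoxd11", "Hoxd12", "Hoxd13"]


def hoxd_code(zone: int) -> list:
    """Return the Hoxd genes expressed in a given zone (1 to 5)."""
    if not 1 <= zone <= len(HOXD_GENES):
        raise ValueError("zone must be between 1 and 5")
    # A zone expresses every gene whose anterior boundary lies at or
    # anterior to that zone.
    return HOXD_GENES[:zone]


if __name__ == "__main__":
    for zone in range(1, 6):
        print(zone, "+".join(hoxd_code(zone)))
```

Because each zone expresses a different subset of the five genes, five genes with staggered anterior boundaries are enough to give five unique combinations, which is the sense in which the text calls the code combinatorial.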

Mutations that expand or increase Shh expression
result in extra digits and have been documented in mice,
chickens, dogs, cats, and humans. However, because identity
is controlled by only five Hox genes, the extra digits
always have a morphology closely resembling that of an
adjacent digit, rather than having a unique identity (see
Figure 4.13). Finally, it is worth noting that the separation
of the human limb bud into individual digits requires
programmed cell death (see Section 20.3) of the intervening
cells, a process that has been lost in duck and bat
limbs and has led to webbing in those animals.

Figure 20.18 Limb-position and digit determination.

These programs have been further modified during 
evolution in the secondary loss of legs in snakes and 
cetaceans. The loss of the front legs of snakes is due to an 
anterior shift in both Hoxc6 and Hoxc8 gene expression all 
the way to the base of the head. All vertebrae behind the 
snake head, except the first one, develop as thoracic verte- 
brae with ribs. In contrast, the convergent evolution of loss 
of hind legs in snakes and cetaceans is due to independent 
alterations in Shh activity in the developing hind limb bud. 


Constraints on Co-option 


The ancestral roles of Hoxd genes pertained to patterning 
along the anterior–posterior axis of the body. Therefore,
the role of Hoxd genes in specifying digit identity represents 
a co-option of function of already existing genes. These 
same genes also acquired roles in the later differentiation of 
the nervous system. Likewise, the presence of the floor plate 
in all vertebrates is an indication that the floor plate evolved 
before limbs during vertebrate evolution. Limbs developed 
later within the tetrapod lineage, and in the course of limb 
evolution, Shh was co-opted to pattern digits, structures 
that did not previously exist. By what process are genes co- 
opted for new functions during evolution? 

In the case of limb evolution, genes of the Hoxd clus- 
ter could have come under control of limb-specific en- 
hancer modules leading to expression of the Hoxd genes 
in developing limbs. As long as changes in regulation did 
not disrupt Hoxd expression during anterior–posterior
patterning of the body axis, the changes would not result 
in defects of this earlier process. The acquisition of gene 
expression in the developing limb could be thought of as a 
gain-of-function mutation. The modularity of enhancers 
and silencers facilitates evolution by co-option because 
individual enhancer modules are free to evolve indepen- 
dently. Thus the patterning of a novel tetrapod organ, the 
limb, involved the co-option of, or tinkering with, preex- 
isting genetic programs that already had developmental 
roles elsewhere. As noted above, a major constraint on 
this type of evolutionary change is that the more ancestral 
functions of the gene must not be disrupted. 


20.5 Plants Represent an Independent 
Experiment in Multicellular Evolution 


Multicellularity has evolved independently many times in 
the history of life on Earth. The two lineages of multicel- 
lular organisms you are likely to be most familiar with are 
animals and land plants. Since the common ancestor of 
plants and animals was a single-celled organism, multicel- 
lularity evolved independently in each lineage. 

Due to their independent origins, animals and plants 
differ in certain crucial aspects of their development. One
difference is that germ-line cells in animals separate from
somatic (body) cells much earlier in development than 
do the germ-line cells in land plants. Another difference 
is that animal cells are often motile during develop- 
ment, whereas plant cells are encased in a cell wall that 
essentially fixes them in the location at which they arise. 
Animals and land plants also differ with respect to when 
the basic form of the body plan takes shape. The ani- 
mal body plan is established during embryogenesis, and 
subsequent development consists primarily of growth in 
size but without the addition of new organs. In contrast, 
throughout their lifetimes plants add new organs that are 
produced from pluripotent stem-cell populations. Finally, 
because plants often grow in a fixed location and are un- 
able to migrate as many animals can, a plant must be able 
to alter its developmental program in response to changing
environmental conditions throughout its lifetime.
Thus, while identical twins in animals are nearly indistin- 
guishable, genotypically identical plants may develop to 
look very different depending upon their growth environ- 
ment. Despite these differences, developmental processes 
occurring in plants are remarkably similar to those in 
animals, especially in their reliance upon the coordinated 
action of transcription factors and signaling molecules. 


Development at Meristems 


Plant development occurs at organized groups of plu- 
ripotent cells called meristems. The two functions of 
meristems are generation of organs and self-maintenance 
(to ensure that a pool of stem cells is always present). The 
above-ground parts of a plant are produced by shoot meri- 
stems and the below-ground parts by root meristems. The 
shoot meristem is divided into three functional domains— 
a peripheral zone from which leaves are formed, a rib zone 
from which part of the stem is derived, and a central zone 
that acts as a stem-cell reservoir to replenish cells lost to 
the developing leaves and stem (Figure 20.19). Meristems 
are generally indeterminate—that is, they can remain ac- 
tive for years, or in some cases the entire life of the plant. 
For example, the shoot meristem at the top of a pine tree 
can be active for centuries, continually producing leaves 
and side branches. Over time, the sizes of the central 
and peripheral domains remain remarkably constant. It is 
the continual production of new organs from meristems 
throughout the life of a plant that allows plants to adjust 
and adapt to changing local environmental conditions. 

The identity of the meristem determines what types 
of organs are produced from its periphery. Early in the life of 
a flowering plant, leaves are produced from the flanks of 
the shoot meristem, and roots are produced from the root 
meristem. At the upper side of the attachment point of the 
leaf to the stem an axillary meristem is formed, from which 
a branch can arise. This reiterative formation of meristems 
that produce leaves that produce branches containing meri- 
stems forms the basis of most aboveground development of 
flowering plants. In response to appropriate environmental
conditions, the identity of meristems can change. For example,
shoot meristems, which have been producing leaves, are
converted in response to seasonal changes into reproductive
meristems. A reproductive meristem may either develop
directly into a flower meristem, or alternatively into an
inflorescence meristem that produces flower meristems (an
inflorescence being a group of flowers). In turn, flower
meristems produce floral organs from their peripheral zones.
Unlike the other meristems, flower meristems are determinate:
no more stem cells are available after a flower meristem
has produced a fixed number of organs.

Figure 20.19 Shoot meristems in plant growth.

Because each type of meristem is characterized by 
a specific pattern of gene expression, mutations in key 
genes can result in homeotic transformations of meristem 
types. We have all eaten one such mutant, cauliflower, 
in which meristems that would normally be specified as 
flowers behave instead as inflorescence meristems (see 
Figure 20.19, lower right). The genetic basis of this pheno- 
type has been identified in Arabidopsis as loss-of-function 
alleles of two closely related paralogs, APETALA1 and 
CAULIFLOWER, encoding transcription factors. 


Combinatorial Homeotic Activity in Floral- 
Organ Identity 


Several flowering plant species have been adopted as models
for the study of genetics. For example, peas (Pisum sativum),
with which Mendel performed his experiments, and
maize (Zea mays), in which transposons were discovered,
were introduced in earlier chapters. Due to its small size,
short generation time, and fully sequenced genome, the
most widely used model plant is Arabidopsis thaliana.
Since the 1980s, study of homeotic mutants in Arabidopsis
and another plant species, Antirrhinum (snapdragon), has
led to insights into the genetic basis of flower development
and revealed developmental parallels with animals.

Figure 20.19 panel labels: Inflorescence meristem (im) producing flower meristems (fm); Arabidopsis thaliana;
apetala1 cauliflower double mutant: homeotic conversion of flower meristems into inflorescence meristems.

Arabidopsis flowers are composed of four concentric 
whorls of organs (Figure 20.20). The outermost whorl is 
occupied by sepals, organs that protect the flower bud 
during development. The second whorl is occupied by 
petals, which in many species attract pollinators. Stamens, 
the male organs that produce pollen, are located in the 
third whorl, and the female organs—carpels, containing 
the ovules—occupy the central whorl. 


Homeotic Floral Mutants of Arabidopsis

Recessive floral homeotic mutants of Arabidopsis fall into three
classes, each having defects in two adjacent whorls (see
Figure 20.20). One class, named the A class, exhibits
homeotic transformations in the outer two whorls, where
carpels develop in the positions normally occupied by
sepals and stamens replace petals, so that the four floral
whorls consist of carpels, stamens, stamens, and carpels.
A second class, the B class, exhibits homeotic transformations
in the middle two whorls, where sepals replace petals and
carpels replace stamens, so that the four whorls consist of
sepals, sepals, carpels, and carpels. In C-class mutants, homeotic
transformations in the third and fourth whorls result in
flowers where petals develop in the positions normally
occupied by stamens, and the cells that would normally
give rise to the carpels behave as if they were another
flower meristem that reiterates the developmental cycle.
Similar mutants can be found in a number of ornamental
plant species and are often referred to as “double flowers.”

Figure 20.20 Floral homeotic mutations in Arabidopsis.

In Arabidopsis, A-class activity is promoted by two
genes, APETALA2 and APETALA1, B-class activity by the
APETALA3 and PISTILLATA genes, and C-class activity by
the AGAMOUS gene. Double mutants either display an additive
phenotype (e.g., apetala3 agamous flowers consisting
of only sepals) or exhibit novel phenotypes (e.g., apetala2
agamous flowers with novel floral organs that do not exist
in wild-type flowers). Additive double-mutant phenotypes
suggest that the two genes do not interact, whereas nonadditive
double-mutant phenotypes suggest that the two genes
interact to influence a common developmental pathway. For
example, in apetala2 agamous flowers, the first and fourth
whorls have leaf-like carpels while the second and third
whorls are occupied by organs with features of both petals
and stamens. The agamous mutation has a phenotypic effect
in the first and second whorls in an apetala2 background
(compare the identities of these whorls in an apetala2 single
mutant to an apetala2 agamous double mutant), an effect not
observed in a wild-type background, where phenotypic defects
of agamous are limited to the third and fourth whorls.
This indicates that AGAMOUS is ectopically active in the first
and second whorls in apetala2 mutants. Likewise, based on
the double-mutant phenotype, APETALA2 is active in the
inner whorls of agamous mutants.

On the basis of single and multiple mutant phenotypes, 
a model was formulated in which the identity of organs 
developing in any whorl is determined by the combina- 
tion of homeotic genes active in that whorl (Figure 20.21). 
It was presumed that each class of gene is active in those 
whorls affected in the respective mutants: APETALA2 
and APETALA1 in the outer two whorls, APETALA3 and
PISTILLATA in the middle two whorls, and AGAMOUS 
in the inner two whorls. Thus, each whorl is character- 
ized by a different combination of homeotic gene activity 
that specifies floral organ identity. The A-class activity by 
itself in the first whorl specifies sepals, A-class + B-class in 
the second whorl specifies petals, B-class + C-class in the 
third whorl specifies stamens, and C-class by itself in the 
fourth whorl specifies carpels. To account for the mutant 
phenotypes (specifically the apetala2 agamous mutant 
described above), a second postulate of the model is that 
the A-class and C-class activities are mutually antagonistic, 
so that in an A-class mutant background, C-class activity 
is found in all four whorls; and conversely, in a C-class 
mutant background, A-class activity is in all four whorls. 
The specification of identity by combinations of homeotic 
gene activities and cross-regulatory interactions between 
the floral homeotic genes is reminiscent of specification of 
segmental identity in Drosophila by Hox genes. 

The model successfully predicts the phenotypes of
multiple mutants. For example, in a double mutant in
which both B-class and C-class activities are absent, only
A-class genes are expressed in all four whorls, and a flower
with only sepals develops (see Figure 20.20). In ABC triple
mutants, in which all floral-organ-identity gene activity
is compromised, leaf-like organs are found in all whorls.
These observations suggest that since floral organs are
evolutionarily derived from leaves, one role of the floral
homeotic genes is to modify a leaf into a specialized floral organ.

Figure 20.21 The ABC model of flower development.
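
The two postulates of the model (combinatorial identity, and mutual antagonism of A-class and C-class activity) are simple enough to write down as a short sketch and to test against the mutant phenotypes described in this section. The names `active_classes` and `flower` are hypothetical, each class is treated as a single on/off activity, and combinations the basic model does not cover are reported as unspecified rather than guessed.

```python
# Minimal sketch of the ABC model of floral-organ identity.
# Assumption: each class (A, B, C) is simply active or inactive in a whorl.

WHORLS = (1, 2, 3, 4)

ORGAN = {
    frozenset("A"): "sepal",
    frozenset("AB"): "petal",
    frozenset("BC"): "stamen",
    frozenset("C"): "carpel",
    frozenset(): "leaf-like",
}


def active_classes(whorl: int, mutant: str = "") -> frozenset:
    """Classes active in a whorl; `mutant` lists the classes lost, e.g. "BC"."""
    lost = set(mutant)
    active = set()
    # B-class activity is limited to the middle two whorls.
    if whorl in (2, 3) and "B" not in lost:
        active.add("B")
    # A and C are mutually antagonistic: each spreads to all four whorls
    # when the other is missing.
    if "A" not in lost and (whorl in (1, 2) or "C" in lost):
        active.add("A")
    if "C" not in lost and (whorl in (3, 4) or "A" in lost):
        active.add("C")
    return frozenset(active)


def flower(mutant: str = "") -> list:
    """Predicted organ identity in whorls 1 to 4 for a given mutant."""
    return [
        ORGAN.get(active_classes(w, mutant), "not specified by the basic model")
        for w in WHORLS
    ]


if __name__ == "__main__":
    for genotype in ("", "A", "B", "C", "BC", "ABC"):
        print(genotype or "wild type", flower(genotype))
```

The sketch reproduces the phenotypes given in the text for A-class mutants (carpels, stamens, stamens, carpels), B-class mutants (sepals, sepals, carpels, carpels), the B C double mutant (all sepals), and the ABC triple mutant (leaf-like organs throughout). For C-class mutants it predicts sepals, petals, petals, sepals. It does not capture the reiterated flower that arises from the fourth whorl, because meristem determinacy is outside the scope of this simple combinatorial code.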


Homeotic MADS Box Transcription Factors

As do animal homeotic genes, many floral homeotic genes encode
closely related transcription factors. However, rather than
being homeobox genes, the floral homeotic genes
are MADS box genes, named after the DNA-binding
domain of the transcription factors they encode. The name MADS box
is derived from four members of the gene family: MCM1
of Saccharomyces cerevisiae, AGAMOUS of Arabidopsis,
DEFICIENS of Antirrhinum, and SRF of humans. All of the
B- and C-class genes, as well as APETALA1, encode MADS
box proteins. Consistent with the model described above, the
B-class genes are expressed in whorls two and three, and
the C-class gene, AGAMOUS, is expressed in the third and
fourth whorls (see Figure 20.21).

Subsequent studies have shown that the ABC classes of 
MADS box proteins interact with another class of MADS 
box protein encoded by the SEPALLATA (SEP) genes 
(see Chapter 16 Case Study). The SEP proteins together 
with the A-, B-, and C-class proteins form higher-order
complexes that regulate transcription (see Figure 20.21).
The SEP proteins provide a transcriptional activation activ- 
ity to the complexes, an activity that the B and C proteins 
lack. Conversely, the A, B, and C proteins provide speci- 
ficity to the complexes, an activity the SEP proteins lack. 
When A-, B-, or C-class genes are ectopically expressed 
throughout the flower meristem, they cause homeotic 
transformations of floral organ identity. For example, if 
B-class genes are ectopically expressed throughout the 
flower, the result is a flower with organ identities of petal, 
petal, stamen, stamen, from the first to the fourth whorls. 
In contrast, ectopic expression of the A-, B-, and C-class 
genes alone is not sufficient to convert the leaves of the 
Arabidopsis plant into floral organs. However, if the SEP 
genes are ectopically expressed in addition to, for example, 
the A and B genes, the combination is sufficient to convert 
leaves into petals. In this manner, the identities of leaves 
and floral organs are interconvertible by the absence or 
presence of the expression of the floral homeotic genes, 
consistent with floral organs evolving by modification of an 
ancestral leaf. 
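
The division of labor between SEP proteins (activation) and the A-, B-, and C-class proteins (specificity) can be summarized as a small decision rule. This is a sketch under stated assumptions, not a description of the actual protein complexes: the function name `organ_identity` is hypothetical, and only the leaf-to-petal result for A + B + SEP and the failure of A, B, and C without SEP come from the text. The other organ assignments simply extend the ABC combinations by assumption.

```python
# Sketch: floral organ identity requires both a specificity component
# (A-, B-, C-class activity) and an activation component (SEP).
# Assumption: any tissue lacking either component develops as a leaf.

IDENTITY = {
    frozenset("A"): "sepal",
    frozenset("AB"): "petal",
    frozenset("BC"): "stamen",
    frozenset("C"): "carpel",
}


def organ_identity(expressed: set) -> str:
    """Predicted identity of an organ expressing the given gene classes.

    `expressed` may contain "A", "B", "C", and "SEP".
    """
    abc = frozenset(expressed) - {"SEP"}
    if "SEP" not in expressed or not abc:
        return "leaf"  # no activation, or no specificity: default leaf fate
    return IDENTITY.get(abc, "not specified by this sketch")


if __name__ == "__main__":
    print(organ_identity({"A", "B"}))          # ABC-class genes alone in a leaf
    print(organ_identity({"A", "B", "SEP"}))   # A + B + SEP in a leaf
```

Under these assumptions a leaf expressing A- and B-class genes stays a leaf, whereas adding SEP converts it to a petal, matching the experimental result described above.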

Studies of B- and C-class genes from flowering plants
and gymnosperms (e.g., conifers) suggest that for all seed
plants, C-class genes alone promote female reproductive
development and that B + C gene activity promotes
male reproductive development. However, unlike the
Hox genes, which appear to have evolved at the base of
the animal lineage and which control patterning in all
known animals, the B- and C-class genes are unknown in
earlier-diverging lineages of land plants, such as ferns,
lycophytes, and bryophytes, whose reproductive structures
differ substantially in morphology and development and
whose leaf-like organs evolved independently.

We have seen that the specification of serially repeated
structures in both Drosophila and Arabidopsis is controlled
in a similar manner via the combinatorial action of closely
related transcription factors. Although the mechanism of
developmental patterning in plants and animals is similar,
the genes involved in development in the two kingdoms are
not related; this is consistent with the independent evolution
of multicellularity in plants and animals.

Genetic Analysis 20.2 asks you to design an experimental
strategy to genetically dissect development in
another group of multicellular eukaryotes.


GENETIC ANALYSIS 20.2


PROBLEM You are interested in the development of the body plan of kelp, a common
brown alga found along many coastlines. Would reverse or forward genetics approaches
be more suited to identifying the genes required for early kelp development?

BREAK IT DOWN: Review Figure 19.18 to find the relationship between brown algae
and the other organisms you have been studying.

BREAK IT DOWN: In a “forward genetics” approach, no prior knowledge of gene
identity is required, while a “reverse genetic” approach starts with known gene sequences.


Solution Strategies and Solution Steps

Evaluate

1. Strategy: Identify the topic this problem addresses and the nature of the required answer.
   Step: This problem concerns the investigation of genes determining development
   of kelp. Devising an answer requires evaluating the relative potential of reverse
   genetic analysis versus forward genetic analysis (see Chapters 16 and 17).

2. Strategy: Identify the critical information given in the problem.
   Step: Kelp is identified as brown algae, a form of life distinct from land plants and
   animals.

Deduce

3. Strategy: Determine if looking for gene homology (a reverse genetic approach) has a
   high probability of successfully identifying developmental genes in kelp.
   Step: Examination of Figure 18.11 indicates that kelp is only distantly related to either
   land plants or animals. Therefore, searching for brown algal genes based on the
   sequences of plant or animal developmental genes is something of a fishing
   expedition.

TIP: Was the common ancestor of animals, plants, and kelp unicellular or multicellular?
Review Figure 18.11

PITFALL: Distantly related organisms are likely to have evolved substantially
since they last shared a common ancestor, and the extent of gene
homology decreases as evolutionary distance between species increases.

Solve

4. Strategy: Determine whether the use of mutagenesis (a forward genetics approach)
   is likely to help identify kelp developmental genes.
   Step: A good approach to finding developmental genes is to perform a mutagenesis
   experiment that will identify mutants in which pattern formation is perturbed.
   Mutagenesis can potentially affect any gene; thus, the forward genetics approach
   is not biased or restricted to genes that share homology with genes in
   other species. Mutants displaying abnormalities of wild-type pattern formation
   are likely to carry mutations of pattern-forming genes.

TIP: How were genes that regulate development in Drosophila originally identified?


For more practice, see Problems 17, 19, 23, 25, and 28. Visit the Study Area to access study tools.


CASE STUDY 


Cyclopia and Polydactyly—Different Shh Mutations with Distinctive Phenotypes 


Sonic hedgehog (Shh), introduced in Section 20.4, is an evolutionarily
conserved gene that performs multiple related but
distinctive roles in developing tissues of animals. The gene’s
best-understood developmental roles, stemming from its expression
in limb buds and in the neural tube, pertain to digit
formation and to the development of the floor plate. The floor
plate divides the brain into hemispheres and is required for
midline separation of other anatomical features, including separating
developing eye tissue into right and left eyes. Given the
central role of Shh in development, it stands to reason that Shh
mutations profoundly affect normal development and morphology.
Here we briefly examine two abnormal conditions
that are caused by changes in Shh activity: holoprosencephaly/cyclopia
and polydactyly.


HOLOPROSENCEPHALY/CYCLOPIA

Holoprosencephaly (HPE) is a genetically heterogeneous abnormality, meaning
that mutations in different genes can cause the disorder. One
form of holoprosencephaly, HPE3, is caused by Shh mutations.
HPE3 is a clinically variable disorder that produces many different
morphological abnormalities in patients. The most subtle
phenotypic defect is a slight loss of midline separation, resulting
in a single central incisor. More severe defects include characteristic
brain abnormalities; abnormalities of the mid-face,
such as the formation of a proboscis-like nose; or possibly,
in the most extreme cases, cyclopia, the presence of a single
large mass of eye tissue rather than two separate eyes.
Numerous Shh mutations that cause HPE3 affect the coding
region of the gene and result in the production of a
severely defective or nonfunctional protein product, leading
to a failure to form the floor plate and thus to form brain
hemispheres (Figure 20.22a). To date, there are no specific
genotype-phenotype correlations that tie specific Shh mutations
to more severe or less severe manifestations of HPE3 or
cyclopia. Pedigrees exhibit variation in both penetrance and
expressivity, most likely because other genes involved in brain
and mid-face formation (i.e., the other genes that cause the
HPE phenotype) influence the extent of morphological abnormality
(Figure 20.22b). Although the HPE3 mutations in Shh
are missense, nonsense, and frameshift loss-of-function alleles,
familial cases of HPE3 are inherited in an autosomal dominant
manner. This indicates that the Shh mutations are haploinsufficient:
The presence of a single copy of a wild-type allele is not
sufficient for normal activity. Thus, as with most genetic disorders
that have been characterized in humans, both penetrance
and expressivity of abnormal phenotypes are modified significantly
by genetic background.

During the 1950s, an epidemic of cyclopia was reported 
among sheep in the Western United States (Figure 20.22c). 
The compound cyclopamine, found in the plant Veratrum 
californicum, was implicated as an environmental cause 
of the abnormalities. Evidence indicated that ingestion of
V. californicum during gestation caused the production of
lambs with cyclopia. In 2002, Philip Beachy and colleagues
looked at the mechanism by which cyclopamine caused
cyclopia and discovered that the compound binds directly to
cells in the floor plate and blocks their response to Shh protein.
This study illustrates that the action of normal proteins
can be inhibited under certain environmental circumstances
to produce effects similar to those seen with gene mutation.
When an environmental condition induces a phenotype
similar to that caused by mutation, the environmental condition
is said to induce a phenocopy of the mutant phenotype.

Figure 20.22 Effects of alterations in Shh morphogen activity in the floor plate and the limb bud. (a) Sonic hedgehog gene. (c) Phenotypes associated with alterations in Shh activity: Shh expression in the developing mouse embryo; prolonged Shh activity in the limb bud causes extra digit development.


POLYDACTYLY If Shh expression is eliminated from
the developing limb bud by loss-of-function mutations
inactivating the Shh protein, limb patterning is perturbed
and digits do not form. However, if Shh expression is altered
by mutation in the cis-regulatory region of the gene, changes
in the Shh protein concentration gradient can result in polydactyly,
the presence of extra digits (see Figure 20.22c). The
extra digits develop because Shh protein is present in high
concentration in parts of the limb bud where it is not normally
found. Polydactyly in humans (discussed in Section 4.2) is
an autosomal dominant disorder. Its inheritance is dominant
because the ectopic expression resulting from the mutation
is a gain of function. The enhancer element responsible for
appropriate Shh expression in the developing limb buds was
identified using a phylogenetic footprinting approach (see
Figure 18.15).


SUMMARY

For activities, animations, and review quizzes, go to the Study Area.


20.1 Development Is the Building of a
Multicellular Organism


Multicellularity has evolved independently multiple times.

The development of a multicellular organism from a fertilized
egg cell entails the formation of specialized cell types,
driven by differential expression of genes.

As animal development proceeds, cells become progressively
restricted in their potential developmental fates, changing
from totipotent to pluripotent to differentiated.

Morphogens can provide positional information that is converted
into differential gene expression.

Signaling between neighboring cells can induce or inhibit
developmental pathways. Genes controlling developmental
processes often encode transcription factors or molecules
involved in signaling between cells.


20.2 Drosophila Development Is a Paradigm for
Animal Development


Genetic screens in Drosophila identified sets of successively
acting genes directing pattern formation during embryonic
development.

The Drosophila embryo is successively subdivided into segments,
each with a unique identity, by the sequential action
of batteries of transcription factors.

Genes whose products are supplied to the egg by the mother
and act to guide the development of the embryo are called
maternal effect genes. The genotype of the mother, rather
than that of the embryo, dictates the embryonic phenotype
for the traits these genes determine.

Gap genes are regulated by maternal effect genes and
subdivide the Drosophila embryo into several broad
regions. Pair-rule genes are regulated by both maternal
effect and gap genes, and they subdivide the embryo into
parasegments.

Homeotic genes known as the Hox genes act in combination
to specify the parasegments of Drosophila. Hox genes are
largely conserved throughout the metazoan kingdom.


Downstream targets of the Hox genes contribute to the morphogenesis
of body segments.

Hox gene expression patterns are maintained by regulation
at the level of chromatin, providing a cellular memory of
gene expression propagated through mitoses.


20.3 Cellular Interactions Specify Cell Fate


In C. elegans, an inductive signal from the anchor cell
determines vulval cell fates, and lateral inhibition refines
cell specification in the developing vulva.

Programmed cell death, or apoptosis, is a normal aspect
of development in animals. It is required for sculpting the
body plan during embryogenesis and maintaining tissues
post-embryonically.


20.4 “Evolution Behaves Like a Tinkerer”


Most animals possess the same types of genes; therefore, the
differences between animals are largely due to differences in
how genes are deployed during development.

Genes can be co-opted to direct the development of
new organs and tissues, often through changes in gene
expression patterns. For example, the evolution of limbs and
digits in tetrapods occurred through changes in Hox and
Sonic hedgehog gene expression.


20.5 Plants Represent an Independent
Experiment in Multicellular Evolution


Despite differences in cellular behavior between plants and
animals, the genetic control of development in plants has
many similarities to that of animals.

Plants continue to add organs throughout their life span due
to the action of meristems, which are groups of pluripotent
stem cells.

Combinatorial action of homeotic genes specifies the identity
of floral organs in flowering plants; the homeotic genes in
plants encode MADS box transcription factors, analogous to
the transcription factors encoded by the homeobox in animals.
inductive signal (p. 698)
inhibition (p. 683)
lateral inhibition (p. 700)
MADS box (p. 706)
maternal effect gene (p. 687)
meristem (p. 703)
morphogen (p. 683)
organizer (p. 683)
pluripotent (p. 683)
positional information (p. 683)
realizator gene (p. 695)
segment polarity gene (p. 687)
syncytial blastoderm (p. 685)
syncytium (p. 685)
totipotency (p. 683)
zone of polarizing activity (ZPA) (p. 701)
zygotic gene (p. 687)


1. Explain why many developmental genes encode either
transcription factors or signaling molecules.

2. Bird beaks develop from an embryonic group of cells
called neural crest cells that are part of the neural tube
that gives rise to the spinal column and related structures.
Amazingly, neural crest cells can be surgically transplanted
from one embryo to another, even between embryos of
different species. When quail neural crest cells were transplanted
into duck embryos, the beak of the host embryo
developed into a shape similar to that found in quails,
creating the “quck.” Duck cells were recruited in addition
to the quail cells to form part of the quck beak. Conversely,
when duck neural crest cells were transplanted into quail
embryos, the beak of the embryo resembled that of a duck,
creating a “duail,” and quail cells were recruited to form
part of the beak. What do these experiments tell you about
the autonomy or non-autonomy of the transplanted and
host cells during beak development?

3. How is positional information provided along the
anterior–posterior axis in Drosophila? What are the functions
of bicoid and nanos?

4. Early development in Drosophila is atypical in that pattern
formation takes place in a syncytial blastoderm, allowing
free diffusion of transcription factors between nuclei. In
many other animal species, the fertilized egg is divided
by cellular cleavages into a larger and larger number of
smaller and smaller cells.

a. What constraints does this impose on the mechanisms
of pattern formation?

b. How must the model that describes Drosophila development
be modified for describing other animal species
whose early development is not syncytial?

5. Consider the even-skipped regulatory sequences in
Figure 20.9.

a. How are the sharp boundaries of expression of eve
stripe 2 formed?

b. Consider the binding sites for gap proteins and Bicoid
in the stripe 2 enhancer module. What sites are
occupied in parasegments 2, 3, and 4, and how
does this result in expression or no expression?

c. Explain what you expect to see happen to even-skipped
stripe 2 if it is expressed in a Krüppel mutant background.
A hunchback mutant background? A giant mutant
background? A bicoid mutant background?

6. What is the difference between a parasegment and a segment
in Drosophila development? Why do developmental
biologists think of parasegments as the subdivisions that
are produced during development of flies?

7. Why do loss-of-function mutations in Hox genes usually
result in embryo lethality, whereas gain-of-function
mutants can be viable? Why are flies homozygous for the
recessive loss-of-function alleles Ultrabithorax [superscript illegible] and
Ultrabithorax [superscript illegible] viable?

8. Compare and contrast the specification of segmental
identity in Drosophila with that of floral organ specification
in Arabidopsis. What is the same in this process, and
what is different?

9. Actinomycin D is a drug that inhibits the activity of RNA
polymerase II. In the presence of actinomycin D, early development
in many vertebrate species, such as frogs, can
proceed past the formation of a blastula, a hollow ball of cells
that forms after early cleavage divisions; but development
ceases before gastrulation. What does this tell you about maternal
versus zygotic gene activity in early frog development?

10. Ablation of the anchor cell in wild-type C. elegans results
in a vulva-less phenotype.

a. What phenotype is to be expected if the anchor cell is
ablated in a let-23 loss-of-function mutant?

b. What about if the anchor cell is ablated in a let-23
gain-of-function mutant?

11. In gain-of-function let-23 and let-60 C. elegans mutants, all
of the vulval precursor cells differentiate with 1° or 2° fates.
Do you expect adjacent cells to differentiate with 1° fates or
with 2° fates? Explain.

12. In mammals, identical twins arise when an embryo derived
from a single fertilized egg splits into two independent
embryos, producing two genetically identical individuals.

a. What limits might there be, from a developmental genetic
viewpoint, as to when this can occur?

b. The converse phenotype, fusion of two genetically distinct
embryos into a single individual, is also known.
What are the genetic implications of such an event?


Application and Integration

For answers to selected even-numbered problems, see Appendix: Answers.


13. bicoid is a coordinate, maternal effect gene.

a. A female Drosophila heterozygous for a loss-of-function
bicoid allele is mated to a male that is heterozygous
for the same allele. What are the phenotypes of their
progeny?

b. A female that is homozygous for a loss-of-function
bicoid allele is mated to a wild-type male. What are the
phenotypes of their progeny?

c. If loss of bicoid function in the egg leads to lethality
during embryogenesis, how are females homozygous
for bicoid produced? What is the phenotype of a male
homozygous for bicoid loss-of-function alleles?

14. Given that maternal Bicoid activates the expression of
hunchback (see Figure 20.7), what would be the consequence
of adding extra copies of the bicoid gene by
transgenic means, thus creating a female fly with two
(the wild-type condition), three, or four copies of the
bicoid gene? How would hunchback expression be
altered? What about the expression of other gap genes
and pair-rule genes?

15. What phenotypes do you expect in flies homozygous for
loss-of-function mutations in the following genes: Krüppel,
odd-skipped, hedgehog, Ultrabithorax?

16. The pair-rule gene fushi tarazu is expressed in the seven
even-numbered parasegments during Drosophila embryogenesis.
In contrast, the segment polarity gene engrailed
is expressed in the anterior part of each of the 14 parasegments.
Since both genes are active at similar times and places
during development, it is possible that the expression of one
gene is required for the expression of the other. This can
be tested by examining expression of the genes in a mutant
background, for example, looking at fushi tarazu expression
in an engrailed mutant background, and vice versa.

a. Given the hierarchy of gene action during Drosophila
embryogenesis, what might you predict to be the result
of these experiments?

b. Based on your prediction, can you predict the
phenotype of the fushi tarazu and engrailed double
mutant?

17. In contrast to Drosophila, some arthropods (e.g., centipedes)
have legs on almost every segment posterior to the head.
Based on your knowledge of Drosophila, propose a genetic
explanation for this phenotype, and describe the expected
expression patterns of genes of the Antennapedia and
bithorax complexes.

18. The bristles that develop from the epidermis in Drosophila
are evenly spaced, so that two bristles never occur immediately
adjacent to each other. How might this pattern be
established during development?


19. You are traveling in the Netherlands and overhear a tulip
breeder describe a puzzling event. Tulips normally have
two outer whorls of brightly colored petal-like organs, a
third whorl of stamens, and an inner (fourth) whorl of carpels.
However, the breeder found a recessive mutant in his
field in which the outer two whorls were green and sepal-like,
while the third and fourth whorls both contained carpels.
What can you speculate about the nature of the gene
that was mutated?

20. A powerful approach to identifying genes of a developmental
pathway is to screen for mutations that suppress
or enhance the phenotype of interest. This approach was
undertaken to elucidate the genetic pathway controlling
C. elegans vulval development.

a. A lin-3 loss-of-function mutant with a vulva-less phenotype
was mutagenized. Based on your knowledge of
the genetic pathway, what types of mutations will suppress
the vulva-less phenotype?

b. In a complementary experiment, a gain-of-function
let-23 mutant with a multi-vulva phenotype was also
mutagenized. What types of mutations will suppress the
multi-vulva phenotype?

21. Zea mays (maize, or corn) was originally domesticated in
central Mexico at least 7000 years ago from an endemic
grass called teosinte. Teosinte is generally unbranched,
has male and female flowers on the same branch, and has
few kernels per “cob,” each encased in a hard, leaf-like organ
called a glume. In contrast, maize is highly branched,
with a male inflorescence (tassel) on its central branch
and female inflorescences (cobs) on axillary branches. In
addition, maize cobs have many rows of kernels and soft
glumes. George Beadle crossed cultivated maize and wild
teosinte, which resulted in fully fertile F₁ plants. When
the F₁ plants were self-fertilized, about 1 plant in every
1000 of the F₂ progeny resembled either a modern maize
plant or a wild teosinte plant. What did Beadle conclude
about whether the different architectures of maize and teosinte
were caused by changes with a small effect in many
genes or changes with a large effect in just a few genes?

22. The Hoxd9-13 genes are thought to specify digit identity
(see Figure 20.18).

a. What would be the consequence of ectopically expressing
Hoxd10 throughout the developing mouse limb
bud? What about Hoxd11? What about both Hoxd10
and Hoxd11?

b. You wish to examine the effect of loss-of-function alleles
in developing limbs. How would you construct a
mouse in which the function of Hoxd9-13 is retained
during anterior–posterior embryonic patterning but is
absent from developing limbs?
23. Three-spined stickleback fish live in lakes formed when
the last ice age ended 10,000 to 15,000 years ago. In lakes
where the sticklebacks are prey for larger fish, they develop
35 bony plates along their body as armor. In contrast, sticklebacks
in lakes where there are no predators develop only
a few or no bony plates.

a. In crosses between fish of the two different morphologies,
the lack of bony armor segregates as a recessive
trait that maps to the ectodermal dysplasin (Eda) gene.
Comparisons between the Eda-coding regions of the
armored and non-armored fish revealed no differences.
How can you explain this result?

b. Loss-of-function mutations in the coding region of
the homologous gene in humans result in loss of hair,
teeth, and sweat glands, as in the toothless men of Sind
(India). What does this suggest about hair, teeth, and
sweat glands in humans?


24. In C. elegans there are two sexes: hermaphrodite and male. 
Sex is determined by the ratio of X chromosomes to hap- 
loid sets of autosomes (X/A). An X/A ratio of 1.0 produces 
a hermaphrodite (XX), and an X/A ratio of 0.5 results in 
a male (XO). In the 1970s, Jonathan Hodgkin and Sydney 
Brenner carried out genetic screens to identify mutations 
in three genes that result in either XX males (tra-1, tra-2) 
or XO hermaphrodites (her-1). Double-mutant strains 
were constructed to assess for epistatic interactions be- 
tween the genes (see table). Propose a genetic model of 
how the her and tra genes control sex determination. 


Genotype (a)               XX Phenotype     XO Phenotype
Wild-type                  Hermaphrodite    Male
tra-1^rec                  Male             Male
tra-2^rec                  Male             Male
her-1^rec                  Hermaphrodite    Hermaphrodite
tra-1^dom/+                Hermaphrodite    Hermaphrodite
tra-1^rec tra-2^rec        Male             Male
tra-1^rec her-1^rec        Male             Male
tra-2^rec her-1^rec        Male             Male
tra-2^rec tra-1^dom/+      Hermaphrodite    Hermaphrodite

(a) rec = recessive mutation; dom = dominant mutation.


25. The flowering jungle plant Lacandonia schismatica, dis- 
covered in southern Mexico, has a unique floral structure. 
Petal-like organs are in the outer whorls surrounding a 
number of carpels, and stamens are in the center of the 
flower. Closely related species are dioecious; female plants 
bear flowers that resemble those of Lacandonia, but without
the central stamens. What type of mutation could have
resulted in the evolution of Lacandonia flowers?


26. Homeotic genes are thought to regulate each other. 


a. What aspect of the phenotype of apetala2 agamous 
double mutants indicates that these two genes act 
antagonistically? 

b. Are similar interactions observed between Hox genes? 
Compare the action of the floral homeotic genes in spec- 
ifying floral organ identity and the Hox genes in specify- 
ing identity along the anterior—posterior axis of animals. 


27. Dipterans (two-winged insects) are thought to have 
evolved from a four-winged ancestor that had wings on 
both T2 and T3 thoracic segments, as in extant butterflies 
and dragonflies. Describe an evolutionary scenario for the 
evolution of dipterans from four-winged ancestors. What 
types of mutations could lead to a butterfly developing with 
only two wings? 


28. Basidiomycota is a monophyletic group of fungi that in- 
cludes most of the common mushrooms. You are inter- 
ested in the development of the body plan of mushrooms. 
How would you identify the genes required for patterning 
during mushroom development? 


29. In Drosophila, recessive mutations in the fruitless gene
(fru) result in males courting other males; and recessive
mutations in the Antennapedia gene (Ant) lead to defects
in the body plan, specifically in the thoracic region of the
body, where mutants fail to develop legs. The two genes
map 15 cM apart on chromosome 3. You have isolated
a new dominant Ant^d mutant allele that you induced by
treating your flies with X-rays. Your new mutant has legs
developing instead of antennae on the head of the fly. You
cross your newly induced dominant Ant^d mutant (a pure-breeding
line) with a homozygous recessive fru mutant
(which is homozygous wild type at the Ant locus), as
diagrammed below:


    Ant^d fru+       Ant+ fru
    ----------   x   --------
    Ant^d fru+       Ant+ fru

                 |
                 v

          Ant^d fru+
    F₁    ----------
          Ant+ fru


a. What phenotypes, and in what proportions, do you expect
in the F₂ obtained by interbreeding F₁ animals?

b. Your cross results in the following phenotypic
proportions:


Legs on head, normal courting behavior        75
Normal head, abnormal courting behavior       25
Legs on head, abnormal courting behavior       0
Normal head, normal courting behavior          0


Provide a genetic explanation for these results and
describe a test for your hypothesis.

c. Provide a molecular explanation for the reason your new
Ant^d mutant is dominant and for its novel phenotype.

Genetic Analysis 
of Quantitative Traits 


CHAPTER OUTLINE 


21.1 Quantitative Traits Display 
Continuous Phenotype 
Variation 

21.2 Quantitative Trait Analysis Is 
Statistical 

21.3 Heritability Measures the 
Genetic Component of 
Phenotypic Variation 

21.4 Quantitative Trait Loci Are 
the Genes That Contribute to 
Quantitative Traits 


A human histogram depicting the distribution of heights among faculty 
and students of the University of Connecticut. The women are in white 
shirts and the men are in blue shirts. 


Finding the connection between phenotypes and
genotypes is simplest when the phenotypic variation in a 
trait is determined by variation in a single gene. The segrega- 
tion of alleles of a single gene determining whether peas are 
round or wrinkled, as in Mendel’s studies, is a classic example. 
Other genes are not involved, and there is no evidence of 
gene interaction (i.e., epistasis) or of interaction between 
the gene and specific environmental factors. Similarly, your 
blood type—either A, B, AB, or O—is determined exclusively 
by inherited variation in a single gene, and the environment in 
which you were raised had no effect on that outcome. 

In reality, however, such direct correlations between phe- 
notypes and genotypes are not common. Many traits display 
variation resulting from epistatic gene interactions
(see Section 4.3). In addition, numerous traits, known
as polygenic traits, result from the influence of
multiple genes. The contributing genes generally
assort independently to produce a large number
of genotypes and multiple phenotypes. The inheritance
of polygenic traits is identified as polygenic
inheritance. Further complicating the correla- 
tions between genotypes and phenotypes is the 
finding that the phenotypes of many traits whose 
inheritance is polygenic are influenced by environ- 
mental factors. Thus both genetic variation and 
environmental variation contribute to the pheno- 
typic variation of certain traits, which are therefore 
referred to as multifactorial traits. 

A key indication of the influence of multiple 
genes and of environmental factors on certain phe- 
notypes is the assessment of variation for those traits 
in quantitative rather than qualitative terms. “Round 
seeds” versus “wrinkled seeds” or “blood type A” 
versus “blood type B” are examples of qualitative 
phenotypic differences. Qualitative phenotypes fall 
into discrete categories that correspond to particu- 
lar genotypes and that are often distinctly different 
from one another. In contrast, quantitative pheno- 
typic variation usually takes the form of continuous 
variation along a phenotypic scale, and the traits are 
frequently described using units of measure. For ex- 
ample, one might use kilograms to measure quanti- 
tative variation in the weight of cattle or centimeters 
to measure quantitative variation in the length of 
ears of corn. Traits of this kind are called quantitative 
traits. This term also applies to traits that vary over
a phenotypic range that is non-numeric. Thus, while
many quantitative traits are measured in values such as grams or
centimeters, others are instead described using non-numeric
terms, as with a range of color phenotypes (e.g., from
black through shades of gray to white).

The genetic study and analysis of quantitative 
traits is the focus of the field of inquiry known as 
quantitative genetics. In this chapter, we explore 
how quantitative genetics examines the hereditary 
variation of polygenic and multifactorial traits. In the 
process, we address some of the ways geneticists 
attempt to disentangle the genetic and environmental 


influences on trait variation and describe genetic 
approaches to interpreting the relative effects of those 
factors on quantitative trait phenotypes. 


21.1 Quantitative Traits Display 
Continuous Phenotype Variation 


For most of the traits we discuss in earlier chapters, 
phenotypic variation is controlled by allelic variation at 
single genes. The phenotypes of these single-gene traits 
commonly display discontinuous variation, meaning dif- 
ferences that allow organisms to be assigned to discrete, 
sharply distinguishable phenotypic categories. The dis- 
continuous patterns of variation lead to the specification 
of consistent phenotype ratios, such as a 3:1 ratio among 
the F₂ progeny of self-fertilized F₁ organisms. Even when
two genes take part in epistatic interactions that affect 
phenotypic expression, the phenotypes are discrete and 
occur in predictable ratios (see Section 4.3). 

In contrast, polygenic and multifactorial traits usually 
display continuous variation, which is phenotypic varia- 
tion distributed across a range of values in an uninterrupted 
continuum. This section explores the genetic factors con- 
tributing to traits displaying continuous variation. 


Genetic Potential 


Human adult height is an example of a multifactorial trait 
that varies continuously along a scale of measurement usu- 
ally marked off in centimeters or inches. This continuous 
variation is demonstrated in the chapter-opening photo, 
in which some 138 University of Connecticut students 
and faculty are arranged according to height. The height 
distribution of this sample, divided into 1-inch increments, 
ranges from 60 inches (5 feet) to 77 inches (6 feet 5 inches). 
The length of each line of individuals behind the height 
markers represents the frequency of each incremental cat- 
egory, and the sweatshirt and hat color identifies the wear- 
er’s sex (white for women and blue for men). Examining 
the overall distribution, you can see that it is actually com- 
posed of two different distributions, one for each sex, and 
you can also see that the distribution is uneven. 

Adult height is influenced by multiple genes. For ex- 
ample, a 2011 study by Matthew Lanktree and many col- 
leagues used the analysis of human genomic variation and 
statistical methods to suggest that more than 60 genes 
may influence adult height. While the actual number of 
genes influencing human height continues to be investi- 
gated, your own personal experiences, as well as popula- 
tion studies, most likely tell you that taller parents tend 
to have taller children and shorter parents tend to have 
shorter children. 
In addition to this genetic influence, however, environmental
and developmental factors can have a significant
effect. If your genetics class is typical of most,
a survey of your classmates would likely find that many 
of the men are taller than their fathers and grandfathers 
and that many of the women are taller than their mothers 
and grandmothers. These differences are due almost ex- 
clusively to improved prenatal and childhood health and 
nutrition and only minimally to changes in the population 
genetic makeup influencing adult height. Longitudinal 
studies confirm that much of the world’s population is 
getting taller. During the 20th century, the height of the 
average American woman increased from approximately 
5’2” in 1900 to almost 5’5” in 2000. An even more dra- 
matic increase in average adult height can be observed by 
walking through the doors of houses and other structures 
built a few centuries ago. Most modern-day visitors have 
to stoop to enter! Such observations lead to the clear con- 
clusion that adult height is a multifactorial trait. 

To understand the role of genetics in a trait like adult 
height, you might think of parents as transmitting to their 
children a “genetic potential” for reaching a certain maxi- 
mum adult height; the genetic potential will be attained 
if the child grows and develops under ideal conditions. 
Not all of the children of a particular pair of parents will 
have the same genetic potential, since segregation and 
independent assortment of the contributing genes can 
produce many different genotypes. These processes pro- 
duce offspring with different genotypes conveying genetic 
potential for a range of heights, including heights that are 
greater or lesser than those of their parents. On average, 
however, progeny genetic potential for height will be at 
approximately the midpoint of the two parents’ genetic 
potential. The phenotypic outcome (actual adult height) 
is subject to various influences on the height potential 
conveyed by the genotype, including prenatal and mater- 
nal health and childhood health and nutrition, as the fol- 
lowing discussion illustrates. 
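
The midpoint expectation can be checked with a small calculation. The sketch below is only an illustration under stated assumptions: it treats genetic potential as strictly additive, uses three unlinked hypothetical loci, and assigns made-up allele values in arbitrary units. None of these numbers come from real height data. It enumerates every equally likely offspring genotype and compares the average offspring value with the midparent value.

```python
from itertools import product
from statistics import mean

# Hypothetical genotypes: one (allele, allele) pair per locus.
# Allele values are arbitrary additive units, assumed for illustration only.
mother = [(2.0, 0.0), (1.0, 1.0), (3.0, 1.0)]
father = [(0.0, 0.0), (2.0, 1.0), (1.0, 0.0)]

def genetic_value(genotype):
    """Additive genetic potential: the sum of all allele values."""
    return sum(a + b for a, b in genotype)

def offspring_values(parent1, parent2):
    """Genetic value of every equally likely offspring.

    Each gamete carries one allele per locus (segregation), and loci are
    combined freely (independent assortment).
    """
    gametes1 = list(product(*parent1))
    gametes2 = list(product(*parent2))
    return [sum(g1) + sum(g2) for g1 in gametes1 for g2 in gametes2]

values = offspring_values(mother, father)
midparent = (genetic_value(mother) + genetic_value(father)) / 2

print("parents:", genetic_value(mother), genetic_value(father))
print("midparent:", midparent)
print("offspring mean:", mean(values))
print("offspring range:", min(values), "to", max(values))
```

In this made-up example the offspring mean equals the midparent value, while individual offspring span a range that extends both above the higher parent and below the lower parent.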


Major Genes and Additive Gene Effects 


The continuous phenotypic variation of polygenic traits 
results from the effects of multiple genes that may exert 
different amounts of influence. For example, the human 
OCA2 gene has several alleles that strongly influence eye 
color. The color of the adult eye is further influenced by 
other genes that act less strongly than OCA2. A gene like 
OCA2 is classified as a major gene, since it has a strong 
effect on the phenotype. Genes that have minor effects on 
the phenotype are classified as modifier genes. 

On the other hand, if the continuous phenotypic 
distribution results from incremental contributions by 
multiple genes, then the genes contributing to phenotypic 
variation in this way are known as additive genes. Each 
allele of additive genes can be assigned a quantitative 
value that indicates its contribution to a polygenic trait 


known as an additive trait. In the absence of environ- 
mental influence, phenotypes can be predicted by adding 
the values of the alleles together. For certain traits, each 
of the additive genes has an approximately equal effect, 
while for other traits the influence of each gene is distinct. 

Grasping the notion of additive genes requires a dif- 
ferent way of thinking about genotypes and phenotypes 
than we have discussed previously. Since traits controlled 
by additive genes have a phenotype that is the sum of 
allelic contributions across multiple genes, it is possible 
for more than one genotype to correspond to certain 
phenotypes. Segregation and independent assortment of 
additive alleles produces the various genotypes, but the 
phenotype corresponding to each is based on the sum of 
the values of the alleles at all the contributing loci. 

In the early 1900s, coinciding with the verification 
and expansion of the then recently rediscovered heredi- 
tary principles of Mendel, geneticists began to explore the 
hypothesis that the segregation of alleles of multiple genes 
played a role in phenotypic variation of particular traits. 
Known as the multiple-gene hypothesis, the proposal 
was that alleles at each of the contributing genes obeyed 
the principles of segregation and independent assortment 
and had an additive effect in the production of phenotypic 
variation. 

The multiple-gene hypothesis was the foundation of 
quantitative genetics, and the plant geneticist Hermann 
Nilsson-Ehle was one of the first to use the hypothesis in 
his 1909 description of genetic control of kernel color in 
wheat. Figure 21.1 illustrates one of Nilsson-Ehle’s genetic 
models, describing the determination of wheat kernel 
color by additive alleles of two genes. In this model, only 
genetic effects on phenotype are being considered. The 
model predicts that kernel color spans a spectrum from 
dark red to white. Gene A and gene B each have two alleles.
Alleles A1 and B1 are equivalent to one another, each
adding an equal unit of color to the phenotype. Alleles
A2 and B2 are also equivalent, neither adding any units
of color to the phenotype. Under the additive genetic
model, the more “number 1” alleles, either A1 or B1, the
genotype contains, the darker the color of wheat kernels.
Conversely, the fewer number 1 alleles (or the more
“number 2” alleles) there are in the genotype, the lighter
the kernel color. The deepest red color (dark red) is present
when four number 1 alleles are present (A1A1B1B1).
Conversely, white kernels are produced when no copies of
number 1 alleles are in the genotype (A2A2B2B2).

Figure 21.1 Polygenic inheritance of wheat kernel color
controlled by two additive genes. Each 1 allele (either A1 or B1)
adds a unit of color, but 2 alleles (A2 or B2) add no units of color.
Pure-breeding parents (one dark red, one white) produce dihybrid
F1 with dark pink kernel color. Five phenotype classes are
predicted among F2 progeny in a ratio determined by the total
number of A1 plus B1 alleles in the genotype.

Figure 21.1 shows a cross between pure-breeding
dark red and pure-breeding white plants. The cross produces
F1 plants that are dihybrid (A1A2B1B2) and have
dark pink kernel color as a consequence of carrying just
two number 1 alleles. Crossing the F1 plants produces
an F2 generation with five different kernel colors, each
dependent on the total number of number 1 alleles in
the genotype. For these two loci, genotypes can have a
maximum of four number 1 alleles and a minimum of
zero number 1 alleles. The five different totals of number
1 alleles produce the five different phenotypes in the F2
generation, in proportions determined by independent
assortment. Among the F2, 1/16 carry four number 1 alleles
and produce dark red kernels like the parental plant,
4/16 carry three number 1 alleles and have light red kernels,
6/16 have two number 1 alleles and have dark pink
kernels, 4/16 carry a single number 1 allele and have light
pink kernels, and the final 1/16 have no number 1 alleles
and have white kernels like the parental plant.
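
The 1:4:6:4:1 proportions can be checked with a short calculation. The following Python sketch is illustrative only: the function name `f2_distribution` and the `COLORS` list are our own labels, and the model assumes, as the text does, that both F1 parents are heterozygous at every locus and that the loci assort independently.

```python
from fractions import Fraction
from math import comb


def f2_distribution(n_genes):
    """Probability of each count of number 1 alleles among F2 progeny.

    Assumes both F1 parents are heterozygous at every one of the n_genes
    loci, so each of the 2 * n_genes alleles in an offspring is a number 1
    allele with probability 1/2, independently of the others.
    """
    slots = 2 * n_genes
    return [Fraction(comb(slots, k), 2 ** slots) for k in range(slots + 1)]


# Kernel colors for 0 to 4 number 1 alleles, as named in the text.
COLORS = ["white", "light pink", "dark pink", "light red", "dark red"]

if __name__ == "__main__":
    for k, (color, p) in enumerate(zip(COLORS, f2_distribution(2))):
        sixteenths = p.numerator * 16 // p.denominator
        print(f"{k} number 1 alleles: {color:<10} {sixteenths}/16")
```

The binomial coefficients used here are the same numbers that appear in the corresponding row of Pascal's triangle, so calling `f2_distribution(3)` gives the seven classes of a three-gene model.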

As the number of additive genes contributing to a 
phenotypic trait increases, the number of phenotype 
categories increases as well. Figure 21.2 illustrates an 
additive genetic model in which wheat kernel color is 
determined by three genes. In this example, genes A, 
B, and C each have two alleles whose additive effect is 
computed in the same way as for the two-gene system
of Figure 21.1: Phenotype categories are determined
by the number of “1” alleles contained in a genotype.
A cross of pure-breeding dark red and pure-breeding
white parental plants produces an F1 of an intermediate
(dark pink) color as a result of its trihybrid genotype
(A1A2B1B2C1C2). Independent assortment produces an
F2 that falls into seven phenotypic categories that are
determined by genotypes that have a maximum of six 1
alleles and a minimum of zero 1 alleles.


Continuous Phenotypic Variation from 
Multiple Additive Genes 


The more phenotypes that occur along a limited scale 
of measurement, the narrower is the slice of the distribu- 
tion each category occupies and the less obvious the de- 
marcation between categories may become. Figure 21.3 
shows five histograms illustrating the distribution of F2
phenotypes produced by different numbers of additive 
genes that each have two alleles. As in the preceding ex- 
amples, each number 1 allele adds a unit of color to the 
phenotype, but number 2 alleles do not. The proportions 
for each phenotype can be determined using probability, 
or one can use Pascal’s triangle to determine each ex- 
pected proportion (see Figure 2.15). Notice the increase 
in the number of phenotype classes as the number of 
genes contributing to the phenotype increases from 
one to five. Moreover, the adjacent phenotype classes 
resemble one another more closely as the number of 
classes increases, blending into a continuous phenotypic 
distribution. 

The number of distinct phenotype categories for a 
polygenic trait produced by the segregation of additive 
alleles of a given number of genes (n) is calculated as 
2n + 1. For example, for three additive genes contribut- 
ing to a polygenic trait, n = 3, and the number of distinct 
phenotypic categories is 2(3) + 1 = 7. Table 21.1 lists the
numbers of phenotypic categories for different numbers 
of contributing genes and gives the frequency of the most 
extreme phenotypes in each distribution. If more than 
two alleles occur for the contributing genes, the number 
of phenotypes can increase. 
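
The entries of Table 21.1 follow from two small formulas, sketched below in Python. The function names are hypothetical. The second formula rests on an assumption consistent with the table: an extreme phenotype requires homozygosity for the same allele at all n loci, and in a cross between heterozygotes that happens with probability 1/4 per locus, so the frequency of each extreme is (1/4)^n.

```python
from fractions import Fraction


def phenotype_categories(n_genes):
    """Number of distinct phenotype classes for n two-allele additive genes."""
    return 2 * n_genes + 1


def extreme_frequency(n_genes):
    """Frequency of each most extreme phenotype among progeny of heterozygotes."""
    return Fraction(1, 4) ** n_genes


if __name__ == "__main__":
    print("genes  categories  extreme frequency")
    for n in range(1, 11):
        print(f"{n:>5}  {phenotype_categories(n):>10}  {extreme_frequency(n)}")
```

For n = 3 the sketch gives 7 categories and an extreme frequency of 1/64, matching the three-gene wheat model.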


Allele Segregation in Quantitative Trait 
Production 


In 1916, plant geneticist Edward East undertook a com- 
prehensive examination of the multiple-gene hypothesis 
by testing its ability to explain patterns of inherited 
variation that he produced in the length of the corolla 
(the petal-producing part of the flower) in Nicotiana lon- 
giflora. In this long-flower species of tobacco, the corolla 
is a tube-shaped structure whose length can be measured 
and compared with corollas in other plants. 

Figure 21.2 A three-gene additive model for wheat kernel color. Color is determined by total
number of 1 alleles (A1, B1, and C1) in the genotype. The F2 have seven phenotypic classes in proportions
generated by independent assortment at three loci.

Figure 21.3 Phenotype distributions with additive
genes. The parents producing progeny in each example are
heterozygous for each gene. The color-contributing alleles
are designated as 1 for each gene. The number of F2 phenotype
categories increases with the number of additive genes.
Panels (horizontal axis: number of color-producing alleles):
(a) One locus: A1A2 × A1A2
(b) Two loci: A1A2B1B2 × A1A2B1B2
(c) Three loci: A1A2B1B2C1C2 × A1A2B1B2C1C2
(d) Four loci: A1A2B1B2C1C2D1D2 × A1A2B1B2C1C2D1D2
(e) Five loci: A1A2B1B2C1C2D1D2E1E2 × A1A2B1B2C1C2D1D2E1E2

East began his experiments with pure-breeding
parental lines, one having a short corolla approximately
40 millimeters long and the other producing a long corolla
of approximately 90 millimeters (Figure 21.4). Note that
there is a small amount of variation in corolla length in
each pure-breeding line, suggesting that despite attempts
to produce pure-breeding lines, gene-gene interaction
or multifactorial effects produce some variability. The
F1 progeny of this cross had an average corolla length of
about 65 millimeters, approximately midway between the
parental averages. These “mid-parental” values are an indication
of strong genetic control of corolla length. Once
again, there is some variability around the average corolla
length value, but none of the F1 have corolla lengths that
are near the parental values.

East allowed F1 plants to self-fertilize to produce
about 450 F2, among which he observed a wider distribution
of corolla length than in the F1, although the average
length was about the same as that of the F1. None of the
F2 East produced had corolla lengths equal to those of the
pure-breeding parental lines. Then, over three additional
generations beginning with F2, East selectively bred plants
to produce a line having a short corolla and a line having
a long corolla, achieving new collections of plants with
corolla lengths approximating those found in the original
pure-breeding parents.

East reached two general conclusions based on his
observations. Both conclusions are consistent with the
models of continuous phenotypic variation of quantitative
traits we have described. First, he concluded that corolla
length in Nicotiana longiflora, particularly in the F2, results
from the segregation of alleles of multiple genes.
Second, East concluded that the phenotypic expression of
each genotype is influenced by nongenetic factors, that is,
genes interacting with environmental factors to blur the
direct correspondence between a given genotype and a
specific phenotype. The nongenetic factors partially explain
the variation around average corolla length. Genetic
Analysis 21.1 guides you through your own analysis of
polygenic contributions to plant height.


Table 21.1 The Effect of Polygenes on Phenotypic Variation

Number of    Number of Phenotype    Frequency of Most
Genes (n)    Categories             Extreme Phenotypes
 1            3                     1/4
 2            5                     1/16
 3            7                     1/64
 4            9                     1/256
 5           11                     1/1024
 6           13                     1/4096
 7           15                     1/16,384
 8           17                     1/65,536
 9           19                     1/262,144
10           21                     1/1,048,576


Figure 21.4 Corolla length in tobacco. Edward East determined
that alleles of multiple genes control genetic variance in
corolla length of tobacco (Nicotiana longiflora).


Effects of Environmental Factors 
on Phenotypic Variation 


Disentangling the genetic and nongenetic factors that 
determine phenotypic variation is a difficult but impor- 
tant task in genetics. In humans, for example, common 
diseases such as heart disease, cancer, and diabetes are 
influenced by heredity, but nonhereditary factors are also 
critically important in disease development. Identifying 
the particular genes and the specific nonhereditary factors
contributing to these diseases is the ultimate goal of research,
but it must be approached in small, incremental
steps that include modeling of the interactions of heredi- 
tary and nonhereditary factors. 

Figure 21.5 shows a general approach taken by models 
of this kind. It displays the phenotypic ranges that would 
be associated with the genotypes A1A1, A1A2, and A2A2
under different assumptions of gene-environment interaction.
In Figure 21.5a, no gene-environment interaction
takes place, and each genotype corresponds to a distinct
phenotype. Predictable correspondence of genotype and
phenotype is seen in the F2, where phenotypic distribution
is discontinuous and a 1:2:1 phenotype ratio is found.
Figure 21.5b shows the phenotypic ranges of parents and
F1 and F2 progeny when moderate interaction occurs between
the genotype and environmental factors. In each
generation, a range of phenotypic values is associated
with each genotype, and in the F2, there is a small degree
of overlap between the phenotypic ranges of different
genotypes. In Figure 21.5c, substantial interaction between
genes and environment takes place. A wide range of phenotypic
values is associated with each genotype, and in the
F2 a significant degree of phenotypic overlap between the
genotypes is seen, so that a large proportion of heterozy- 
gotes have phenotypes that overlap those of a homozygote. 
Gene-environment interaction of this kind is typical of 
multifactorial traits and can make it difficult to determine 
the genotype of an organism simply by looking at its phe- 
notype. In a later chapter section, we refer to the influence 
of environmental factors on genotype using the term envi- 
ronmental variance. In that section, we describe a quanti- 
tative approach to determining how much of the variance 
in phenotype is due to environmental factors. 

The use of a “phenotype scorecard” to predict the outcome
of polygenic inheritance and of gene-environment interaction
in determining the multifactorial trait of height in a
hypothetical plant is illustrated in Experimental Insight 21.1.


Threshold Traits 


Most polygenic and multifactorial traits exhibit a con- 
tinuous phenotypic distribution, but certain of these traits, 
while having an underlying continuous distribution, can 
nevertheless be divided into distinct categories. Such traits
are often called threshold traits. They are often encountered in
medical contexts, where attempts are made, not always successfully,
to identify two clinical categories, “unaffected”
(or “normal”) and “affected” (or “abnormal”), and thus
to distinguish individuals who have an abnormality from
those who do not. For human threshold traits, the vast
majority of the population will have phenotypes on the 
unaffected side of the threshold and will display the normal 
phenotype. A small proportion of the population, how- 
ever, are found on the other side of the threshold and have 
an affected or abnormal phenotype. Cases that lie at the
borderline between the two categories can be problematic
to diagnose.

The genetic hypothesis explaining threshold traits 
proposes that the trait is polygenic or multifactorial, 
so that underlying the affected and unaffected pheno- 
type categories is a continuous distribution of genetic 
liability—a term for the organism’s risk of having the 
affected phenotype as the result of inheriting a particular 
genotype. Each member of a population has a specific 
genotype, and different genotypes may confer a different 
genetic liability, making some individuals more likely to 
display an affected phenotype by crossing the threshold. 
Figure 21.6 shows a continuous distribution of genetic li- 
ability for a population and the designation of a threshold 
that separates unaffected from affected individuals in the 
population. The portion of the population lying to the left
of the threshold of genetic liability, by far the majority,
are identified as unaffected or normal, and the small
group to the right of the threshold are considered affected
or abnormal.

Figure 21.6 Threshold traits. A theoretical continuous
phenotypic distribution and a threshold of genetic liability for
a threshold trait.

Models are used to test the applicability of these con- 
cepts to real-world observations at the population level. In 
these models, the likelihood of crossing the threshold of 
liability increases when more “liability alleles” are present 
in the genotype, that is, when the genotype confers greater 
genetic liability. For example, Figure 21.7a depicts a hypo- 
thetical three-gene model in which alleles are designated 
as either 1 or 2 at each locus and in which genetic liability 
increases with a greater number of 1 alleles. In this model, 
the threshold of liability is passed when at least five 1 al- 
leles are present. A greater number of 1 alleles in parental 
genotypes increases the proportion of progeny that will lie 
to the right of the threshold of liability and thus display an 
affected phenotype. The model can compare the risks of 
having a child affected by a threshold trait for parents car- 
rying different numbers of liability alleles. 

Figure 21.7 A polygenic model for a threshold trait. Any
allele designated as 1 confers genetic liability, any allele designated
as 2 confers no liability, and the 1 alleles are additive.
(a) In Cross 1, the couple has a 1/32 chance of producing an
affected child. (b) In Cross 2, the couple has a 7/64 chance of
producing an affected child.

Figure 21.7a illustrates Cross 1 between a parent with
two 1 alleles and a parent with three 1 alleles. Both parents
have the unaffected (normal) phenotype, and each is on
the unaffected side of the threshold. Among the progeny
of this cross, 1/32 (3%) are expected to carry five 1 alleles,
but none can carry six 1 alleles. Thus, 1/32 of the progeny
lie to the right of the threshold of liability and have
the affected phenotype. Figure 21.7b shows Cross 2 with
different parents that produce a higher level of genetic liability
in their progeny. In this cross, each parent carries
three liability alleles, but neither is affected because the
liability threshold is 5 or more liability alleles. Among their
progeny, however, independent assortment predicts that
7/64 (11%) will have genotypes that contain five or more
1 alleles. These progeny lie to the right of the threshold of
liability and have the affected phenotype. The genotypes
in the second cross confer almost a fourfold increased
risk (3% versus 11%) of producing an affected offspring
compared to the first cross. This difference is analogous to
the difference we might see between different families in
a population. Overall, a mating in the general population
has a low risk of producing a child with a threshold trait.
Different families may have different risks, however, and a
mating of parents that both come from families with a history
of the trait will be most likely to produce children who
also have the trait.
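
The risks quoted for the two crosses can be reproduced by enumerating gametes. The Python sketch below is a minimal illustration, not the figure's own notation. The parental genotypes are assumptions, because the figure is not reproduced here: a trihybrid parent (A1A2B1B2C1C2) carries three 1 alleles, and A1A2B1B2C2C2 is one genotype with two 1 alleles that yields the stated 1/32. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def gamete_distribution(parent):
    """Map 'number of 1 alleles in a gamete' to its probability.

    parent is a list of loci, each a pair of allele numbers such as (1, 2).
    One allele per locus enters each gamete, with independent assortment.
    """
    dist = {}
    weight = Fraction(1, 2 ** len(parent))
    for gamete in product(*parent):
        k = sum(1 for allele in gamete if allele == 1)
        dist[k] = dist.get(k, 0) + weight
    return dist


def risk(parent1, parent2, threshold=5):
    """Probability that a child carries at least `threshold` 1 alleles."""
    g1, g2 = gamete_distribution(parent1), gamete_distribution(parent2)
    return sum(p1 * p2
               for k1, p1 in g1.items()
               for k2, p2 in g2.items()
               if k1 + k2 >= threshold)


# Assumed genotypes (see the note above).
THREE_LIABILITY = [(1, 2), (1, 2), (1, 2)]  # A1A2B1B2C1C2
TWO_LIABILITY = [(1, 2), (1, 2), (2, 2)]    # A1A2B1B2C2C2

if __name__ == "__main__":
    print("Cross 1:", risk(THREE_LIABILITY, TWO_LIABILITY))
    print("Cross 2:", risk(THREE_LIABILITY, THREE_LIABILITY))
```

Changing the `threshold` argument or the parental genotypes shows how quickly the risk shifts as liability alleles accumulate in a family.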

The influence of environmental and developmental 
factors on phenotypes of threshold traits is an important 
additional component. These factors can play a role in 
determining whether individuals whose genetic liability 
places them near the threshold of liability end up having
the trait. The threshold model envisions organisms
possessing high genetic liability (i.e., possessing a genome
with many liability alleles) as having the potential to 
develop the affected phenotype. Whether the affected 
phenotype develops may be due to the influence of other 
hereditary, developmental, or environmental factors. Less 
often, an organism may have a genetic liability slightly be- 
low the threshold but the influence of environmental fac- 
tors could push the phenotype into the affected category. 

Certain threshold traits are more likely to occur in one 
sex than the other. Hip dislocation at birth is about four
times more common in girls than in boys, for example.
Thus, sex, and the developmental and hormone-based dif- 
ferences that distinguish the sexes, can influence whether 
a certain genotype produces an affected or an unaffected 
phenotype. This has important clinical implications. If a 
couple has previously had a child with dislocated hips, a 
physician will want to carefully examine all future chil- 
dren, especially if they are female, for hip dislocation. 

Lastly, there is a caveat to consider with regard to de- 
fining the categories and classification of threshold traits, 
particularly in humans. Because these traits are quantita- 
tive and fall along a continuum, precise determination of 
categories and phenotypes can be inexact. For example, 
it is easy to classify a person’s blood pressure as normal if 
it lies well within the normal range or as abnormal if the 
blood pressure is very high. Many people, however, have 
“borderline” high pressures that are difficult to assign to 
either the normal or high blood pressure category. 


21.2 Quantitative Trait Analysis 
Is Statistical 


The statistical methods most often applied today to the 
study of quantitative traits are a direct extension of contri- 
butions made nearly a century ago by statistician and evo- 
lutionary biologist Sir Ronald Fisher. In 1918, Fisher used 
statistical analysis to show that quantitative traits result 
from the segregation of alleles of multiple genes displaying 
an additive effect. Fisher also showed that interactions 
between genes can be detected by these methods. In addition,
he explored the role of gene-environment interaction
and concluded that environmental factors contribute
to continuous variation by blurring the lines between 
phenotypic classes. The tools and approaches described 
here and pioneered by Fisher allow scientists to identify 
genetic influences on phenotypes in terms of quantitative 
measurement rather than qualitative appearance. In the 
following description and illustrations of quantitative trait 
analysis, we explore some concepts in statistics described 
in connection with chi-square analysis (see Section 2.5). 


Statistical Description of Phenotypic Variation 


The first step in quantifying the phenotypic variation 
of a trait in a population is to construct a frequency 
distribution of values of the trait on a quantitative scale. 


GENETIC ANALYSIS 21.1


PROBLEM Dr. Ara B. Dopsis, a famous plant geneticist, develops several pure-breeding lines
of daffodils. Under ideal growth conditions, line A plants are the tallest and grow to a height of
48 centimeters, whereas line B plants are the shortest and grow to 12 centimeters. Dr. Dopsis
devises a genetic model with three additive genes that contribute equally to explain polygenic
inheritance of plant height. He assumes that line A has the genotype A1A1B1B1C1C1 and
that line B has the genotype A2A2B2B2C2C2. In answering the following questions, assume that
genotype alone determines plant height under ideal growth conditions.

a. If these two pure-breeding parental plants are crossed, what will be the genotype and
height of the F1 progeny plants?
b. If F2 are produced, what is the expected frequency of plants with different heights?

BREAK IT DOWN: Three additive genes make equal contributions to continuous variation in
plant height (p. 715).

BREAK IT DOWN: Pure-breeding plants of line A and line B are homozygous for 1 and 2
alleles, respectively. Seven progeny categories will produce continuous variation in height
(p. 717 and Figure 21.3).


Solution Strategies and Solution Steps


Evaluate

1. Strategy: Identify the topic this problem addresses and the nature of the requested answer.
   Step: This problem concerns assessment of a three-gene additive model for plant
   height, application of the model to crosses of pure-breeding parental plants
   of different heights, and evaluation of the F1 and F2 progeny.

2. Strategy: Identify the critical information given in the problem.
   Step: The genotypes of the pure-breeding parents are given. In applying the
   polygenic additive model, we are to assume that genotype alone determines
   variation in plant height.

Deduce

3. Strategy: Deduce the contribution of each allele of the additive genes to height in line A.
   TIP: Assume that each allele makes an equal contribution in this additive genetic model.
   Step: The 48-cm height of line A plants is determined by six alleles of additive
   genes. Each “1” allele in the line A genotype contributes 48 cm/6 = 8 cm to
   plant height.

4. Strategy: Deduce the contribution of each allele of the additive genes to height in line B.
   Step: Six alleles also contribute equally to the 12-cm height of line B plants. Each
   “2” allele in the line B genotype contributes 12 cm/6 = 2 cm to plant height.

5. Strategy: Deduce the gametes produced by each pure-breeding line.
   TIP: The laws of segregation and independent assortment apply to genes controlling
   polygenic traits.
   Step: Line A has the genotype A1A1B1B1C1C1 and produces gametes with the
   genotype A1B1C1. Line B has the genotype A2A2B2B2C2C2 and produces the
   gamete genotype A2B2C2.

Solve

Answer a
6. Strategy: Determine the genotype and height of F1 plants.
   Step: F1 progeny of these pure-breeding parental plants will have the genotype
   A1A2B1B2C1C2. Based on the contribution of each 1 and 2 allele, the predicted
   F1 plant height is [(3)(8 cm)] + [(3)(2 cm)] = 30 cm.

Answer b
7. Strategy: Determine the frequency and height of each category of F2 plants.
   TIP: Either use Pascal's triangle (Figure 2.15) or determine the probability of genotypes
   containing different numbers of 1 and 2 alleles.
   PITFALL: Remember that for most categories there are multiple genotypes with the same
   total number of 1 and 2 alleles.
   Step: The expected F2 progeny are

   Number of    Number of
   1 Alleles    2 Alleles    Frequency    Height (cm)
   0            6            1/64         12
   1            5            6/64         18
   2            4            15/64        24
   3            3            20/64        30
   4            2            15/64        36
   5            1            6/64         42
   6            0            1/64         48


For more practice, see Problems 8, 9, and 20.


Phenotype Scorecard: A Multifactorial Quantitative Phenotype Simulation


Here’s a hands-on activity that illustrates an approach to 
modeling a multifactorial quantitative trait. In this hypotheti- 
cal example, the mature height of a plant is under the control 
of five additive genes designated A to E. Two alleles at each 
gene make different contributions to height. Each allele with 
the subscript 1 adds 5 centimeters to the genetic potential, 
and each allele with the subscript 2 adds 10 centimeters. 
Therefore, a plant homozygous for 1 alleles at each locus 
(A1A1B1B1C1C1D1D1E1E1) has genetic potential for a height of
[(10 alleles)(5 cm/allele)] = 50 cm, as compared to a plant car- 
rying a genotype composed entirely of 2 alleles, which has a 
height potential of [(10 alleles)(10 cm/allele)] = 100 cm. Plants 
carrying genotypes with different numbers of 1 and 2 alleles 
have different genetic potentials for heights distributed at 
5-cm intervals along a continuum between 50 and 100 cm. 

At this point, let’s ask the following question: “How many 
1 and 2 alleles must be present to give a height potential 
of 80 cm?” Each genotype contains a total of 10 alleles, two 
at each of the five loci. Therefore, any genotype with six 
2 alleles and four 1 alleles will produce a height potential of 
[(6)(10) + (4)(5)] = 80 cm. 

Here’s a follow-up question: “What proportion of the prog- 
eny of two plants, each with a height potential of 75 cm, will 
have a height potential of 80 cm?” This problem is more com- 
plex. Plants with a height potential of 75 cm have five 2 alleles 
and five 1 alleles [(5)(10) + (5)(5) = 75]. Progeny genotypes 
that contain six 2 alleles and four 1 alleles will have a height 
potential of 80 cm. If we assume that each parent is heterozygous
at all five loci, the histogram in Figure 21.3e predicts the
answer: 210 of the 1024 progeny (20.5%) have
six copies of 2 alleles and four copies of 1 alleles.

Having examined the relationship between genotype and 
potential height in this model, let's examine the effect of five 
environmental factors on the attainment of height: 


1. Amount of water
2. Amount of sunlight
3. Soil drainage
4. Nutrient content of soil
5. Temperature


A frequency distribution shows what proportion of the 
population exhibits each measured value of the trait or 
falls into each category defined for the trait. Figure 21.8a 
provides an example, showing the number and frequency 
of each designated height category in a sample of 1000 
college-aged males. 

The individuals in this study are considered a 
random sample. They have not been selected for any 
attribute related to their height, and so their height 
distribution is assumed to resemble that of the general 
population of college-aged males. Random samples are 
used in quantitative trait analysis for two reasons. First, it 
is often impossible or impractical to collect data on every 
individual in a population; and second, random samples 


Each environmental factor can vary from optimal to poor. 
If all factors are optimal, we'll assume that full potential height 
is attained. However, if one or more of the environmental 
factors is less than optimal, then height is reduced. The state 
of each environmental factor has an effect on growth. In this 
exercise, we'll assume that the growth is affected according to 
the following scale: 


Environmental Factor State Height Lost 
Optimal (O) 0 cm lost 
Good (G) 4 cm lost 
Fair (F) 8 cm lost 
Marginal (M) 12 cm lost 
Poor (P) 16 cm lost 


If, for example, one environmental factor is optimal, two 
are good, one is fair, and one is marginal, the loss of potential 
height is 28 cm. 

The following table illustrates how the same genotype 
can produce different phenotypes under differing environ- 
mental conditions and how different genotypes can produce 
similar phenotypes under different conditions. Notice that 
the first two genotypes are identical but result in different 
phenotypes because of environmental differences. Also note 
that the third genotype has lower height potential than the 
other genotype but, in combination with a superior envi- 
ronment, results in the tallest plant. You can try your own 
combinations of genotypes and growth conditions to see 
different results. 


Height Environmental Height 
Genotype Potential Factor States Attained 
12345 
A1A2B,B2C>C>D,D2E,;E, 80cm GF OGM 52cm 
A1A2B,B2C>C>D,D2E,;E, 80cm F MG GF 44cm 
A,A,B,B2C;C>D,D2E;E2 70cm O G G G G 54cm 


can be just as accurate in the statistical sense as “samples” 
consisting of whole populations. As an analogy, about 
10 milliliters of blood—approximately two-tenths of 1% 
of a person’s total blood volume—is usually drawn for 
most routine blood tests. The amount taken is not large 
enough to cause physiological problems, but it is rep- 
resentative enough to provide dependable information 
concerning a person’s health status. 
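
The phenotype scorecard of Experimental Insight 21.1 lends itself to a short simulation. The Python sketch below is one possible encoding, offered as an illustration: the dictionary and function names are our own, genotypes are written as strings such as "A1A2B1B2C2C2D1D2E1E2", and the five environmental factor states are given as a string of the letter codes O, G, F, M, and P. The allele values (5 cm and 10 cm) and the height-loss scale (0 to 16 cm) are those stated in the scorecard.

```python
# Contribution of each allele to height potential, in centimeters.
ALLELE_VALUE_CM = {1: 5, 2: 10}

# Height lost for each environmental factor state, in centimeters.
HEIGHT_LOST_CM = {"O": 0, "G": 4, "F": 8, "M": 12, "P": 16}


def height_potential(genotype):
    """Sum the allele contributions in a genotype string like 'A1A2B1B2...'."""
    alleles = [int(ch) for ch in genotype if ch.isdigit()]
    return sum(ALLELE_VALUE_CM[a] for a in alleles)


def height_attained(genotype, factor_states):
    """Subtract the loss for each environmental factor from the potential."""
    loss = sum(HEIGHT_LOST_CM[state] for state in factor_states)
    return height_potential(genotype) - loss


if __name__ == "__main__":
    plants = [
        ("A1A2B1B2C2C2D1D2E1E2", "GFOGM"),
        ("A1A2B1B2C2C2D1D2E1E2", "FMGGF"),
        ("A1A1B1B2C1C2D1D2E1E2", "OGGGG"),
    ]
    for genotype, states in plants:
        print(genotype, height_potential(genotype), "cm potential,",
              height_attained(genotype, states), "cm attained")
```

Trying other genotype strings and state codes is a quick way to see identical genotypes diverge, or different genotypes converge, under different growing conditions.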

After the frequency distribution is constructed, the 
first piece of information obtained from it is the average, 
or mean, value (x) for the distribution. Recall that this is 
calculated by summing all the values in the sample and 
dividing by the total number of individuals in the sample 
(n; see Section 2.5). Using the actual height of each of the 
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(a) Number and frequency of heights in 3-cm intervals 


Height (cm) Number Frequency (%) 
155-157 4 0.4 
158-160 8 0.8 
161-163 26 2.6 
164-166 53 53 
167-169 89 8.9 
170-172 146 14.6 
173-175 188 18.8 
176-178 181 18.1 
179-181 125 12.5 
182-184 92 9.2 
185-187 60 6.0 
188-190 22 22 
191-193 4 0.4 
194-196 1 0.1 
197-199 1 0.1 

1000 100 


(b) Number of females and males of each height 


Female Male 

Height (in) Number Height (in) Number 

60 5 64 2 

61 5 65 5 

62 7 66 2 

63 7 67 6 

64 9 68 7 

65 9 69 J- 

66 12 70 9 

67 6 71 6 

68 3 72 10 

69 2 73 7 

70 1 74 2 

71 1 75 3 

72 1 76 1 

77 3 

Total 68 70 
Average 64.5 inches 70.2 inches 
Standard +/- 2.7 inches +/- 3.2 inches 
deviation 
Variance  +/- 7.29 inches +/- 10.24 inches 


Figure 21.8 Adult height. The frequency distribution of 
height in 1000 college-aged males is shown in tabular form (a). 
Height data for 138 male and female college students (b). 


1000 men in his sample, Castle calculated a mean height 
value of 175.33 cm (about 69.0 inches). In contrast, the
height averages for the 138 University of Connecticut 
students shown in the chapter-opening photo and sum- 
marized in Figure 21.8b are 64.5 inches for the women 
and 70.2 inches for the men. Both of these values are very 
close to the current U.S. population averages. 

The shapes of frequency distributions vary depending 
on several factors, including the sample size and the num- 
ber of classification categories for the trait. It is therefore 
necessary to provide a statistical description of the shape 
of the frequency distribution when comparing trait values. 
For example, it is important to report the mode, or modal 
value, that is, the most common value in a distribution. 
For the height data shown in Figure 21.8, the mode is the 
173-175 cm category, containing 188 individual values. Each 
distribution also possesses a middle value, known as the 
median, or median value. In the height distribution, you 


can think of the median value as entry number 500 (in order 
of increasing height) of the 1000 entries in the distribution. 
This median value also resides in the 173-175 cm category.
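
The modal and median classes can be read off the frequency table mechanically. The following is a minimal sketch, assuming the class labels and counts of Figure 21.8a; the function names are illustrative and not part of the text.

```python
# Height classes (cm) and counts taken from Figure 21.8a.
classes = [
    "155-157", "158-160", "161-163", "164-166", "167-169",
    "170-172", "173-175", "176-178", "179-181", "182-184",
    "185-187", "188-190", "191-193", "194-196", "197-199",
]
counts = [4, 8, 26, 53, 89, 146, 188, 181, 125, 92, 60, 22, 4, 1, 1]


def modal_class(labels, counts):
    """Return (label, count) for the class holding the most individuals."""
    best = max(range(len(counts)), key=lambda i: counts[i])
    return labels[best], counts[best]


def class_of_rank(labels, counts, rank):
    """Return the class containing the entry with the given 1-based rank."""
    running_total = 0
    for label, count in zip(labels, counts):
        running_total += count
        if running_total >= rank:
            return label
    raise ValueError("rank exceeds the number of individuals")


mode_label, mode_count = modal_class(classes, counts)
median_label = class_of_rank(classes, counts, rank=500)
print(mode_label, mode_count, median_label)
```

Accumulating counts until the running total reaches entry 500 mirrors the description of the median as the middle entry in order of increasing height.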

Data in the real world are usually skewed—that 
is, unevenly distributed on either side of the mean, as 
Figure 21.8 and the chapter-opening photo both illus- 
trate. Therefore, to describe the frequency distribution, 
we must also have ways of measuring (and thus describ- 
ing) the nature of the distribution around the mean. Two 
forms of measurement are commonly used. 

The first, called the variance (s²), is a numerical measure
of the spread of the distribution around the mean.
This measure indicates how much variation exists among
individuals in the sample. The variance value depends on 
the relationship between the width of the distribution and 
the number of observations in the sample. It will be small 
if all the observations are close to the mean, and it will 
be large if the observations are widely spread around the 
mean (Figure 21.9). The variance is determined by sum- 
ming the square of the difference between each individual 
value and the sample mean and dividing that sum by the 
number of degrees of freedom (df) in the sample. The 


[Figure 21.9 panel labels: large variance with relatively few organisms in each category; intermediate variance with larger numbers of organisms in each category; small variance with larger numbers of organisms in a small number of categories. Axis labels: number of organisms in each phenotypic category; phenotypic distribution.]


Figure 21.9 Normal distributions. The shape of curves 
depicting normal distributions is changed by the sample size 
and the number of outcome classes. Variance around the 
average is correspondingly large, intermediate, and small. 


number of degrees of freedom is equal to the number of
independent values in the sample; for a sample of n
individuals, df = n − 1. Squaring the differences between
individual values and the sample mean prevents positive
and negative differences from canceling each other out.
This is why the variance is expressed as squared units:


s² = Σ (xᵢ − x̄)² / df


In our example of variation in a quantitative phenotype, 
the variance is described as phenotypic variance (Vp). 
Because we are measuring height in centimeters, the vari- 
ance will be expressed in centimeters squared. 

The second measure that describes the distribution
of data is the standard deviation (s), a value expressing
deviation from the mean in the same units as the scale of
measurement for the sample. The standard deviation (s) is
calculated as s = √(s²). In our sample of the heights of 1000
college-aged males, Vp = s² = 43.30 cm², and the standard
deviation is s = 6.58 cm. In the sample of 138 college students,
the standard deviations and variances for height of
the 68 females and 70 males are as reported in Figure 21.8b.
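
The mean, variance, and standard deviation can be approximated directly from the grouped counts in Figure 21.8a. This is a sketch only: it assumes every individual sits at the midpoint of his 3-cm class, so its results should land near, but not exactly on, the values reported from the individual measurements.

```python
import math

# Class midpoints (cm) for the 3-cm intervals 155-157, 158-160, ..., 197-199.
midpoints = [156 + 3 * i for i in range(15)]
# Number of individuals in each class (Figure 21.8a).
counts = [4, 8, 26, 53, 89, 146, 188, 181, 125, 92, 60, 22, 4, 1, 1]

n = sum(counts)                      # sample size
mean = sum(m * c for m, c in zip(midpoints, counts)) / n

df = n - 1                           # degrees of freedom for a sample variance
sum_of_squares = sum(c * (m - mean) ** 2 for m, c in zip(midpoints, counts))
variance = sum_of_squares / df       # s^2, in cm^2
sd = math.sqrt(variance)             # s, in cm

print(f"n = {n}, mean = {mean:.2f} cm")
print(f"variance = {variance:.2f} cm^2, standard deviation = {sd:.2f} cm")
```

Squaring each deviation before summing is what keeps positive and negative deviations from canceling, and it is also why the variance carries squared units while the standard deviation returns to centimeters.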


Partitioning Phenotypic Variance 


A key part of analyzing quantitative trait variation is to an- 
alyze the factors thought to contribute to phenotypic vari- 
ance, Vp. Quantitative phenotypes are the joint product of 
genes, environment, and gene interactions; consequently, 
phenotypic variance can be partitioned among those in- 
fluences. As a first step, the phenotypic variance can be 
divided into two principal components: genetic variance 
(Vg) and environmental variance (Ve). Under this assumption,
phenotypic variance can be expressed in terms of genetic
variance plus environmental variance: Vp = Vg + Ve.

In this expression, genetic variance (Vg) is the pro- 
portion of phenotypic variance that is due to differences 
among genotypes. In highly inbred populations in which 
all individuals are homozygous for alleles controlling a 
quantitative phenotype, Vg = 0. Such populations are 
found only after strictly controlled laboratory inbreed- 
ing, however; they are rarely found in nature, due to the 
ubiquitous presence of genetic variation in natural popu- 
lations. Genetic variation in natural populations generates 
individuals with different genotypes for quantitative traits 
and leads to phenotypic variability that is directly attrib- 
utable to the genetic variability. 

Environmental variance (Ve) is the portion of phenotypic
variance that is due to variability of the environments
inhabited by individual members of a population.
Differences in sun exposure, in water and nutrient content
of the soil, and in exposure to pests are examples of
environmental variables that influence Ve in plants. Carefully
controlled laboratory experiments can sometimes control
all of the environmental variables and produce a situation
in which Ve approximates zero. In nature, however,
such circumstances rarely occur. Individual members of
natural populations are almost certain to experience
variability in the environmental conditions they encounter.
Some differences may be systematic and predictable. For
example, members of a plant population growing below a 
natural spring will experience wetter growth conditions 
than plants living above the spring. Other environmental 
variables are sporadic or unpredictable. For example, a 
dry year might reduce the flow of water from a natural 
spring and affect the plants living below the spring more 
severely than those living above it. 

Let’s use an example to illustrate the dissection of Vg
and Ve as components of Vp. Suppose that two different
pure-breeding parental lines are established. Each line
is genetically uniform, with Vg = 0; therefore, Vp = Ve
(Figure 21.10a). The pure-breeding lines are crossed to
produce F1 progeny that are genetically uniform. In the
F1, Vg = 0 because there is no genetic variation among the
individuals, and Vp = Ve (Figure 21.10b). Production of F2
leads to genotypic variation and thus to the production
of phenotypic variation that results from a combination
of genetic variance and environmental variance (Figure
21.10c). Among the F2, Vp = Vg + Ve. Since Ve has been
determined among the parents and in the F1, genetic variance
can be calculated by subtracting environmental variance
from the phenotypic variance among the F2. In other
words, Vg = Vp − Ve. Genetic Analysis 21.2 provides practice
in determining environmental and genetic variance.


Figure 21.10 Sources of phenotypic variance. (a) Both parental
lines are genetically uniform (Vg = 0), so Vp = Ve. (b) The F1 are
genetically uniform, so Vp = Ve. (c) The F2 phenotypic variance
results from genetic and environmental variance: Vp = Vg + Ve,
or Vg = Vp − Ve.


GENETIC ANALYSIS 21.2


PROBLEM Two pure-breeding lines of tomatoes, P1 and P2, producing fruit with
different average weights, are crossed. The means and variances of their F1 and F2
progeny are shown in the table below.

Line    Average Fruit Weight (g)    Vp
P1               6.5                1.6 g²
P2              14.2                3.5 g²
F1              10.2                2.2 g²
F2               9.8                4.0 g²

a. What is the environmental variance (Ve) for this trait?
b. What is the genetic variance (Vg) determined from the F2?


BREAK IT DOWN: Phenotypic variance equals genetic variance plus environmental
variance. The three values can be manipulated to isolate and quantify one value
at a time (p. 725).


Solution Strategies and Solution Steps


Evaluate

1. Strategy: Identify the topic this problem addresses and the nature of the
   requested answer.
   Step: This problem concerns the determination of environmental variance and
   genetic variance for the tomato plant data given.

2. Strategy: Identify the critical information given in the problem.
   Step: Fruit weight and phenotypic variance are given for the two pure-breeding
   parental lines and for the F1 and F2 progeny.

Deduce

3. Strategy: Describe the relationship between Vp, Vg, and Ve.
   Step: Vp = Vg + Ve

4. Strategy: Identify the variance values that contribute to Vp in each line and
   generation.
   Step: Each of the pure-breeding parental lines (P1 and P2) and the F1 progeny are
   genetically uniform. As a consequence, all phenotypic variance is due to
   environmental variance, and genetic variance makes no contribution. The F2
   contains genotypic variety, so both Vg and Ve contribute to Vp.

   TIP: For organisms that are genetically identical, Vp = Ve.

Solve

5. Strategy: Determine Ve for this trait. (Answer a)
   Step: In the genetically uniform P1, P2, and F1, Vg = 0, and in each line Vp = Ve.
   The average environmental variance among these three lines is calculated
   as (1.6 + 3.5 + 2.2)/3 = 2.43 g².

6. Strategy: Determine Vg for this trait. (Answer b)
   Step: Vg is calculated by rearranging the expression in step 3 to Vg = Vp − Ve. The
   genetic variance for these data is Vg = 4.0 − 2.43 = 1.57 g².


For more practice, see Problems 4, 10, and 12.
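
The arithmetic of this problem can be checked with a short script. It is a minimal sketch that uses the phenotypic variances from the problem table; the dictionary layout and variable names are illustrative choices, not part of the problem.

```python
# Phenotypic variance (g^2) for each line, from the problem table.
vp = {"P1": 1.6, "P2": 3.5, "F1": 2.2, "F2": 4.0}

# P1, P2, and F1 are genetically uniform, so Vg = 0 and Vp = Ve in each.
uniform_lines = ["P1", "P2", "F1"]
ve = sum(vp[line] for line in uniform_lines) / len(uniform_lines)

# The F2 is genetically variable, so Vp = Vg + Ve and Vg = Vp - Ve.
vg = vp["F2"] - ve

print(f"Ve = {ve:.2f} g^2")
print(f"Vg = {vg:.2f} g^2")
```

Averaging the three genetically uniform lines treats each as an equally good estimate of the environmental variance, which is the approach taken in the solution steps.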


Partitioning Genetic Variance 


Each allelic difference affecting a quantitative trait con- 
tributes to genetic variance in a population, but not 
necessarily each in the same way. Indeed, it can be dif- 
ficult to measure the specific effect of each allelic variant. 
Nevertheless, genetic variance can theoretically be parti- 
tioned into three different kinds of allelic effects. Additive 
variance (Va) derives from the additive effects of all 
alleles contributing to a trait. Additive variance is the re- 
sult of incomplete dominance of alleles at a locus, which 
causes heterozygotes to have a phenotype intermediate 
between the homozygous phenotypes. Dominance variance
(Vd) is variance resulting from dominance relationships
in which alleles of a heterozygote produce a phenotype
that is not exactly intermediate between those of
homozygotes (i.e., the nonadditive effects of alleles of
contributing genes). Lastly, interactive variance (Vi) derives
from epistatic interactions between the alleles of different
genes that influence a quantitative phenotype. Collectively
these three components unite to produce the genetic
variance in a model summarized by Vg = Va + Vd + Vi.
We use these values in the following section to discuss 
heritability. 


21.3 Heritability Measures the Genetic 
Component of Phenotypic Variation 


One goal of quantitative genetics is to estimate the ex- 
tent to which genetic variation influences the phenotypic 
variation seen in a trait. This is a challenging task when a 
trait is determined by a combination of genetic variation, 
environmental variation, and gene-environment interaction.
The concept of trait heritability was developed to
help measure the proportion of phenotypic variation that
is due to genetic variation. 

Heritability differs from trait to trait. The pheno- 
typic variation observed in a trait with high heritability 
is largely the result of genetic variation and thus can be 
strongly influenced by selection programs focused on 
changing the frequency of a phenotype in a population. 
Conversely, only a small proportion of the phenotypic 
variation of a trait with low heritability can be attributed 
to genetic variation, so the expression of the trait in a pop- 
ulation is not effectively changed by selection processes. 
Heritability is an important measure of the potential 
responsiveness of a trait to natural selection or artificial 
selection. It is of special interest to evolutionary biologists 
and plant and animal breeders, who use it to assess the 
potential impact of selection on traits of agricultural or 
economic importance. 

Two widely used measures of heritability assess differ- 
ent components of the contribution of genetic variation to 
phenotypic variation. Broad sense heritability (H²) estimates
the proportion of phenotypic variation that is due to
total genetic variation. This form of heritability is defined
by the equality H² = Vg/Vp. Narrow sense heritability
(h²) estimates the proportion of phenotypic variation that
is due to additive genetic variation. Narrow sense heritability
is defined by the equality h² = Va/Vp. Both measures
of heritability are expressed as proportions that range in 
magnitude from 0.0 to 1.0. In all cases, greater heritability 
values indicate a larger role for genetic variation in pheno- 
typic variation. 

Heritability is easily misunderstood. An erroneous 
understanding can lead to the mistaken idea that genetic 
variation makes a much larger contribution to phenotypic 
variation than the data actually support. Heritability is 
difficult to apply to humans except under limited circum- 
stances (described later in the discussion of twin studies), 
but it can be used for other organisms. The following 
attributes of heritability are central to its meaning: 


1. Heritability is a measure of the degree to which genetic 
differences contribute to phenotypic variation of a trait. 
In other words, heritability is high when much of the 
phenotypic variation is produced by genetic variation 
and little is contributed by environmental variation. 
Heritability is not an indication of the mechanism by 
which genes control a trait, nor is it a measure of how 
much of a trait is produced by gene action. 


2. Heritability values are accurate only for the environ- 
ment and population in which they are measured. 
Heritability values measured in one population can- 
not be transferred to another population, because 
both genetic and environmental factors may differ 
between populations. 

3. Heritability for a given trait in a population can 
change if environmental factors change, and changes 
in the proportions of genotypes in a population can 


alter the effect of environmental factors on pheno- 
typic variation, thus changing heritability. 


4. High heritability does not mean that a trait is not
influenced by environmental factors. Traits with high 
heritability can be very responsive to environmental 
changes. 


Broad Sense Heritability 


We have seen that genetic variance (Vg) is a composite 
value that derives its magnitude from additive, dominance, 
and interaction variance. Unfortunately, genetic variance 
is not always easy to partition into these separate components.
Fortunately, broad sense heritability (H² = Vg/Vp)
can be used as a general measure of the magnitude of 
genetic influence over phenotypic variation of a trait, when 
Vg cannot be partitioned. 

In a 1988 study of the genetics and evolution of 
cave fish (Astyanax fasciatus), Horst Wilkens used broad 
sense heritability analysis to describe the genetic con- 
tribution to the evolution of the organism’s eye tissue. 
Some populations of this species live in completely dark 
underground cave streams in Eastern Mexico and have a 
dramatically reduced amount of eye tissue in comparison 
to closely related fish living aboveground. In these popu- 
lations, the eye tissue appears to be undergoing rapid evo- 
lutionary change. The eyes in sighted fish of this species 
are approximately 7 cm in diameter. In comparison, blind 
cave fish have less than 2 cm of eye tissue diameter. 

Wilkens crossed sighted cave fish to blind cave fish,
measured eye tissue mean and variance in the F1, and
then produced F2 fish and measured their eye tissue as
well. Since the F1 fish are nearly genetically uniform, the
variance in the amount of eye tissue is attributed entirely
to the environment. In these F1, Ve was 0.057 cm². Among the F2,
phenotypic variance (Vp) was 0.563 cm² and was the result
of both genetic and environmental variance (Vg + Ve).
Broad sense heritability is derived by determining Vg and
dividing it by the phenotypic variance. In this case,

Vg = Vp − Ve = 0.563 − 0.057 = 0.506 cm²
H² = Vg/Vp = 0.506/0.563 = 0.899


This broad sense heritability of approximately 0.90 means 
that approximately 90% of the phenotypic variation in eye 
size between these populations of cave fish is due to genetic 
variation. 
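
The same calculation can be written as a small reusable function. This is a sketch under the assumptions used in the text: the F1 variance is taken as a pure estimate of Ve, and the F2 variance as Vp. The function name is hypothetical.

```python
def broad_sense_heritability(vp_f2, ve_f1):
    """Return (Vg, H^2) given F2 phenotypic variance and F1 (environmental) variance."""
    if vp_f2 <= 0:
        raise ValueError("phenotypic variance must be positive")
    vg = vp_f2 - ve_f1          # Vg = Vp - Ve
    return vg, vg / vp_f2       # H^2 = Vg / Vp


# Values reported for the cave fish cross.
vg, h2 = broad_sense_heritability(vp_f2=0.563, ve_f1=0.057)
print(f"Vg = {vg:.3f}, H^2 = {h2:.3f}")
```

Because H² is a ratio of two variances measured in the same units, the units cancel and the result is a pure proportion.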


Twin Studies 


Heritability can be quantified when both mating and 
environmental factors can be controlled. However, when 
mating and environmental variation are not among the 
controlled experimental parameters, heritability is far 
more difficult—some would say impossible—to measure 
accurately. This limitation applies to attempts to measure 
the heritability of traits in humans. Fortunately, studies
of phenotypic variation in human twins can offer insights
into broad sense heritability of human traits. 

Identical twins, also known as monozygotic twins 
(MZ twins), are produced by a single fertilization event 
that is followed by a splitting of the early embryo into
two embryos. Monozygotic twins share all of their alleles.
Theoretically, broad sense heritability can be determined 
by assuming that phenotypic variance between them is 
fully attributable to environmental variance. Under this 
assumption, in MZ twin pairs, Vp = Ve. 

Fraternal twins, on the other hand, are dizygotic 
(DZ twins), produced by two independent fertilization 
events that take place at the same time. Dizygotic twins 
are siblings that are born at the same time, but they are 
no more closely related than siblings born at different 
times. Like all full siblings, DZ twins have an average of 
50% of their alleles in common. To control for differences 
between the sexes, only DZ twins of the same sex are used 
in twin studies. Phenotypic variance between DZ twins is 
the sum of environmental variance plus one-half of the 
genetic variance (the 50% of alleles not shared by the aver- 
age DZ twin pair): In DZ twin pairs, Vp = Ve + 1/2Vg. On 
the basis of these general formulas for calculation of H²,
broad sense heritability can be estimated for human traits 
by methods we do not discuss here (Table 21.2). 
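
Read literally, the two twin expressions form a pair of equations in two unknowns: Vp(MZ) = Ve and Vp(DZ) = Ve + 1/2Vg. The sketch below only illustrates that algebra. It is not the procedure behind the values in Table 21.2, and the input variances are invented for the example.

```python
def partition_from_twins(vp_mz, vp_dz):
    """Solve Vp(MZ) = Ve and Vp(DZ) = Ve + Vg/2 for Ve, Vg, and H^2."""
    if vp_dz < vp_mz:
        raise ValueError("model assumes DZ variance is at least MZ variance")
    ve = vp_mz                     # MZ pairs differ only through environment
    vg = 2 * (vp_dz - vp_mz)       # DZ pairs add half of the genetic variance
    h2 = vg / (vg + ve)            # H^2 = Vg / Vp, with Vp = Vg + Ve
    return ve, vg, h2


# Hypothetical within-pair variances, chosen only to show the arithmetic.
ve, vg, h2 = partition_from_twins(vp_mz=2.0, vp_dz=5.0)
print(ve, vg, h2)
```

Subtracting the MZ expression from the DZ expression isolates one-half of Vg, which is why the difference is doubled.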

Studies of traits in human twins usually compare MZ 
twins to same-sex DZ twins to make heritability estimates 


Table 21.2 Some Broad Sense Heritability (H²) Values from Human Twin Studies

Trait                            Heritability (H²), %

Biological Traits
Total fingerprint ridge count          90
Height                                 85
Maximum heart rate                     85
Club foot                              80
Amino acid excretion                   70
Weight                                 60
Total serum cholesterol                60
Blood pressure                         60
Body mass index (BMI)                  50
Longevity                              29

Behavioral Traits
Verbal ability                         65
Sociability index                      65
Temperament index                      60
Spelling aptitude                      50
Memory                                 50
Mathematical aptitude                  30


more accurate. Even so, heritability studies of human 
twins are prone to several sources of error that lead to 
inaccurately high values. Following are the most common 
sources of error: 


1. Stronger shared maternal effects in identical twins 
than in fraternal twins. These effects include the 
sharing of embryonic membranes and other aspects 
of the uterine environment that lead to more similar 
developmental conditions for identical twins than for 
fraternal twins. 


2. Greater similarity of treatment of identical twins than 
of fraternal twins. Parents, other adults, and peers 
have a tendency to treat identical twins more equally 
than they treat fraternal twins of the same sex. This 
gives identical twins a similar social and behavioral 
environmental experience, while fraternal twins 
more often are treated differently. 


3. Greater similarity of interactions between genes and 
environmental factors in identical twins than in fra- 
ternal twins. Identical twins have the same genotype 
and are affected in similar, if not identical, ways by 
environmental factors. On the other hand, fraternal 
twins have genetic differences that can be influenced 
differently by environmental factors. This may result 
in greater variance between fraternal twins than 
between identical twins. 


Because of the difficulties and the potential sources of 
error in making heritability estimates based on twin stud- 
ies, the values in Table 21.2 are more likely to be too high 
than too low. 

The study of identical twins reared together versus 
those reared apart is an alternative approach to esti- 
mating the influence of genes on phenotypic variation. 
Such studies measure the concordance, the percentage 
of twin pairs in which both members of the pair have the 
same phenotype for a trait, versus the discordance, the 
percentage in which the twins of a pair have dissimilar 
phenotypes for a trait. Concordance and discordance 
frequencies give a general picture of the overall influence 
of genes on phenotypes. If phenotypic variation for a trait 
is 100% genetic, MZ twins should always be concordant 
for their phenotypes, whether reared together or apart. 
In this case, concordance would be 100%. Dizygotic twins 
share an average of 50% of their alleles in common and
would have concordance of about 50% for a trait whose 
variation is completely genetic. When phenotypic varia- 
tion of a trait is due entirely to nongenetic factors, on the 
other hand, concordance among MZ and DZ twins will 
be approximately equal. For traits with phenotypic varia- 
tion that is determined to a significant extent by genetic 
variation, concordance among MZ twin pairs will be sub- 
stantially greater than for DZ twins. A number of human 
diseases, malformations, and other phenotypic variants 
fall into the latter category. Table 21.3 shows MZ and DZ 
twin concordance values for common malformations and

Table 21.3 Concordance Values for Common Threshold Conditions in Humans

Trait                           Percent Concordance
                                MZ Twins    DZ Twins
Alzheimer disease                  60          25
Autism                             70          10
Cleft lip                          40           4
Club foot                          30          [illegible]
Congenital hip dislocation         25           3
Depression                         70          25
Insulin-dependent diabetes         50          10
Pyloric stenosis                   25           3
Reading disability                 70          45
Schizophrenia                      60          20


Table 21.4 Some Narrow Sense Heritability (h²) Values for Animals and Plants

Organism    Trait                     Heritability (h²)
Cattle      Body weight                    0.65
            Milk production                0.40
Corn        Plant height                   0.70
            Ear length                     0.55
            Ear diameter                   0.14
Horse       Racing speed                   0.60
            Trotting speed                 0.40
Pig         Back-fat thickness             0.70
            Weight gain                    0.40
            Litter size                    0.05
Poultry     Body weight (8 weeks)          0.50
            Egg production                 0.20


other abnormalities that are determined to a large extent 
by genetic variation but are also a product of environmen- 
tal triggers that play as yet undetermined roles. 
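
Concordance and discordance are simple percentages over twin pairs. The sketch below uses invented pairs, assumed to have been included because at least one twin shows the trait; each pair is recorded as two True/False values.

```python
def concordance_percent(pairs):
    """Percentage of twin pairs in which both members share the same phenotype."""
    if not pairs:
        raise ValueError("at least one twin pair is required")
    concordant = sum(1 for first, second in pairs if first == second)
    return 100 * concordant / len(pairs)


# Hypothetical data: (twin 1 affected?, twin 2 affected?) for each pair.
mz_pairs = [(True, True)] * 7 + [(True, False)] * 3
dz_pairs = [(True, True)] * 1 + [(True, False)] * 9

mz_concordance = concordance_percent(mz_pairs)
dz_concordance = concordance_percent(dz_pairs)
mz_discordance = 100 - mz_concordance

print(mz_concordance, dz_concordance, mz_discordance)
```

A much higher value for MZ pairs than for DZ pairs, as in this made-up example, is the pattern the text associates with a substantial genetic contribution.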


Narrow Sense Heritability and Artificial 
Selection 


Narrow sense heritability (h² = Va/Vp) estimates the proportion
of phenotypic variation that is due to additive genetic
variance (Va), variance resulting from the alleles of
additive genes. These estimates are particularly useful in 
agriculture, where they predict the potential responsive- 
ness of a trait in an animal or plant to artificial selection 
imposed through selective breeding programs or con- 
trolled growth conditions. High narrow sense heritability 
values are correlated with a greater degree of response to 
selection than low values, because additive genetic vari- 
ance is responsive to selection. 

Table 21.4 gives examples of h² values, covering a
broad spectrum of magnitude, for several characteristics
of plants and animals. Since higher h² values have the
strongest correlation with selection response, biologists
predict that traits such as body weight in cattle, back-fat
thickness in pigs, and corn plant height will be most
amenable to change through artificial selection schemes.
On the other hand, litter size in pigs, egg production in
poultry, and ear diameter in corn have low h² values and
will be less responsive to selection.

Estimating the potential response to selection for 
a trait begins with calculation of a value known as the 
selection differential (S), which measures the difference 
between the population mean value for a trait and the 
mean trait value for the mating portion of a population. 


Suppose, for example, that a goal of an artificial selec- 
tion experiment is to increase plant height. Choosing 
taller-than-average plants to mate will be an effective 
way to increase the height of progeny if h² is high. If
the population average height is 37.5 cm and the average
height of plants selected for mating is 42 cm, then
S = 42 cm − 37.5 cm = 4.5 cm.

The potential response to selection (R) depends on
the extent to which the difference between the mating trait
mean value and the population mean value can be passed
on to progeny. This response is estimated using the formula
R = S(h²). For this plant height example, let’s assume
we examine corn plant height, h² = 0.70 (see Table 21.4).
In this case, R = (4.5 cm)(0.70) = 3.15 cm. Under stable
growth conditions, the progeny plants could be expected
to have a height equal to the population average plus the
value of R, or 37.5 cm + 3.15 cm = 40.65 cm. Narrow sense
heritability can be measured by rearranging the terms in
the response-to-selection equation to h² = R/S. For the
plant-height example, h² = 3.15 cm/4.5 cm = 0.70.
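
The selection differential and response to selection can be computed in a few lines. This is a minimal sketch using the plant-height numbers above; the function names are illustrative.

```python
def selection_differential(population_mean, selected_mean):
    """S: mean of the mating individuals minus the population mean."""
    return selected_mean - population_mean


def response_to_selection(s, h2):
    """R = S * h^2, with h^2 expressed as a proportion from 0 to 1."""
    if not 0.0 <= h2 <= 1.0:
        raise ValueError("narrow sense heritability must lie between 0 and 1")
    return s * h2


population_mean = 37.5   # cm
selected_mean = 42.0     # cm
h2 = 0.70                # corn plant height, Table 21.4

s = selection_differential(population_mean, selected_mean)
r = response_to_selection(s, h2)
offspring_mean = population_mean + r

print(f"S = {s:.2f} cm, R = {r:.2f} cm, expected offspring mean = {offspring_mean:.2f} cm")
```

Calling `response_to_selection` with the same S and different h² values shows the dependence directly: the response equals S when h² = 1.0 and is zero when h² = 0.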

Estimates of heritability have important practi- 
cal applications for plant and animal breeders, and for 
evolutionary biologists. Whether traits are subjected to 
artificial selection by breeders or to natural selection, 
the extent to which the mean value of a trait changes in 
a population depends on its heritability. Breeders and 
evolutionary biologists predict substantial change in trait 
mean values (i.e., large values for R) when heritability is 
high, but little or no change in trait mean values when 
heritability is low. In other words, traits evolve when a 
substantial proportion of the phenotypic variation is due 
to genetic variation. 

Figure 21.11a shows three examples in which the 
selection differentials are the same but the response to

Figure 21.11 Response to artificial and natural selection.
(a) Response to artificial selection after one generation depends
on h². M is mean phenotype in parental generation; Ms is the
mean phenotype after selection; M′ is the mean phenotype
of offspring after selection; selection differential is S = Ms − M.
(b) Expected changes in phenotypic means and variances after
several generations of natural selection.


selection differs as a result of different degrees of herita- 
bility. This comparison illustrates that selection response 
is expected to be maximal when heritability is h² = 1.0.
Selection response is substantially less when heritability is
h² = 0.2, and there is no selection response when heritability
is h² = 0. Selection also affects quantitative traits in
natural populations. Figure 21.11b shows natural selection 
operating over many generations in three different modes 
that have different effects on phenotypic means and vari- 
ances. In the mode known as directional selection, the 
mean phenotypic value is shifted in one direction because 


one extreme of the phenotype distribution is favored. 
This narrows the phenotypic range and reduces pheno- 
typic variance. In contrast, natural selection favoring an 
intermediate phenotype over extreme phenotypes results 
in stabilizing selection that reduces the phenotypic vari- 
ance without shifting the mean value. Disruptive selec- 
tion occurs when both extreme phenotypes are favored 
over intermediate phenotypes. The result is an increase 
in the phenotypic variance and, potentially, a phenotypic 
split within the population. 
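
The effect of each mode on the mean and variance can be illustrated with a toy calculation. The sketch below is deliberately simplified: the phenotype values and survival cutoffs are invented, only the survivors of a single round of selection are summarized, and inheritance is ignored.

```python
from statistics import mean, pvariance

# Hypothetical starting population: a symmetric distribution of phenotype values.
counts_by_value = {1: 1, 2: 2, 3: 3, 4: 4, 5: 5, 6: 4, 7: 3, 8: 2, 9: 1}
population = [value for value, count in counts_by_value.items() for _ in range(count)]

# Survivors under each selection mode (cutoffs chosen only for illustration).
survivors = {
    "directional": [x for x in population if x >= 6],            # one extreme favored
    "stabilizing": [x for x in population if 4 <= x <= 6],       # intermediates favored
    "disruptive": [x for x in population if x <= 3 or x >= 7],   # both extremes favored
}

baseline = (mean(population), pvariance(population))
summary = {mode: (mean(values), pvariance(values)) for mode, values in survivors.items()}

print("before selection:", baseline)
for mode, (m, v) in summary.items():
    print(f"{mode}: mean = {m}, variance = {v:.2f}")
```

In this toy population, directional selection shifts the mean and lowers the variance, stabilizing selection lowers the variance without moving the mean, and disruptive selection raises the variance without moving the mean.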


21.4 Quantitative Trait Loci Are 
the Genes That Contribute to 
Quantitative Traits 


The genes that contribute to the variation in a quantita- 
tive trait are collectively called quantitative trait loci 
(QTLs). Individually, a gene that contributes to a quan- 
titative trait is referred to as a quantitative trait locus. 
QTLs were initially of interest in agricultural plants such 
as tomatoes and corn, where they influence important at- 
tributes including fruit sweetness, acidity, and color. QTL 
analysis has expanded greatly in recent decades through 
analysis of many distinct traits in plants and animals, in- 
cluding humans. 

In one way, QTLs are no different from other genes 
we discuss. For example, they often produce polypeptides 
that operate in metabolic pathways producing compounds 
that give flavor or color to fruit. Identifying QTLs by ex- 
perimental analysis is different from identifying other 
genes that control phenotypic variation, however, because 
many genes are influencing the trait, and the presence or 
absence of a particular allele does not correlate well with 
distinct phenotypes. Specialized statistical methods have 
been developed to detect and map QTLs. This process is 
called QTL mapping, and it involves the identification of 
chromosome regions that are likely to contain QTLs. 

The general process of QTL mapping is similar to 
the methods used to determine genetic linkage between 
genes. A chromosome region likely to contain a QTL 
is identified by the frequent co-occurrence of a specific 
genetic marker such as a single nucleotide polymorphism 
(SNP) in organisms with a particular phenotype. The 
inherited DNA sequence variation of an SNP is usually 
not the molecular basis of the QTL. Instead, the SNP is 
usually genetically linked to the QTL. The connection 
between the genetic marker and the phenotype implies 
that a QTL exists near the genome location encoding the 
genetic marker. 


QTL Mapping Strategies 


Contemporary QTL mapping uses DNA markers that 
have known chromosome locations to assist with the 
mapping and identification of genes. SNPs are particularly
useful in these analyses, as are other DNA marker variants
such as restriction fragment length polymorphisms
(RFLPs) and variable number tandem repeats (VNTRs), in 
which different numbers of repeats of specific nucleotide 
base pairs occur in different chromosomes. 

Multiple approaches can be taken in QTL mapping 
experiments. At its core, however, QTL mapping is a sta- 
tistical process that seeks to identify regions of genomes 
containing genetic markers that are linked to QTLs. The 
statistical analysis for QTLs is closely related to the sta- 
tistical analysis of genetic linkage using logarithm of the 
odds (lod) score analysis (see Section 5.5). QTL analysis 
can lead to identification of the potential chromosome 
location of a QTL influencing phenotypic variation of a 
quantitative trait, but by itself it does not identify the mo- 
lecular basis of action of the QTL. Other genetic methods 
are available for molecular description of QTL action. 

QTL mapping uses the parents and progeny pro- 
duced by controlled crosses as the sources of DNA for 
genetic marker identification and as the source of data 
for the quantitative trait of interest. If, for example, a re- 
searcher wants to identify QTLs that influence large fruit 
size in tomatoes, he or she will cross two parental lines of 
tomatoes that differ in fruit size. The F1 progeny of this
cross could then be used to produce F2 progeny or, as we
illustrate here, the F1 could be used in a backcross to one
of the parental lines. Genetic markers will be determined 
in the original parental lines and in the backcross prog- 
eny. Tomato sizes produced by backcross progeny will be 
weighed and the results compared to genetic markers in 
the individual plants. 

Figure 21.12a illustrates the structure of a back- 
cross experiment designed to collect genetic marker 
and tomato-weight data for QTL mapping analysis. One 
parental tomato strain producing large fruit that aver- 
ages 100 grams (g) contains genetic markers that are 
identified by the letter L. There are actually many markers
linked to QTLs in the line, and for each marker
locus tested, the large-tomato strain has two copies
of the large-strain marker allele, a genotype designated
LL. Similarly, a small-tomato-producing strain, with an
average tomato weight of 10 grams, is characterized for
the same genetic markers, and each of the loci tested in
the small-strain genotype is designated SS. The F1 progeny
of the large × small cross are heterozygous for each
marker locus and are designated LS. These plants in this
example are shown to produce tomatoes that weigh 60 g.
The backcross is made to the large-tomato strain, and
the marker genotype will be either LL, if the F1 transmits
the large-strain allele, or LS, if the F1 transmits the
small-strain allele. The backcross progeny in this example
produce tomatoes that vary in weight from 80 g to 88 g.
Tomato weight from the backcross plants is greater than
from the F1 plants because the backcross plants are the result
of a cross between the F1 and the large-tomato strain.

Table 21.5 displays tomato-weight data for 10 backcross
plants (1-10) and genetic marker data for two genes,
marker A (M_A) and marker B (M_B), that are not linked
to one another and are located in different parts of the
genome. In an actual QTL backcross experiment, several
hundred backcross plants might be examined, and each
plant might be genotyped for dozens of genetic markers
that ideally would be spaced about every 5 to 10 centimorgans
(cM) in the genome. This number of genetic markers
and their close proximity maximize the chance of identifying
the location of QTLs detected by the analysis.

Figure 21.12 Quantitative trait locus (QTL) detection and
mapping. (a) Parental tomato plants producing large (LL) or
small (SS) fruit are crossed to produce F1 (LS). The F1 are then
backcrossed to the large-fruit line to yield backcross progeny
that are either LL or LS. (b) The significance of linkage between
potential QTLs and genetic markers is tested among backcross
progeny by lod score analysis. A lod score profile assessing fruit-
weight QTLs reveals significant scores exceeding the threshold
value on tomato chromosome 2.

In Table 21.5, the average weight of tomatoes from 
backcross plants is 84 grams. Average tomato weight is 
compared for LL plants versus LS plants for each marker. 
There is almost no difference in average weight for M_A
(LL = 83.8 g versus LS = 84.2 g), but for M_B, LL plants
produce tomatoes that are 4 grams heavier on average
than are the tomatoes from LS plants (LL = 86.0 g versus
LS = 82.0 g). These data may indicate that a QTL influencing
tomato weight is located near M_B. Conversely, there
is no evidence to indicate that a QTL is located near M_A.

Table 21.5 QTL Analysis of Tomato Weight in Backcross Progeny

Backcross Plant        Average Fruit Weight (g)   Marker M_A   Marker M_B
1                      86                         LS           LL
2                      82                         LL           LS
3                      85                         LL           LL
4                      88                         LL           LL
5                      81                         LS           LS
6                      83                         LS           LS
7                      84                         LL           LL
8                      80                         LL           LS
9                      84                         LS           LS
10                     87                         LS           LL

Total average weight   84
LL average weight                                 83.8         86.0
LS average weight                                 84.2         82.0
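
The comparison of LL and LS class means in Table 21.5 can be reproduced in a few lines. The following is a minimal Python sketch, not part of the original analysis: the function name `class_means` and the data layout are illustrative assumptions, and the per-plant marker genotypes are taken to be the assignment consistent with the class averages reported in the table.

```python
from statistics import mean

# Fruit weights (g) for backcross plants 1-10 (Table 21.5).
weights = [86, 82, 85, 88, 81, 83, 84, 80, 84, 87]

# Marker genotypes per plant. Assumption: this is the assignment that is
# consistent with the LL and LS class averages reported in Table 21.5.
genotypes = {
    "M_A": ["LS", "LL", "LL", "LL", "LS", "LS", "LL", "LL", "LS", "LS"],
    "M_B": ["LL", "LS", "LL", "LL", "LS", "LS", "LL", "LS", "LS", "LL"],
}


def class_means(marker_calls, trait_values):
    """Return the mean trait value for each marker genotype class."""
    groups = {}
    for call, value in zip(marker_calls, trait_values):
        groups.setdefault(call, []).append(value)
    return {call: mean(values) for call, values in groups.items()}


for marker, calls in genotypes.items():
    means = class_means(calls, weights)
    difference = means["LL"] - means["LS"]
    print(f"{marker}: LL = {means['LL']:.1f} g, LS = {means['LS']:.1f} g, "
          f"LL - LS = {difference:+.1f} g")
```

A large difference between class means, as for M_B, is what suggests a nearby QTL; whether the difference is statistically meaningful is a separate question addressed by the lod score.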


To determine the statistical significance of the kind 
of information provided for genetic markers and tomato 
weight, a lod score is calculated. The lod score is an odds 
ratio of the probability of the data if a QTL is linked to 
the marker divided by the probability of the data if there 
is no QTL linked to the marker. The odds ratios for the 
backcross plants are added together, and the log (the log 
of the odds) is taken to yield the lod score. Like the analy- 
sis of lod scores for genetic linkage, there is a threshold 
value for significance of the score (see Section 5.5). If the 
lod score for a genetic marker is greater than the thresh- 
old value, the lod score indicates a statistically significant 
probability that a QTL is linked to the marker. 
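
The text does not spell out the likelihood model behind the lod score, so the following Python sketch makes one explicit assumption: fruit weights within each marker genotype class are normally distributed with a common variance (single-marker regression). Under that assumption the lod score reduces to (n/2) log10(RSS_null / RSS_marker), where RSS is a residual sum of squares. The function names are hypothetical, and the genotype assignments are taken to be those consistent with the class averages of Table 21.5.

```python
import math

# Fruit weights (g) for backcross plants 1-10 (Table 21.5).
weights = [86, 82, 85, 88, 81, 83, 84, 80, 84, 87]

# Assumed marker genotypes, consistent with the class averages in Table 21.5.
m_a = ["LS", "LL", "LL", "LL", "LS", "LS", "LL", "LL", "LS", "LS"]
m_b = ["LL", "LS", "LL", "LL", "LS", "LS", "LL", "LS", "LS", "LL"]


def residual_ss(values):
    """Sum of squared deviations of the values from their own mean."""
    avg = sum(values) / len(values)
    return sum((v - avg) ** 2 for v in values)


def marker_lod(calls, values):
    """Lod score for a single marker under a normal, equal-variance model.

    Null model: one mean for all plants (no QTL linked to the marker).
    Alternative model: a separate mean for each marker genotype class.
    """
    n = len(values)
    rss_null = residual_ss(values)
    rss_marker = sum(
        residual_ss([v for c, v in zip(calls, values) if c == genotype])
        for genotype in set(calls)
    )
    return (n / 2) * math.log10(rss_null / rss_marker)


print(f"lod for M_A: {marker_lod(m_a, weights):.2f}")
print(f"lod for M_B: {marker_lod(m_b, weights):.2f}")
```

With these ten plants the sketch yields a lod near zero for M_A and a clearly larger value for M_B, mirroring the comparison of class means. Ten plants are far too few for a real conclusion, and actual QTL studies rely on hundreds of progeny and dedicated mapping software.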

In Figure 21.12b, a lod score profile for several ge- 
netic markers located on chromosome 2 of tomato reveals 
significant evidence indicating genetic linkage to a QTL. 
Beginning at the marker designated TG353 and spanning 
to the right through marker TG140, the lod score values 
are greater than the threshold value and give statistically 
significant evidence favoring linkage between these genetic 
markers and a QTL. On the other hand, the lod scores 
falling below the threshold value in the figure give no 
statistical evidence of linkage to a QTL. For chromosome 
2 in tomato, lod scores for genetic markers to the left of 
TG353 are less than the threshold lod score value. By using 
a large number of regularly spaced genetic markers dis- 
tributed every few centiMorgans along each chromosome, 
QTL mapping analysis can potentially detect the location 
of any QTL influencing a quantitative trait phenotype. 
Commonly, multiple QTLs in a genome are identified. 


Andrew Paterson and his colleagues published a 1988 
study mapping 15 QTLs in the tomato genome that influ- 
ence fruit weight, fruit acidity, and the amount of soluble 
solids in the fruit. Each trait has agricultural importance, 
and together they determine the quality and yield of tomato 
paste from the fruit. Paterson’s study used 70 DNA markers 
spaced an average of 20 cM apart throughout the tomato 
genome. Collectively, these markers span about 95% of the 
12 chromosomes that constitute the tomato genome. 

The parental plants were two closely related and 
interfertile species: a domestic tomato (Lycopersicon escu- 
lentum) and a wild South American green-fruited tomato 
(Lycopersicon chmielewskii). The F1 hybrids were backcrossed
to L. esculentum, producing 237 backcross progeny
plants for analysis. All backcross plants were grown
under identical conditions to minimize the influence of 
environmental factors on the traits of interest. Individual 
fruits from backcross plants were assayed for fruit weight 
(grams), soluble solids content (percentage), and acid- 
ity (pH). Lod score analysis was used to test whether 
genes influencing any of the three traits exhibited genetic 
linkage to genome markers. Significant lod score values 
traced six genes influencing fruit weight, five influenc- 
ing acidity, and four influencing soluble solids content to 
regions of nine chromosomes in the tomato genome. The 
regions of tomato chromosomes 6 and 7 containing QTLs 
influencing all three traits are shown in Figure 21.13. 


Identification of QTL Genes 


Because QTL mapping identifies the location of genes influencing
quantitative traits, but not the genes themselves,
additional genetic analysis is required to identify
the genes. To acquire information leading to gene identity,
researchers use near isogenic lines (NILs), also
called introgression lines (ILs). These lines are derived
from backcross progeny produced as described earlier.
Different backcross progeny are self-fertilized over many
generations to form highly inbred lines. The resulting
lines are nearly isogenic, meaning they are genetically
identical at almost all genes. The lines differ from one
another, however, by carrying different crossovers that
have introduced different alleles near the site of a QTL.
The introduced differences are called introgressions, thus
giving these lines their name.

Figure 21.13 QTL mapping in domestic tomato (Solanum
lycopersicum). Multiple QTLs influencing fruit weight, fruit acidity,
and percentage of soluble solids of tomatoes are shown on
chromosome 6 and chromosome 7. Many other QTLs populate
the rest of the genome. Distances between genes are in cM
(centiMorgans).

Figure 21.14a illustrates six introgression lines (IL1 
to IL6) descended from a cross between two original pa- 
rental lines, one a domesticated species and the other a 
wild species. The chromosome colors illustrate crossovers 
that produce differences between the introgression lines. 
Crossover locations are identified by analysis of genetic 
markers, and each introgression line is characterized for 
a trait phenotype. In the figure, the bars to the right of 
each line indicate the percentage difference between the 
phenotype of the IL and the domesticated parental species.
Two potential QTL regions, QTL-A and QTL-B,
contain variations of the crossover segments. The greatest 
positive percentage difference relative to the domesti- 
cated species phenotype occurs in IL2 and IL3 that carry 
crossover chromosomes containing domesticated DNA in 
the vicinity of QTL-A and wild-species DNA near QTL-B. 

To identify the genes responsible for QTL variation,
“candidate genes,” genes that are potentially responsible 
for the observed variation, must be identified and in- 
vestigated. Genes in the QTL-A and QTL-B regions are 
located by examining DNA sequences, and sequence vari- 
ants in candidate genes among introgression lines are 
identified. The sequence differences detected are studied 
to determine if they correlate with phenotypic variation. 

Figure 21.14b illustrates the results of experimental
analysis of tomato introgression lines by Eyal Fridman and
colleagues in 2004 designed to identify genes contributing
to Brix value in tomato. The Brix value of fruit refers to
the total soluble solids content, of which sugars and acids 
are the primary constituents. Fridman and colleagues cre- 
ated a large number of ILs from an initial cross between 
the domesticated tomato species (Solanum lycopersicum) 
and a wild relative (Solanum pennellii). 

The parental species and each of the ILs were studied 
for Brix value, and a QTL found to have a high Brix value, 
Brix 9-2-5, was intensively studied. DNA sequencing of 
the 484 nucleotides (positions 2799 to 3283) in Brix 9-2-5 
revealed the five SNP variants shown in the figure. The 
Brix 9-2-5 QTL corresponds to a segment of the tomato 
LIN5 gene that produces the cell wall enzyme invertase
(CW invertase). In the figure, the positions of SNPs are 
shown relative to 13 ILs that carry recombination in or 
near Brix 9-2-5. The bar to the right of each IL indicates 
its percentage difference in CW invertase activity rela- 
tive to S. lycopersicum. The results show that when the 
S. pennellii sequence is present, CW invertase activity is 
significantly greater than in S. lycopersicum. The SNP at 
position 2878 (boxed) was strongly correlated with in- 
creased CW invertase activity. DNA and protein sequence 
analysis revealed that this SNP produced an amino acid 
difference that altered CW invertase activity. 


Genome-Wide Association Studies 


The widespread availability of genome sequencing infor- 
mation has opened a new avenue to the identification of 
QTLs in numerous species, including humans. Known as 
genome-wide association studies (GWAS), the method 
seeks to tie the presence of a sequence variant of a DNA 
marker to a QTL influencing a specific phenotype. The 
relationship between an inherited genetic marker variant 
and the phenotype is by “association,” which means organ- 
isms that carry a particular variant are more likely to have a 
certain phenotype than are organisms that carry a different 
variant. The assessment of association is quantitative; that 
is, it expresses the percentage of organisms with a genetic 
marker that also display a certain phenotype versus the per- 
centage that have the phenotype but not the genetic marker. 

One advantage of GWAS over other QTL mapping 
approaches is that GWAS can scan the entire genome 
for QTLs by statistically testing for marker variants that 
are associated with phenotypic variation. Positive statisti- 
cal results indicating association identify chromosome 
regions that can be more closely inspected for genes that 
influence the trait. A second advantage of GWAS is that 
organisms in random mating populations can be analyzed. 
Rather than requiring controlled crosses and the formation
of introgression lines, GWAS uses “cases,” or organisms
with a particular phenotype, and compares them to
“controls” that lack the particular phenotype to assess the 
association between QTL markers and a phenotype. 

This case-control approach identifies the SNP geno- 
types in all the individuals with the disease (cases) as well 
as in healthy controls. The frequency of each SNP allele
in the cases is compared to the allele frequency in the
controls. When the allele frequency in the case group is 
greater than the frequency in the control group, the odds 
ratio is greater than 1.0. Statistics applied to the odds ratio 
determine the P value of each odds ratio. Significant asso- 
ciation between a SNP and a disease is found when the P 
value is less than the cutoff value. The results of each SNP 
examination are plotted as described momentarily. 
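
The case-control comparison described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the procedure of any particular study: the allele counts are hypothetical, the function names are invented for the example, and the P value comes from a Pearson chi-square test on the 2 × 2 table of allele counts (1 degree of freedom, no continuity correction).

```python
import math


def allele_odds_ratio(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Odds of the tested allele in cases divided by its odds in controls."""
    return (case_alt / case_ref) / (ctrl_alt / ctrl_ref)


def chi_square_test(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Pearson chi-square statistic (1 df) and P value for a 2 x 2 table."""
    a, b, c, d = case_alt, case_ref, ctrl_alt, ctrl_ref
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, P(chi-square > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical allele counts: 1000 cases and 1000 controls, two alleles each.
case_alt, case_ref = 600, 1400
ctrl_alt, ctrl_ref = 500, 1500

odds_ratio = allele_odds_ratio(case_alt, case_ref, ctrl_alt, ctrl_ref)
chi2, p_value = chi_square_test(case_alt, case_ref, ctrl_alt, ctrl_ref)
print(f"odds ratio = {odds_ratio:.3f}")
print(f"chi-square = {chi2:.2f}, P = {p_value:.2e}")
```

An odds ratio above 1.0 means the allele is more frequent in cases than in controls; the P value is then compared with the study's cutoff to decide whether the association is significant.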

GWAS takes advantage of the tendency of alleles of 
closely linked genetic markers to display linkage disequi- 
librium (see Section 5.6). Specific combinations of alleles 
in linkage disequilibrium occur at frequencies significantly 
greater than expected by chance. Linkage disequilibrium 
occurs because recombination has not reshuffled the alleles 
into random combinations. Groups of alleles in linkage 
disequilibrium form haplotypes along segments of chro- 
mosomes. If a group of closely linked SNPs form a haplo- 
type, then identification of a particular SNP for one marker 
means that other SNPs that are part of the same haplotype 
are likely to be found nearby. The presence of SNPs in 
haplotypes can be correlated with the presence (affected) 
or absence (unaffected) of a particular phenotype, such as 
a disease that is genetically influenced. The statistical test 
of association between a SNP and the disease phenotype 
is similar to a chi-square test (see Section 2.5). Like chi- 
square analysis, significance of the outcome is based on P 
values. In this statistical test, the null hypothesis is that the 
occurrence of a certain SNP and a particular phenotype is 
determined by chance. Since GWAS studies test hundreds 
to thousands of SNPs at once, the P-value threshold for sig- 
nificance in a study must be corrected for multiple hypoth- 
esis testing of many SNPs simultaneously. This means that 
the P-value threshold varies by study. Typically, however, 
significant P values are very small, as low as 10⁻⁷ to 10⁻⁸ for
large studies with millions of SNPs tested.
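
The text does not name the correction used for multiple hypothesis testing. One common and conservative choice is the Bonferroni correction, which divides the overall significance level by the number of tests. The Python sketch below assumes that method and an overall significance level of 0.05, both of which are illustrative choices rather than details from the studies discussed here.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test P-value threshold that keeps the overall error rate at alpha."""
    return alpha / n_tests


# Assumed overall significance level; the numbers of SNPs are illustrative.
ALPHA = 0.05
for n_snps in (1_000, 100_000, 1_000_000):
    threshold = bonferroni_threshold(ALPHA, n_snps)
    print(f"{n_snps:>9,} SNPs tested -> significant if P < {threshold:.1e}")
```

Under these assumptions, a study testing a million SNPs requires P values on the order of 10⁻⁸, which is why the threshold differs from study to study and is so small for large studies.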

GWAS statistical analysis identifies the presence of a 
QTL at or near the SNP location. In a sense, this provides 
statistical evidence suggesting that a QTL is located close 
by, analogously to the way significant lod scores indi- 
cate genetic linkage between genes. Additional molecular 
analysis can identify candidate genes and link specific
allelic variation to the production of phenotype variation. 
This is, once again, analogous to the need to find the ac- 
tual disease-causing gene after its location has been iden- 
tified through lod score analysis. 

Since 2005, when Josephine Hoh and colleagues 
identified two SNPs that are associated with a hereditary 
form of macular degeneration (an eye condition), GWAS 
has been applied to the analysis of thousands of human 
genomes. To date, approximately 4000 SNP associations 
have been found for more than 200 diseases or traits. A 
large meta-analysis (a study aggregating the results of 
many other studies) summarizing GWAS results from 
dozens of studies was published by the Wellcome Trust 
Case Control Consortium in 2007. It drew together data 
from approximately 50 studies that had collectively as- 
sessed 14,000 genomes of patient cases and 3000 control 
genomes for seven common diseases. In total, 24 significant
associations were detected for the seven diseases.

Figure 21.15 shows seven “Manhattan plots” (so 
named because their high-rise profile reminds some of 
the Manhattan skyline) that plot P values on the vertical 
axis against the SNP location for each of the diseases. In 
each Manhattan plot, chromosome numbers are identi- 
fied, and statistically significant P values are highlighted in 
green. The strongest significant associations are found on 
chromosome 2 for bipolar disorder and on chromosome 
9 for coronary artery disease. Each disorder also has ad- 
ditional associations. Crohn’s disease has nine significant 
associations. The other diseases have three to seven significant
associations. Most tests for SNP-disease association
do not produce significant P values. These tests are
represented by the light blue and dark blue background 
colors for each chromosome. 

For most of the chromosome regions containing sig- 
nificant associations, the identity of the suspect genes has 
been determined. These too are shown in Figure 21.15.
Some of these genes and associations confirm previously 
known information. For example, the chromosome 6 as- 
sociations for rheumatoid arthritis and for type 1 (insulin- 
dependent) diabetes are with the HLA-DRB1 gene in the 
HLA (human leucocyte antigen) system that is involved 
in these and several other autoimmune diseases. Other 
associations pointed to genes not previously known to be 
associated with disease. We look at the information identifying
one of these genes, CARD15, in association with
Crohn’s disease in the Case Study at the end of the chapter.

Experience with GWAS analysis of the human genome
has been both positive and negative. On the positive
side, hundreds of new genes contributing to disease risk
have been identified. New therapies are being developed
to target these genes in an attempt to prevent or to more
effectively treat disease. On the negative side are unexpectedly
meager results. At its inception, many researchers
expected GWAS analysis to find many significant
associations with a large number of human diseases and
traits. This has not happened, and to date only a small
percentage of the inherited variation thought to exist has
been detected. Several hypotheses have been proposed
to account for the apparent inability of GWAS to detect
the anticipated genetic variation. One of the strongest is
that although the alleles causing increased disease susceptibility
are numerous, each individual allele is rare. If
so, then the variant allele leading to disease susceptibility
may differ from family to family. This would make finding
statistically significant P values an occasional rather than a
frequent event.

Figure 21.15 Manhattan plots of the results of a genome-wide association study of seven common
diseases. The vertical axis shows P values for each SNP-disease association. The 22 human autosomes
and the X chromosomes are represented along the horizontal axis. Green dots or bars indicate the
locations of statistically significant associations. Known genes mapping to these regions are given.


CASE STUDY


GWAS and Crohn’s Disease


Yasunori Ogura and colleagues used GWAS to identify several
chromosome regions associated with Crohn's disease (CD), an
inflammatory bowel disease that affects humans at a prevalence
of 150 to 200 cases per 100,000 people. The etiology of
CD is unknown, but one prominent hypothesis proposes that
it is an inflammatory response to intestinal bacteria and other
microflora.

CD clusters in families: Susceptibility to the disease is
inherited but is influenced by multiple genes. The severity of
CD is highly variable, from relatively mild to potentially fatal.
Clinicians describe CD severity using a scale that captures the
quantitative nature of the trait, making CD a candidate disease
for QTL analysis. In the study by Ogura and colleagues, the
strongest statistical evidence of association of a genetic marker
with a susceptibility gene came from chromosome region
16q12. A gene initially identified as NOD2 and subsequently
renamed CARD15 (caspase recruitment domain, member 15)
is a candidate for a gene influencing susceptibility to CD.


GENE STRUCTURE AND MUTATION CARD15 contains
12 exons that direct the production of a 1040-amino acid protein.
Ogura and colleagues sequenced the exons and introns
of CARD15 in 12 CD patients from different families having
multiple cases of CD. They performed the same gene sequencing
on four healthy control individuals as well. The study identified
an identical C-G base pair insertion at nucleotide 3020
of exon 11 in three of the 12 CD patients. The insertion, designated
3020insC, induces a frameshift mutation that generates
a premature stop codon, shortening the mutant protein to
1007 amino acids.

Ogura and colleagues developed an allele-specific polymerase
chain reaction (PCR) assay for 3020insC and tested 101
CD patients whose parents were heterozygous for the wild-type
allele and the 3020insC allele. Of the 101 CD patients, 68
were homozygous for 3020insC (Figure 21.16a). Biochemical
analysis shows mutant protein from the gene has only a small
fraction of the activity of the wild-type protein. This diminished
capacity reduces the sensitivity of the immune system
to the microbial invader and, by a mechanism that remains to
be elucidated, results in CD.


OTHER CONTRIBUTING MUTATIONS Mutations of CARD15
are not the sole cause of CD; numerous CD patients do not
carry 3020insC or any other known mutation of the gene. The
Wellcome Trust Case Control Consortium publication in 2007
identified nine significant associations for Crohn’s disease,
and six of these genes have been identified, as well as CARD15
(Figure 21.16b).

Since the identification of 3020insC, two additional mutations
of CARD15 have been found to increase the risk of CD. All
three mutations appear to be null alleles, meaning that there
is no functional protein product produced. The role of the
protein product of CARD15 is not fully known, but it appears
to play a role in modulating inflammatory response. The absence
of this protein may lead to an increase in the inflammatory
response, a primary feature of Crohn's disease.

Figure 21.16 Detection of 3020insC in CARD15 in a family
with Crohn’s disease. (a) Gel electrophoresis of PCR products
from four members of a family are shown in lanes 1 through 4.
A wild-type control is in lane 5, and molecular weight size
markers are in lane 6. (b) Seven QTLs influencing the expression
of Crohn's disease, identified by GWAS.

(b) SNPs significantly associated with Crohn’s disease

Chromosome   Gene
1            IL23R
3            ATG16L1
5            IRGM
5            IBD5
10           NKX2-3
16           CARD15
18           PTPN2


SUMMARY 


21.1 Quantitative Traits Display Continuous 
Phenotype Variation 


Quantitative phenotypic traits are polygenic and are 
described by scales of measure that can be assigned values 
having a quantitative basis. 

The phenotypes of multifactorial traits result from polygenic
inheritance and the influence of environmental factors.

Most quantitative traits have a continuous phenotypic
distribution. Those influenced by larger numbers of
genes are more likely to display continuous variation.
Discontinuous variation in phenotype is particularly likely
with threshold traits.

Threshold traits are explained by additive alleles and have a 
threshold of liability that separates one phenotypic category 
(unaffected) from another (affected). The threshold of 
liability is crossed when a sufficient number of additive 
alleles accumulate in the genotype. 


21.2 Quantitative Trait Analysis Is Statistical 


Quantitative traits are analyzed using statistical methods
that evaluate the mean, median, mode, and variance of
quantitative trait phenotype distribution.

The frequency distribution for the phenotype range is described
by the variance or the standard deviation in sample values.
In the case of quantitative trait phenotypes, the phenotypic
variance (V_P) is a useful measure of the sample distribution.

The phenotypic variance of a trait is the sum of genetic variance
(V_G) and environmental variance (V_E).

Genetic variance is partitioned into additive variance (V_A),
dominance variance (V_D), and interactive variance (V_I),
the latter resulting from the epistatic interaction of genes
determining a phenotype.


21.3 Heritability Measures the Genetic
Component of Phenotypic Variation


Heritability is a measure of the extent to which genetic
variation contributes to total phenotypic variation.

Broad sense heritability (H²) measures the ratio of genetic
variance to phenotypic variance (V_G/V_P). One method
of applying broad sense heritability analysis to humans
is through twin studies that give a general estimate of
heritability.

Narrow sense heritability (h²) measures the contribution
of additive genetic variance to phenotypic variance
(V_A/V_P).

Narrow sense heritability is used to predict the selection
response (R) of a trait to artificial selection.
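
The ratios summarized above can be illustrated with a short Python sketch. The variance components and the selection differential below are hypothetical values chosen only to make the arithmetic easy to follow, the function names are invented for the example, and the selection response uses the standard relationship R = h² × S, where S is the selection differential.

```python
def broad_sense_heritability(v_g, v_p):
    """H^2: proportion of phenotypic variance due to genetic variance."""
    return v_g / v_p


def narrow_sense_heritability(v_a, v_p):
    """h^2: proportion of phenotypic variance due to additive variance."""
    return v_a / v_p


def selection_response(h2, selection_differential):
    """R = h^2 * S, the predicted shift in the mean after selection."""
    return h2 * selection_differential


# Hypothetical variance components (arbitrary squared units).
v_a, v_d, v_i, v_e = 4.0, 1.5, 0.5, 4.0
v_g = v_a + v_d + v_i      # genetic variance
v_p = v_g + v_e            # phenotypic variance

H2 = broad_sense_heritability(v_g, v_p)
h2 = narrow_sense_heritability(v_a, v_p)
R = selection_response(h2, selection_differential=5.0)  # hypothetical S

print(f"V_G = {v_g}, V_P = {v_p}")
print(f"H^2 = {H2:.2f}, h^2 = {h2:.2f}, R = {R:.2f}")
```

Because V_A is only one part of V_G, h² can never exceed H² for the same trait and population.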


21.4 Quantitative Trait Loci Are the Genes That 
Contribute to Quantitative Traits 


QTL mapping is used to determine the location of potential 
QTLs in genomes using methods that closely resemble re- 
combination mapping. 

Controlled crosses and analysis of recombinant chromosomes
are required for QTL mapping.

Specific genes influencing quantitative trait phenotypes are
identified and their variation characterized through QTL
candidate locus analysis.

Genome-wide association studies (GWAS) scan the entire
genome of organisms in random mating populations for statistical
evidence of QTLs.
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PROBLEMS 0 MasteringGenetics™ Visit for instructor-assigned tutorials and problems. 
Chapter Concepts For answers to selected even-numbered problems, see Appendix: Answers. 
1. Which of the following traits would you expect to be inher- 5. Describe the difference between continuous phenotypic 


ited as quantitative traits? 
body weight in chickens 
growth rate in sheep 
milk production in cattle 
fruit weight in tomatoes 
coat color in dogs 


enor p 


variation and discontinuous variation. Explain how poly- 
genic inheritance could be the basis of a trait showing 
continuous phenotypic variation. Explain how polygenic 
inheritance can be the basis of a threshold trait. 


Calculate the mean, variance, and standard deviation for a 
sample of turkeys weighed at 8 weeks of age that have the 


2. For the traits listed in the previous problem, which do you following weights in ounces: 161, 172, 155, 173, 149, 177, 
think are likely to be multifactorial traits with phenotypes 156, 174, 158, 162, 171, 181. 
that are infl db d envi t? Identi 
ee iy Provide a definition and an example for each of the follow- 
two environmental factors that might play a role in pheno- ne teris: 
typic variation of the traits you identified. B terms: 
a. additive genes 
3. Compare and contrast broad sense heritability and narrow b. concordance of twin pairs 
sense heritability, giving an example of each measurement c. multifactorial inheritance 
and identifying how the measurement is used. d. polygenic inheritance 
4. Ina cross of two pure-breeding lines of tomatoes produc- €. quantitative trait locus 
f. threshold trait 


ing different fruit sizes, the variance in grams (g) of fruit 
weight in the F4 is 2.25 g, and the variance among the F} is 
5.40 g. Determine the genetic and environmental variance 
(Vc and Vp) for the trait and the broad sense heritability of 
the trait. 


Application and Integration 


8. Three pairs of genes with two alleles each (A; and Ap, and the 2 allele is recessive. Under this revised scheme, the 
B; and By, and C; and C2) control the height of a plant. dominant phenotype contributes 10 cm to expected height 
The alleles of these genes have an additive relationship: and the recessive phenotype contributes 4 cm. 
each copy of alleles A}, B}, and C; contributes 6 cm to a. What is the expected height of a plant that is 
plant height, and each copy of alleles Az, By, and C3 homozygous for 1 alleles? 
contributes 3 cm. b. What is the expected height of a plant that is 
a. What are the expected heights of plants with each homozygous for 2 alleles? 

of the homozygous genotypes A 7A ;B;B1C;C; and c. What is the height of the F; progeny of these 
A2A 2B 2B 2C2C 9? homozygous plants? 

b. What height is expected in the F, progeny of a cross d. What are the phenotypes and proportions of each 
between A ,A ;B,B,C,C, and A 2A 9B2B C9C2? phenotype among the F,? 

c. What is the expected height of a plant with the geno- Two inbred lines of sunflowers (P4 and P3) produce dif- 
type ATAB 2B 2C1 Ce ; ferent total weights of seeds per flower head. The mean 

d. Identify all poe sible genotypes for plants with an weight of seeds (grams) and the variance of seed weights in 
expected height of 33 cm. different generations are as follows. 

e. Identify the number of different genotypes that are 
possible with these three genes. 3 F : 

f. Identify the number of different phenotypes Generation BS aes GNA OSU'S) ee oars 
(expected plant heights) that are possible with these P, 105 3.0 
three genes. l p> 135 38 

9. For the three-gene system in the previous problem, suppose I Fi p 122 35 | 
that instead of incomplete dominance among the additive = = = F 

F> 125 74 


alleles of each gene, the 1 allele is dominant in each case 


For answers to selected even-numbered problems, see Appendix: Answers. 


11. 


Subject Men Women 
Height (in.) Weight (Ib) Height (in.) Weight (Ib) 
1 65 136 60 95 
2 66 146 61 103 
3 67 141 62 110 
4 67 148 62 109 
5 68 147 62 118 
6 68 166 63 137 
7 69 16 G 152 
8 69 173 64 134 
9 69 159 Ten 127 
10 70 188 64 166 
11 70 183 65 129 
a A mon E w 
13 70 190 66 148 
14 71 169 66 152 
15 71 186 67 155 
16 71 190 67 149 
17 72 206 68 157 
18 72 210 68 138 
19 73 238 69 162 
20 74 267 70 169 


12. 


a. Use the information above to determine Vg, Vg, and Vp 
for this trait. 
b. Determine H? for this trait. 


A total of 20 men and 20 women volunteer to participate in 
a statistics project. The height and weight of each subject 
are given in the table. 


a. Draw one histogram for height of the subjects and a 
separate histogram for weight. Use different colors for 
men and women so that you can visually 
compare the distributions by sex and plot weights 
in 10-pound intervals (i.e., 90-99 Ibs, 100-109 lbs, 
110-119 lbs, etc.). 

b. Calculate the mean, variance, and standard deviation 
for height and weight in men and women. 

c. Compare the numerical values with the visual distribu- 
tion of heights and weights you drew in the histograms 
and describe whether you think your visual impression 
matches the numerical values. 


In Nicotiana, two inbred strains produce long (PL) and
short (PS) corollas. These lines are crossed to produce
F1, and the F1 are crossed to produce F2 plants in which
corolla length and variance are measured. The following
table summarizes mean and variance of corolla


13. 


14. 


15. 
length in each generation. Calculate H² for corolla length
in Nicotiana.


Generation Mean Corolla Length (mm) Variance 
PL 85.75 4.21
PS 43.15 2.89
F1 62.26 3.62
E e 38.10 


Suppose the length of maize ears has narrow sense 
heritability (h²) of 0.70. A population produces ears
that have an average length of 28 cm, and from this 
population a breeder selects a plant producing 34-cm 
ears to cross by self-fertilization. Predict the selection 
differential (S) and the response to selection (R) for 
this cross. 


In a line of cherry tomatoes, the average fruit weight is
16 grams. A plant producing tomatoes with an average
weight of 12 grams is used in one self-fertilization cross to
produce a line of smaller tomatoes, and a plant producing
tomatoes of 24 grams is used in a second cross to produce
larger tomatoes.

a. What is the selection differential (S) for fruit weight in 
each cross? 

b. If narrow sense heritability (h²) for this trait is 0.80,
what are the expected responses to selection (R) for 
fruit weight in the crosses? 


Two pure-breeding wheat strains, one producing dark red
kernels and the other producing white kernels, are crossed
to produce F1 with pink kernel color. When an F1 plant is
self-fertilized and its seed collected and planted, the resulting
F2 consist of 160 plants with kernel colors as shown in
the following table.


Kernel Color Number 
White 9
Dark red 12 
Red 39 
Light pink 41 
Pink 59 


a. Based on the F2 progeny, how many genes are involved
in kernel color determination?

b. How many additive alleles are required to explain the
five phenotypes seen in the F2?

c. Using clearly defined allele symbols of your choice, give
genotypes for the parental strains and the F1. Describe the
genotypes that produce the different phenotypes in the F2.

d. If an F1 plant is crossed to a dark red plant, what are the
expected progeny phenotypes and what is the expected
proportion of each phenotype?
16. In studies of human MZ and DZ twin pairs of the same sex
who are reared together, the following concordance values
are identified for various traits. Based on the values shown,
describe the relative importance of genes versus the influence
of environmental factors for each trait.


Trait               Concordance
                    MZ     DZ
Blood type          100    65
Chicken pox         89     87
Manic depression    67     13
Schizophrenia       72     12
Diabetes            62     15
Cleft lip           51     6
Club foot           40     4


17. During a visit with your grandparents, they comment on
how tall you are compared to them. You tell them that
in your genetics class, you learned that height in humans
has high heritability, although environmental factors
also influence adult height. You correctly explain the
meaning of heritability, and your grandfather asks, “How
can height be highly heritable and still be influenced by
the environment?” What explanation do you give your
grandfather?


18. An association of racehorse owners is seeking a new
genetic strategy to improve the running speed of their
horses. Traditional breeding of fast male and female horses
has proven expensive and time-consuming, and the breeders
are interested in an approach using quantitative trait
loci as a basis for selecting breeding pairs of horses. Write a
brief synopsis (~50 words) of QTL mapping to explain how
genes influencing running speed might be identified
in horses.


19. Applied to the study of the human genome, a goal of GWAS
is to locate chromosome regions that are likely to contain
genes influencing the risk of disease. Specific genes can be
identified in these regions, and particular mutant alleles that
increase disease risk can be sequenced. To date, the identification
of alleles that increase disease risk has occasionally
led to a new therapeutic strategy, but more often the identification
of disease alleles is the only outcome.

a. From a physician’s point of view, what is the value of
being able to identify alleles that increase the risk of a
particular disease?

b. What is the value of being able to identify alleles that
increase disease risk for a person who is currently
free of the disease but who is at risk of developing the
disease due to its presence in the family?

c. What personal or ethical issues arising from GWAS
might be of concern to physicians or to those who
might carry an allele that increases disease risk?


20. Suppose a polygenic system for producing color in kernels
of a grain is controlled by three additive genes, G, M, and T.
There are two alleles of each gene, G1 and G2, M1 and M2,
and T1 and T2. The phenotypic effects of the three genotypes
of the G gene are G1G1 = 6 units of color, G1G2 = 3 units of
color, and G2G2 = 1 unit of color. The phenotypic effects for
genes M and T are similar, giving the phenotype of a plant
with the genotype G1G1M1M1T1T1 a total of 18 units of
color and a plant with the genotype G2G2M2M2T2T2 a total
of 3 units of color.


a. How many units of color are found in trihybrid plants? 

b. Two trihybrid plants are mated. What is the expected 
proportion of progeny plants displaying 9 units of 
color? Explain your answer. 

c. Suppose that instead of an additive genetic system, 
kernel-color determination in this organism is a thresh- 
old system. The appearance of color in kernels requires 
9 or more units of color; otherwise, kernels have no 
color and appear white. In other words, plants whose 
phenotypes contain 8 or fewer units of color are white. 
Based on the threshold model, what proportion of the 
F2 progeny produced by the trihybrid cross in part (b)
will be white? Explain your answer. 

d. Assuming the threshold model applies to this kernel- 
color system, what proportion of the progeny of the 
cross G;G2M MoT oT X GıGM; MTT do you expect 
to display colored kernels? 


21. New Zealand lamb breeders measure the following variance
values for their herd.


Trait               Vp      Ve      Va
Body mass (kg)      42.4    20.5    7.4
Body fat (%)        38.9    16.2    5.7
Body length (cm)    51.6    26.4    8.1


a. Calculate the broad sense heritability (H²) and the narrow
sense heritability (h²) for each trait in this lamb herd.

b. How would you characterize the potential response to 
selection (R) for each trait? 


22. Cattle breeders would like to improve the protein content
and butterfat content of milk produced by a herd of
cows. Narrow sense heritability values are 0.60 for protein
content and 0.80 for butterfat content. The average
percentages of these traits in the herd and the percentages
of the traits in cows selected for breeding are as
follows.


Trait                Herd Average    Selected Cows
Protein content      20.2%           22.7%
Butterfat content    6.5%            7.4%


a. Determine the selection differential (S) for each trait in 
this herd. 

b. Which trait is likely to be the most responsive to arti- 
ficial selection applied by the cattle breeders through 
selection of cows for mating? 


23. In human gestational development, abnormalities of the
closure of the lower part of the midface can result in
cleft lip, if the lip alone is affected by the closure defect,
or in cleft lip and palate (the roof of the mouth), if the
closure defect is more extensive. Cleft lip and cleft lip
with cleft palate are multifactorial disorders that are
threshold traits. A family with a history of either condition
has a significantly increased chance of a recurrence
of midface cleft disorder in comparison to families without
such a history. However, the recurrence risk of a
midface cleft disorder is higher in families with a history
of cleft lip with cleft palate than in families with a history
of cleft lip alone.

a. Suppose a friend of yours who has not taken genetics 
asks you to explain these observations. Construct a 
genetic explanation for the increased recurrence risk 
of midface clefting in families that have a history of 
cleft disorders versus families without a history of such 
disorders. 

b. Construct a similar explanation of why the recurrence 
risk of a cleft disorder is higher in families with a his- 
tory of cleft lip with cleft palate than in families with a 
history of cleft lip alone. 


24. The children of couples in which one partner has blood
type O (genotype ii) and the other partner has blood type
AB (genotype I^A I^B) are studied.

a. What is the expected concordance rate for blood type
of MZ twins in this study? Explain your answer.

b. What is the expected concordance rate for blood type
of DZ twins in this study? Explain why this answer is
different from the answer to part (a).


25. Answer the following in regard to multifactorial traits in
human twins.

a. If the trait is substantially influenced by genes, 
would you expect the concordance rate to be higher 
in MZ twins or higher in DZ twins? Explain your 
reasoning. 

b. If the trait is produced with little contribution from
genetic variation, what would you expect to see if you 
compared the concordance rates of MZ twins versus 
DZ twins? Explain your reasoning. 
22.1 The Hardy-Weinberg Equilibrium Describes the Relationship of Allele and Genotype Frequencies in Populations
22.2 Natural Selection Operates through Differential Reproductive Fitness within a Population
22.3 Mutation Diversifies Gene Pools
22.4 Migration Is Movement of Organisms and Genes between Populations
22.5 Genetic Drift Causes Allele Frequency Change by Sampling Error
22.6 Inbreeding Alters Genotype Frequencies
22.7 Species and Higher Taxonomic Groups Evolve by the Interplay of Four Evolutionary Processes
22.8 Molecular Evolution Changes Genes and Genomes through Time


Population Genetics and
Evolution at the Population,
Species, and Molecular Levels


Modern humans, represented by the skull of Homo sapiens sapiens at the 
right, evolved from a branch of the human phylogenetic tree that gave rise 
to Neandertals, represented by the skull of Homo sapiens neanderthalensis 
at the left. Neandertals lived in Europe and Asia until about 30,000 years 
ago, and recent research comparing the modern human and Neandertal 
genomes finds tell-tale evidence of interbreeding between the lineages. 


In 1970, Theodosius Dobzhansky, one of the most
influential geneticists of the 20th century, wrote:


Nothing in biology makes sense except in the light of evolution.


Dobzhansky and the other architects of the modern synthesis of 
evolution (see Section 1.4) identified evolution and evolutionary 
analysis as central organizing principles of biology, necessary for 
understanding modern forms of life and their origins. Evolution 
shaped the living world we see today, just as it shaped life in the 
past and will continue to shape life into the future. 
The modern synthesis focused on uniting two
elements of evolutionary biology. One was the large- 
scale evolutionary change linked to speciation and 
to the divergence of taxonomic groups above the 
species level. The second element included what 
was known about Mendelian inheritance and the 
connection between inherited molecular variation 
(i.e., variation of DNA and protein sequences) and 
evolutionary change. All four of the evolutionary 
processes—natural selection, mutation, migration, 
and genetic drift—play a role in shaping the evolu- 
tionary history of genes, proteins, populations, and 
species (see Section 1.4). 

The impact of the evolutionary processes has 
been a focus of population biologists, evolutionary 
biologists, and mathematicians since the beginning 
of the 20th century, several decades before DNA 
was identified as the hereditary molecule and its 
structure became known. Since those early days, the 
central predictions made about populations on the 
basis of evolutionary principles have been proven 
correct time and again in countless experiments 
and observations. In this chapter, we focus both on 
the evolution of populations and on evolution at 
the molecular level, that is, the evolution of genes, 
genomes, and proteins. We begin our discussion 
with the application of evolutionary principles to 
populations that forms the foundation of the field 
of population genetics. We then discuss the opera- 
tion of each of the evolutionary processes, using 
examples that largely focus on humans. The causes 
of speciation are then explored, and we conclude 
the chapter with a discussion of the evolution of 
genes and genomes. 


22.1 The Hardy-Weinberg Equilibrium 
Describes the Relationship of Allele and 
Genotype Frequencies in Populations 


The origin of population genetics can be traced to the 
earliest years of the 1900s, shortly after the rediscovery 
of Mendel’s laws of heredity, and to a time when George 
Udny Yule, William Castle, Karl Pearson, Godfrey Hardy, 
Wilhelm Weinberg, and others first debated the fate of
genes in populations. In 1902, the inheritance of brachydactyly
(OMIM 112500), an autosomal dominant condition
characterized by shortening of fingers and toes, was
described in humans as a trait paralleling a Mendelian 
pattern of heredity. In contemplating this observation, 
Yule proposed that since three-quarters of the progeny of a 
cross of heterozygous parents with brachydactyly will also 
display shortened digits, the frequency of the dominant 
allele might be expected to increase over time. William 
Castle thought Yule was wrong, and in 1903 he offered, 
as a partial refutation of Yule’s contention, a mathemati- 
cal demonstration that in the absence of natural selection, 
genotype frequencies remain stable in populations. Karl 
Pearson supported Castle’s position by showing that if two 
alleles of a gene had equal frequency in a population, there 
would be a single, stable equilibrium frequency for their 
genotypes. Reginald Punnett (of Punnett square fame) also 
thought Yule was wrong, but unable to formulate a math- 
ematical argument to refute Yule, he took the problem to 
his friend and regular cricket partner Godfrey Hardy. 

Hardy, a mathematician rather than a biologist, quickly
identified a “very simple” solution to the question of the
fate of alleles in populations. He showed that with random
mating and in the absence of evolutionary change, allele
frequencies in a population remain stable from one generation
to the next and genotypes occur in predictable
frequencies derived directly from the allele frequencies. In
1908, Hardy penned a letter to the editors of Science magazine
that began with these self-effacing words:


I am reluctant to intrude in a discussion concerning 
matters of which I have no expert knowledge, and 
I should have expected the very simple point which 
I wish to make to have been familiar to biologists. 
However, some remarks of Mr. Udny Yule, to which 
Mr. R. C. Punnett has called my attention, suggest it 
may be worth making. 


In his letter, Hardy laid out the concept that has 
become known as the Hardy-Weinberg (H-W) equilib- 
rium. The name recognizes Hardy’s explanation of allele 
and genotype frequencies in populations as well as an in- 
dependent explanation of the same principle by Wilhelm 
Weinberg (a German physician) that was also published 
in 1908. The H-W equilibrium is a cornerstone of popu- 
lation genetics and was the first of many developments 
in evolutionary genetics that culminated in the modern 
synthesis. Hardy may have been reluctant to intrude into 
matters of biology, but biologists for more than 100 years 
have been glad he did! 


Populations and Gene Pools 


A population is a group of interbreeding organisms. The 
collection of genes and alleles found in the members of 
a population is known as a gene pool. The gene pool is 
the source of genetic information from which the next
generation is produced. Each population member carries 
a portion of the gene pool in its genome, but typically, the 
amount of genetic variation in a gene pool is greater than 
the variation carried by individual members of the popu- 
lation. The pattern of mating between individuals and 
the effect of evolutionary processes on alleles determine 
(1) how alleles are dispersed into genotypes and (2) their 
frequencies in successive generations. 

The H-W equilibrium serves as a model that calculates 
the frequencies of alleles and genotypes in a theoretical 
population that is infinite in size, practices random mat- 
ing, and does not experience evolutionary change. Under 
these conditions, the H-W equilibrium predicts that allele 
frequencies will be stable from generation to generation, 
that the frequencies of genotypes are predictable from their 
constituent allele frequencies, and that genotype frequen- 
cies too will remain the same in successive generations. 

In nature, however, no real population meets all the 
criteria assumed by the H-W equilibrium. For example, 
all populations are finite in size and are subject to genetic 
drift as a consequence (a phenomenon we encounter in 
Section 22.5). In addition, natural selection, migration, 
and mutation each exert their influences on a population. 
Despite these circumstances, most populations adhere 
closely enough to the assumptions of the H-W equilib- 
rium that alleles are distributed into genotypes in the 
proportions it predicts. The H-W equilibrium has proven 
to be a dependable arithmetic tool for assessing popula- 
tion genetic structure and detecting evolutionary change 
and nonrandom mating, and it is applied in numerous 
ways to the analysis of autosomal and X-linked genes in 
populations. 


The Hardy-Weinberg Equilibrium 


The predictions of the H-W equilibrium can be modeled
for any number of alleles of an autosomal or an X-linked
gene. The simplest model, however, is for two alleles of an
autosomal gene, here designated A1 and A2, and we will
discuss this model exclusively. The assumptions and predictions
of the H-W equilibrium are given in Table 22.1.
The assumptions of the H-W equilibrium can be thought
of simply as meaning that the population is infinitely large,
experiences no evolution, and contains members that mate
at random. As stated previously, these assumptions are not
met by real populations, but reality is often close enough to
the theory to allow accurate predictions to be made based
on the H-W equilibrium. For the general case of two alleles
of an autosomal gene, the alleles are given frequencies of
f(A1) = p and f(A2) = q, with the frequencies equal in males
and females. Since A1 and A2 are the only alleles that occur
at this gene, the sum of their frequencies is p + q = 1.0.
Rearrangements of this equality allow the frequency of one
allele to be used to determine the frequency of the other
allele; thus, p = 1 - q and q = 1 - p.
Table 22.1 The Hardy-Weinberg Equilibrium

Assumptions
1. Population size is infinite, and no genetic drift occurs.
2. Random mating occurs in the population, allowing
   genotype frequencies to be predicted by allele frequencies.
3. Natural selection does not operate.
4. Migration does not introduce new alleles.
5. Mutation does not introduce new alleles.

Predictions
1. Allele frequencies remain stable over time.
2. Allele distribution into genotypes is predictable.
3. Stable equilibrium frequencies of alleles and
   genotypes are maintained.
4. Evolutionary and nonrandom mating effects are
   predictable.


Allelic segregation predicts the relationship between
allele frequencies and genotype frequencies in populations.
For the two alleles in our example, there are three
genotypes: A1A1, A1A2, and A2A2. The genotype frequencies
are computed using a binomial expansion [(p + q)²],
where the two (p + q) expressions represent male and
female contributions to mating. Alternatively, a representation
of random mating in the population that resembles
a Punnett square can be used. Both methods make the
same genotype frequency predictions of f(A1A1) = p²,
f(A1A2) = 2pq, and f(A2A2) = q² (Figure 22.1). The summation
of these three genotype frequencies equals unity:
p² + 2pq + q² = 1.0.

We can demonstrate the application of the H-W
equilibrium by assigning frequencies to each allele in a
hypothetical population: say, f(A1) = p = 0.6 and f(A2) =
q = 0.4. As required, the sum of the two allele frequencies
is 0.6 + 0.4 = 1.0. In this hypothetical population example,
60 percent of gametes carry A1 and 40 percent carry A2
(Figure 22.2). If the population is in H-W equilibrium,
probability predicts that an A1-containing gamete from
a male and an A1-containing female gamete will unite to
produce A1A1 progeny with a probability of (0.6)(0.6) = 0.36.


                     Male gametes
                     p        q
Female gametes  p    p²       pq
                q    pq       q²

Binomial expansion:
(p + q)(p + q) = p² + pq + pq + q² = p² + 2pq + q² = 1


Figure 22.1 The Hardy-Weinberg equilibrium for autosomal
genes. The Punnett square method and the binomial expansion
of alleles with frequencies p and q predict genotype frequencies
under assumptions of the Hardy-Weinberg equilibrium.
                           Male gametes
                           A1 (0.60)      A2 (0.40)
Female gametes  A1 (0.60)  A1A1 0.36      A1A2 0.24
                A2 (0.40)  A1A2 0.24      A2A2 0.16

Binomial expansion:
(0.60 + 0.40)(0.60 + 0.40) = 0.36 + 0.24 + 0.24 + 0.16 = 1.00

Genotype frequencies:
A1A1 = 0.36
A1A2 = 0.48
A2A2 = 0.16
Total = 1.00


Figure 22.2 Application of the Hardy-Weinberg equilibrium.
The Punnett square method and the binomial expansion method
applied to a population in which f(A1) = 0.60 and f(A2) = 0.40.


Similarly, the production of A2A2 progeny, from the
union of two A2-containing gametes, has a probability of
(0.4)(0.4) = 0.16. Heterozygous progeny are produced
in two ways, with a combined frequency predicted as
(0.6)(0.4) + (0.6)(0.4) = 0.48. The sum of frequencies of the
three genotypes is (0.36) + (0.48) + (0.16) = 1.00. The binomial
expansion method of calculating the genotype frequencies
in progeny makes identical predictions.
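
The arithmetic in this example can be reproduced in a few lines of Python. The following is a minimal sketch, not part of the original text; the function name `hw_genotype_frequencies` is an illustrative choice, and the code assumes a two-allele autosomal gene in a population that meets the H-W assumptions.

```python
def hw_genotype_frequencies(p):
    """Return expected (A1A1, A1A2, A2A2) frequencies given p = f(A1).

    Assumes two alleles, so q = f(A2) = 1 - p, and random mating.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie between 0 and 1")
    q = 1.0 - p
    return p * p, 2.0 * p * q, q * q


# Hypothetical population from the text: f(A1) = 0.6, f(A2) = 0.4
f11, f12, f22 = hw_genotype_frequencies(0.6)
print(round(f11, 2), round(f12, 2), round(f22, 2))  # 0.36 0.48 0.16
```

The three returned values always sum to 1.0 (within floating-point rounding), mirroring p² + 2pq + q² = 1.0.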

In this example we see one of the predictions of the 
H-W equilibrium: Random mating for one generation pro- 
duces genotype frequencies that can be predicted from 
allele frequencies. For any frequencies of p and q between 
0.0 and 1.0, an expected equilibrium distribution of geno- 
type frequencies can be derived (Figure 22.3). Notice that 
as the frequency of p decreases and q increases, the pro- 
portions of genotypes shift, altering the frequency of each 
homozygous class and the frequency of heterozygotes in 
Figure 22.3 The Hardy-Weinberg equilibrium for two autosomal
alleles. Each curve shows the frequency of the genotype
for the indicated frequencies of the alleles p and q. (Plot of
genotype frequencies against allele frequencies.)


the population. Heterozygous frequency has a maximum of 
0.50 (50 percent), when the frequencies are p = q = 0.50. 

This example also allows us to observe the second
prediction of the H-W equilibrium: With random mating
and no evolution, allele frequencies do not change from
one generation to the next. We see this if we count the
alleles in progeny genotypes, recognizing that all of the
alleles in A1A1 are alleles of a single type, and all the alleles
in A2A2 progeny are alleles of the other type. The A1A1
progeny are 36 percent of the new generation, and A2A2
are 16 percent. Among the 48 percent of the progeny that
are heterozygotes, exactly one-half of the alleles are A1 and
one-half are A2. Consequently, the frequency of A1 among
the progeny is 36 percent plus 24 percent, or 60 percent of
the alleles carried by progeny, which is the same frequency
that was seen in the parental generation. The A2 frequency
is 16 percent plus 24 percent, or 40 percent of the progeny-generation
alleles, also the same as the frequency found
in the parental generation. Expressed as p and q, the frequency
of A1 in the progeny generation is f(A1) = p² + pq,
and the frequency of A2 is f(A2) = q² + pq. Because
p + q = 1, these expressions simplify to p(p + q) = p and
q(q + p) = q.
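
The stability of allele frequencies can be checked numerically. The sketch below is an illustration only; the function name `allele_freqs_after_random_mating` is hypothetical, and the code assumes the two-allele autosomal model with no selection, mutation, migration, or drift.

```python
def allele_freqs_after_random_mating(p):
    """Return (f(A1), f(A2)) among progeny produced by random mating.

    Progeny genotypes are formed in the proportions p^2, 2pq, q^2;
    each heterozygote contributes half of its alleles to each type.
    """
    q = 1.0 - p
    f11, f12, f22 = p * p, 2.0 * p * q, q * q
    p_next = f11 + 0.5 * f12  # p^2 + pq
    q_next = f22 + 0.5 * f12  # q^2 + pq
    return p_next, q_next


# Whatever the starting value of p, the progeny allele frequencies match it.
for p in (0.1, 0.25, 0.6, 0.9):
    p_next, q_next = allele_freqs_after_random_mating(p)
    print(p, round(p_next, 6), round(q_next, 6))
```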

The observation that random mating leads to predict- 
able genotype frequencies and that allele frequencies are 
stable from one generation to the next can be portrayed 
in a mating-table format that shows the consequence of 
reproduction under the assumptions of the H-W equilib- 
rium (Table 22.2). In the mating-table analysis, parental 
genotypes unite to reproduce at proportions predicted by 
their frequency. If parents have the same genotype, there 
is no reciprocal mating to account for, but if different 
genotypes occur in the parents, the reciprocal matings 
must be taken into account. The progeny of each mat- 
ing are predicted according to Mendelian principles. The 
frequency or fraction of offspring with each genotype is 
summed once the table is filled. The term that is the sum 
of each genotype frequency can be simplified to show that 
offspring are produced in the genotype proportions p²,
2pq, and q², just as they occur in the parents. This analysis
is compelling evidence that in the presence of random 
mating and the absence of evolutionary change, the allele 
frequencies in populations are stable over time. 
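
The mating-table logic can also be sketched in code. This is an illustrative example rather than a method from the text: the names `TRANSMIT_A1` and `progeny_from_random_mating` are assumptions made here. Iterating over ordered (mother, father) pairs counts each reciprocal mating separately, which has the same effect as the factor of 2 used in a mating table.

```python
from itertools import product

GENOTYPES = ("A1A1", "A1A2", "A2A2")
# Probability that a parent of each genotype transmits an A1 allele
TRANSMIT_A1 = {"A1A1": 1.0, "A1A2": 0.5, "A2A2": 0.0}


def progeny_from_random_mating(parent_freqs):
    """Return progeny genotype frequencies after one round of random mating.

    parent_freqs maps each genotype to its frequency in both sexes.
    """
    progeny = dict.fromkeys(GENOTYPES, 0.0)
    for mother, father in product(GENOTYPES, repeat=2):
        mating_freq = parent_freqs[mother] * parent_freqs[father]
        a, b = TRANSMIT_A1[mother], TRANSMIT_A1[father]
        # Mendelian segregation within each mating
        progeny["A1A1"] += mating_freq * a * b
        progeny["A1A2"] += mating_freq * (a * (1 - b) + (1 - a) * b)
        progeny["A2A2"] += mating_freq * (1 - a) * (1 - b)
    return progeny


# Parents already in H-W proportions for p = 0.6, q = 0.4
parents = {"A1A1": 0.36, "A1A2": 0.48, "A2A2": 0.16}
for genotype, freq in progeny_from_random_mating(parents).items():
    print(genotype, round(freq, 2))
```

With parents in the proportions p², 2pq, and q², the progeny frequencies returned are the same proportions, which is the result the table summarizes algebraically.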

In populations that meet the assumptions of the H-W
equilibrium, a single generation of random mating will
“reset” the genotype frequencies in the population into
the predicted proportions p², 2pq, and q². Moreover, if
a population is not initially in H-W equilibrium, we can
predict the consequence of one generation of random
mating. As an example, Figure 22.4 illustrates the effect
of uniting two previously separate populations with different
frequencies of A1 and A2 to form a new population.
Each of the contributing populations originally contained
500 individuals, and the new population contains 1000
individuals. Immediately after forming the new population,
the genotypes are not in Hardy-Weinberg proportions.
One generation of mating in the new population
under Hardy-Weinberg assumptions, however, produces
Table 22.2 Hardy-Weinberg Mating Table for Two Alleles of an Autosomal Gene

Mating          Mating Frequency          Progeny Genotypes
                                          A1A1      A1A2      A2A2
A1A1 × A1A1     (p²)(p²) = p⁴             p⁴        -         -
A1A1 × A1A2     2[(p²)(2pq)] = 4p³q       2p³q      2p³q      -
A1A1 × A2A2     2[(p²)(q²)] = 2p²q²       -         2p²q²     -
A1A2 × A1A2     (2pq)(2pq) = 4p²q²        p²q²      2p²q²     p²q²
A1A2 × A2A2     2[(2pq)(q²)] = 4pq³       -         2pq³      2pq³
A2A2 × A2A2     (q²)(q²) = q⁴             -         -         q⁴
Total           1.0                       p²        2pq       q²


Among the progeny, a common term is factored out of each summation to produce the frequency of each genotype:

A1A1 = p⁴ + 2p³q + p²q² = p²(p² + 2pq + q²) = p²

A1A2 = 2p³q + 2p²q² + 2p²q² + 2pq³ = 2pq(p² + pq + pq + q²) = 2pq

A2A2 = p²q² + 2pq³ + q⁴ = q²(p² + 2pq + q²) = q²

The sum of progeny genotype frequencies is p² + 2pq + q² = 1.0.


genotype frequencies in the next generation that are in 
H-W equilibrium. The new population has new allele fre- 
quencies as a result of the mixing of the two populations. 
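
The merged-population example can be worked through in code. The sketch below uses the genotype counts given for the new population in Figure 22.4 (530 A1A1, 340 A1A2, 130 A2A2); the function name `allele_freqs_from_counts` is an illustrative assumption, not something defined in the text.

```python
def allele_freqs_from_counts(n11, n12, n22):
    """Return (p, q) for alleles A1 and A2 from genotype counts."""
    total_alleles = 2 * (n11 + n12 + n22)
    p = (2 * n11 + n12) / total_alleles
    return p, 1.0 - p


# Genotype counts in the newly formed population of 1000 individuals
counts = {"A1A1": 530, "A1A2": 340, "A2A2": 130}
n = sum(counts.values())
p, q = allele_freqs_from_counts(counts["A1A1"], counts["A1A2"], counts["A2A2"])

observed = {g: c / n for g, c in counts.items()}
# Expected frequencies after one generation of random mating
expected = {"A1A1": p * p, "A1A2": 2 * p * q, "A2A2": q * q}

for g in counts:
    print(g, "observed", round(observed[g], 2), "expected", round(expected[g], 2))
```

The observed proportions (0.53, 0.34, 0.13) differ from the expected H-W proportions computed from p = 0.70 and q = 0.30, which is why the merged population is not initially in equilibrium.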


Determining Autosomal Allele Frequencies 
in Populations 


Allele frequencies and genotype frequencies are commonly 
used measures of the genetic structure of populations. 
Comparison of these frequencies between populations can 
identify relationships and diversification of populations, 
and documentation of allele frequency change over time is 
a hallmark of population evolution. 


Allele frequencies in populations can be estimated by
two methods, the gene-counting method and the square
root method. The gene-counting method does not require
any assumptions about the population; it only requires
that all genotypes can be identified. The square root
method assumes the population is in H-W equilibrium.
The square root method is often used when the trait of
interest is the result of a recessive homozygous genotype
and where the heterozygous and homozygous dominant
genotypes result in identical phenotypes.

For the gene-counting method, the allele frequencies
can be calculated in two ways: either by calculating the proportions
of genotypes or by directly counting the number
of alleles from the genotypes themselves. We describe
these two approaches separately for convenience, but they
are really the same. The choice of method is dictated by the
type of genotype or phenotype information available and
the composition of the population or of the sample data.


Figure 22.4 One generation of random
mating produces Hardy-Weinberg equilibrium
frequencies for genotypes of autosomal genes.

Two initial populations with different frequencies of genotypes
and of alleles A1 and A2...

Population 2
A1A1   125 (0.25)
A1A2   250 (0.50)
A2A2   125 (0.25)
Total  500 (1.00)

...unite to form a new population with new genotype and allele
frequencies.

New population
A1A1   530 (0.53)
A1A2   340 (0.34)
A2A2   130 (0.13)
Total  1000 (1.00)

A1 = (0.53) + ½(0.34) = 0.70
A2 = (0.13) + ½(0.34) = 0.30

One generation of mating under Hardy-Weinberg assumptions
produces genotype frequencies in equilibrium.

A1A1 = p² = (0.70)² = 0.49
A1A2 = 2pq = 2(0.70)(0.30) = 0.42
A2A2 = q² = (0.30)² = 0.09
Total = 1.00


The Genotype Proportion Method The first approach to
gene counting is called the genotype proportion method.
This approach calculates allele frequencies (f) by adding
the frequency of the homozygotes for the allele and the
frequency of one-half of the heterozygotes carrying the
allele. As an example, suppose that a population has
the following composition: B1B1 = 0.64, B1B2 = 0.32,
B2B2 = 0.04. Applying the genotype proportion method,
the frequency of B1 is the sum of the frequency of B1B1 plus
one-half the frequency of B1B2 heterozygotes. In this case,
f(B1) = p = (0.64) + [(0.5)(0.32)] = 0.80. Similarly, for B2,
the allele frequency is calculated by adding the frequency
of B2B2 and one-half the frequency of B1B2, or f(B2) =
q = (0.04) + [(0.5)(0.32)] = 0.20. For this example, notice
that p + q = 0.80 + 0.20 = 1.0.
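
As a quick check of the genotype proportion method, the calculation can be written as a short function. This is a minimal sketch; the function name `allele_freqs_from_genotype_freqs` is hypothetical and chosen here only for illustration.

```python
def allele_freqs_from_genotype_freqs(f11, f12, f22):
    """Return (p, q) from genotype frequencies of B1B1, B1B2, and B2B2."""
    if abs(f11 + f12 + f22 - 1.0) > 1e-9:
        raise ValueError("genotype frequencies must sum to 1")
    p = f11 + 0.5 * f12  # homozygotes plus half of the heterozygotes
    q = f22 + 0.5 * f12
    return p, q


p, q = allele_freqs_from_genotype_freqs(0.64, 0.32, 0.04)
print(round(p, 2), round(q, 2))  # 0.8 0.2
```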


The Allele-Counting Method The second approach to 
the gene-counting method is called the allele-counting 
method. As an example of the allele-counting method, 
consider the human MN blood group system, a codominant 
system produced by two alleles, M and N. Both alleles are 
present in all human populations and produce three blood 
group phenotypes: type M, type MN, and type N. Each 
blood group has a corresponding genotype. Individuals 
with blood type M or blood type N have homozygous 
genotypes MM and NN, respectively, and the blood type 
MN is produced by the MN genotype. MN blood group 
testing of 1482 members of a Japanese population produced 
the following results: 


Blood group    M      MN     N      Total
Number         406    744    332    1482

The allele frequency calculation recognizes that each of 
the 1482 people in the sample carries two alleles of the 
gene and that there are (2)(1482) = 2964 alleles rep- 
resented in the sample. The frequency of each allele is 
determined by counting the two alleles of that type from 
each homozygote and the single allele of that type from 
each heterozygote. The allele frequencies are therefore 
f(M) = [(2)(406) + (744)]/2964 = 0.525 and f(N) = 
[(2)(332) + (744)]/2964 = 0.475. 
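
The allele-counting calculation for the MN sample can be reproduced as follows. This is an illustrative sketch using the counts reported above; the function name `allele_freqs_by_counting` is an assumption made for this example.

```python
def allele_freqs_by_counting(n_mm, n_mn, n_nn):
    """Return (f(M), f(N)) by counting alleles in a codominant system."""
    total_alleles = 2 * (n_mm + n_mn + n_nn)  # two alleles per person
    f_m = (2 * n_mm + n_mn) / total_alleles
    f_n = (2 * n_nn + n_mn) / total_alleles
    return f_m, f_n


# Blood group counts: 406 type M, 744 type MN, 332 type N
f_m, f_n = allele_freqs_by_counting(406, 744, 332)
print(round(f_m, 3), round(f_n, 3))  # 0.525 0.475
```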


The Square Root Method The alternative approach for 
allele frequency determination in populations is the square 
root method. It is used only when the two alleles of a gene 
are dominant and recessive and when the condition or trait 
of interest is recessive. In the human autosomal recessive 
disorder cystic fibrosis, for example, one allele (cf) is 
recessive and therefore is evident only in the homozygous 
genotype. When the recessive allele is in a heterozygous 
genotype, it is “hidden” by the dominant allele (CF). In a
circumstance like this, the dominant phenotype consists
of two genotypes, CFCF and CFcf. In contrast, the
recessive phenotype is produced only by the homozygous 
recessive genotype cfcf. The correspondence of the 
recessive phenotype and homozygous genotype allows 
use of the Hardy-Weinberg principles to estimate the 
frequency of the recessive allele by taking the square 
root of the recessive homozygous genotype frequency. 
In the U.S. population, the frequency of cystic fibrosis 
among newborn infants is approximately 1 in 2000. Where 
f(CF) = p and f(cf) = q, f(cfcf) = q² = 0.0005. The
value of q is thus estimated as the square root of
0.0005, or q = 0.022; that is, about 2.2 percent.

With f(cf) determined, the frequency of CF is estimated
as f(CF) = p = 1 - q = 1.0 - 0.022 = 0.978. The
frequency of carriers of cystic fibrosis is of practical im- 
portance for determining the chance that a person is a 
carrier of cystic fibrosis. According to the Hardy-Weinberg 
principle, the population frequency of carriers is f(CFcf) = 
2pq = 2(0.978)(0.022) = 0.043. In other words, approxi- 
mately 4.3 percent of the population, or about 1 in 23 
people, carry a recessive mutant allele for cystic fibrosis. 
Estimates like this can be particularly valuable in genetic 
counseling situations, where it is desirable to know the 
probability that a person who has a dominant pheno- 
type might be a heterozygous carrier of a recessive allele. 
Genetic Analysis 22.1 provides more practice in calculating 
allele frequencies and applying the H-W equilibrium. 
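
The square root method and the carrier estimate can be sketched as follows. This is an illustrative example under the same Hardy-Weinberg assumptions as the text; the function name is hypothetical.

```python
import math


def square_root_method(recessive_phenotype_freq):
    """Estimate allele and carrier frequencies from the frequency of a
    recessive phenotype, assuming Hardy-Weinberg equilibrium."""
    q = math.sqrt(recessive_phenotype_freq)  # f(recessive allele)
    p = 1.0 - q                              # f(dominant allele)
    carriers = 2 * p * q                     # f(heterozygotes)
    return p, q, carriers


# Cystic fibrosis in newborns: about 1 in 2000
p, q, carriers = square_root_method(1 / 2000)
print(round(q, 3), round(p, 3), carriers)
```

Because this sketch carries q at full precision rather than rounding it to 0.022 first, the carrier frequency it reports is slightly higher than the 0.043 obtained in the text; both correspond to roughly 1 carrier in 23 people.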


The Hardy-Weinberg Equilibrium for More 
than Two Alleles 


Having examined the application of the H-W equilib- 
rium to genes with two alleles, we can now consider the 
more complex case of a gene that has more than two al- 
leles. We shall limit our discussion to three alleles, whose 
frequencies are represented by the variables p, q, and r, 
where p + q + r = 1.0, and where the trinomial expansion
(p + q + r)² represents random mating and predicts
the distribution of alleles in genotypes. Six genotypes 
are predicted by application of H-W equilibrium for a 
gene with three alleles (Table 22.3a). The sum of genotype 
frequencies resulting from the trinomial expansion is 
(p + q + r)² = p² + 2pq + q² + 2pr + r² + 2qr = 1.0.
The human ABO blood group system provides an op- 
portunity for the application of the H-W equilibrium to a 
gene with three alleles (see Section 4.1). Recall that among 
the three alleles producing ABO blood types (I^A, I^B, and
i), I^A and I^B exhibit dominance over i but are codominant
to one another. These allelic relationships result in four
blood types from the six genotypes (see Figure 4.3). Using
f(I^A) = p, f(I^B) = q, and f(i) = r, along with data reporting
the frequencies of each blood type in a population,
we can estimate the frequency of each allele by applying 
a version of the square root method. This approach provides
an approximate estimate of ABO allele frequencies
based on observed frequencies of each blood group in a
population.


Table 22.3 Hardy-Weinberg Equilibrium Genotype
Frequencies for Three Alleles of a Gene

(a) Genotype prediction for three alleles

Genotype    Genotype Frequency
A1A1        p²
A1A2        2pq
A1A3        2pr
A2A2        q²
A2A3        2qr
A3A3        r²

(b) Hardy-Weinberg analysis of ABO blood group data

Genotype    Genotype Frequency*                Blood Type
I^A I^A     p² = (0.23)² = 0.053               A
I^A i       2pr = 2[(0.23)(0.68)] = 0.314      A
I^B I^B     q² = (0.09)² = 0.008               B
I^B i       2qr = 2[(0.09)(0.68)] = 0.122      B
I^A I^B     2pq = 2[(0.23)(0.09)] = 0.041      AB
ii          r² = (0.68)² = 0.462               O

*Where f(A1) = p, f(A2) = q, f(A3) = r, and p + q + r = 1.0


The allele frequencies in the U.S. population,
for example, are derived as follows:


Step 1. Blood type O is found with recessive homozygous
genotypes, and the frequency of the blood
type is r² = 0.46. The square root of 0.46 = r;
thus, the allele frequency is f(i) = r = 0.68.


Step 2. The combined frequency of blood types A and O is
p² + 2pr + r² = (p + r)², so f(I^A) = p is estimated
by the square root of the combined frequency of
A plus O minus r. The calculation is f(I^A) = p =
√(0.37 + 0.46) - r = 0.91 - 0.68 = 0.23.


Step 3. Having estimated p and r, we can solve for q by
q = 1 - (p + r) = 1 - (0.23 + 0.68) = 0.09.


In this way, from the U.S. population frequencies 
we can estimate that the frequencies of the ABO alleles 
are f(I^A) = 0.23, f(I^B) = 0.09, and f(i) = 0.68. Based on
these estimated allele frequencies, Table 22.3b calculates 
genotype frequencies for the ABO blood types in the U.S. 
population. 
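
The three-step ABO estimate can be written as a short Python sketch. The function name and arguments are hypothetical; the inputs are the blood type A and blood type O frequencies (0.37 and 0.46) used in the steps above.

```python
import math


def abo_allele_freqs(freq_type_A, freq_type_O):
    """Approximate ABO allele frequencies by the square root method,
    assuming Hardy-Weinberg equilibrium."""
    r = math.sqrt(freq_type_O)                    # Step 1: f(i), since type O = r^2
    p = math.sqrt(freq_type_A + freq_type_O) - r  # Step 2: A + O = (p + r)^2
    q = 1.0 - (p + r)                             # Step 3: p + q + r = 1
    return p, q, r


p, q, r = abo_allele_freqs(0.37, 0.46)
print(round(p, 2), round(q, 2), round(r, 2))

# Expected genotype frequencies from the trinomial expansion
genotypes = {
    "IAIA": p * p, "IAi": 2 * p * r,
    "IBIB": q * q, "IBi": 2 * q * r,
    "IAIB": 2 * p * q, "ii": r * r,
}
```

Because the sketch keeps unrounded allele frequencies, the genotype frequencies it produces can differ in the third decimal place from values computed with the rounded estimates.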


The Chi-Square Test of Hardy-Weinberg
Predictions


Strictly speaking, the assumptions of the H-W equilibrium
are unattainable in real populations. From a statistical
perspective, however, what matters is whether the
observed genotype frequencies in populations deviate significantly
from the predictions of the H-W equilibrium.
The chi-square statistic is used to compare observed and 
expected results in order to evaluate the validity of an esti- 
mate based on the H-W equilibrium. 

If it is found that a population does not deviate sig- 
nificantly from H-W equilibrium predictions, the popu- 
lation is assumed to be exhibiting random mating and 
not to be experiencing significant evolutionary change in 
the current generation. If, on the other hand, chi-square 
analysis detects a significant deviation from H-W equi- 
librium expectations, the cause can be investigated. The 
reasons differ, but for human populations the sources of 
significant deviation are most often either small popula- 
tion size, substantial migration in or out of the popula- 
tion, or nonrandom mating. We discuss these effects in 
following sections. 
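
A chi-square comparison of observed and expected genotype counts might be sketched as follows. This example is an illustration rather than the text's own worked problem: it reuses the Japanese MN sample from earlier in the chapter, the function name is hypothetical, and it assumes one degree of freedom (three genotype classes, minus one, minus one allele frequency estimated from the data), for which the 0.05 critical value is 3.841.

```python
def hw_chi_square(n_hom1, n_het, n_hom2):
    """Chi-square statistic comparing observed genotype counts with
    Hardy-Weinberg expectations for a two-allele gene."""
    n = n_hom1 + n_het + n_hom2
    p = (2 * n_hom1 + n_het) / (2 * n)      # allele-counting method
    q = 1.0 - p
    expected = [p * p * n, 2 * p * q * n, q * q * n]
    observed = [n_hom1, n_het, n_hom2]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi2, expected


CRITICAL_05_DF1 = 3.841  # chi-square critical value, P = 0.05, df = 1

chi2, expected = hw_chi_square(406, 744, 332)
deviates = chi2 > CRITICAL_05_DF1
print(round(chi2, 3), deviates)
```

For this sample the statistic falls well below the critical value, so the observed genotype counts do not deviate significantly from H-W expectations.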


22.2 Natural Selection Operates 
through Differential Reproductive 
Fitness within a Population 


Application of the H-W equilibrium to idealized popula- 
tions provides insight into the mechanism that retains 
equilibrium when evolution does not occur. In the sense 
that the allele frequencies it describes do not change from 
generation to generation, the H-W equilibrium describes 
a static situation. But what happens to allele frequencies 
when evolution does occur? The simple answer is that 
allele frequencies change, and along with them genotype 
frequencies are altered. The evolutionary impact can be 
quantified by determining the change in allele frequen- 
cies. In this section, we look at the effects of different 
mechanisms of natural selection on allele frequencies 
and H-W equilibrium. In later sections, we examine how 
the other evolutionary processes—mutation, migration 
(gene flow), and genetic drift—affect allele frequencies 
and H-W equilibrium in populations (see Section 1.4). 


Differential Reproduction and Relative Fitness 


Natural selection favors certain members of a popula- 
tion over others as a result of differences in anatomical, 
physiological, behavioral, or other traits they possess. The 
favored individuals survive to reproductive age at higher 
rates than other population members, they reproduce at 
higher rates, or both. This leads individuals with the most 
favored phenotype to be the most successful at producing 
offspring for the next generation. This phenomenon is 
called differential reproduction. 

A common way to measure the intensity of natural
selection is to determine the impact of differential reproduction
on the next generation. This involves use of the
relative fitness (w) of organisms, a value that quantifies
the reproductive success of other genotypes relative to
the most favored genotype. Since this is a relative comparison,
organisms with the greatest reproductive success
have a relative fitness of w = 1.0.


GENETIC ANALYSIS 22.1


PROBLEM A worldwide survey of genetic variation in human populations reported the autosomal
codominant MN blood group types in a sample of 1029 Chinese from Hong Kong. The sample contained
342 people with blood type M, 500 with blood type MN, and 187 with blood type N.

a. Determine the frequencies of both alleles (M and N) using the genotype proportion method
and the allele-counting method.
b. Determine the expected genotype frequencies under
assumptions of the Hardy-Weinberg equilibrium.

BREAK IT DOWN: For this codominant trait, where the number of individuals with each
genotype is available, the 2058 alleles can each be
enumerated (p. 747).


Solution Strategies and Solution Steps


Evaluate

1. Strategy: Identify the topic this problem addresses and the nature of the required answer.
   Step: This problem addresses the determination of allele frequencies from
   population data and the determination of expected genotype frequencies
   under assumptions of the Hardy-Weinberg equilibrium.

2. Strategy: Identify the critical information given in the problem.
   Step: The number of individuals with each blood type is given, and the blood type
   is identified as an autosomal codominant trait.

Deduce

3. Strategy: Determine the genotype corresponding to each blood group.
   Step: For this autosomal codominant trait, blood type M individuals have the
   genotype MM, those with blood type N are NN, and MN individuals are MN.

4. Strategy: Calculate the frequency of each blood type in the sample.
   TIP: The frequency of each genotype is the number of people with the
   genotype over the total sample size.
   Step: Blood type M is 342/1029 = 0.332, MN is 500/1029 = 0.486, and N is
   187/1029 = 0.182.

Solve

Answer a

5. Strategy: Calculate allele frequencies using the genotype proportion method.
   Step: The frequencies are
   f(M) = (0.332) + [(0.5)(0.486)] = 0.575 and
   f(N) = (0.182) + [(0.5)(0.486)] = 0.425.

6. Strategy: Calculate the allele frequencies by the allele-counting method.
   TIP: If the allele frequencies are calculated correctly, their
   sum will be 1.0.
   Step: For the sample of 1029 people, there are 2058 alleles. The allele frequencies are
   f(M) = [(2)(342) + (500)]/2058 = 0.575 and
   f(N) = [(2)(187) + (500)]/2058 = 0.425.

Answer b

7. Strategy: Determine the expected genotype distribution under Hardy-Weinberg
   assumptions.
   TIP: Assume f(M) = p and f(N) = q, and expand the binomial equation
   (p + q)² = p² + 2pq + q².
   Step: The expected genotype frequencies and numbers are
   MM = (0.575)² = 0.33, and (0.33)(1029) = 339.57;
   MN = 2[(0.575)(0.425)] = 0.49, and (0.49)(1029) = 504.21; and
   NN = (0.425)² = 0.18, and (0.18)(1029) = 185.22.


For more practice, see Problems 17, 18, 21, and 25.

The genotypes that reproduce less successfully than the 
most favored genotype have a relative fitness of less than 
w = 1.0. These less fit genotypes have their relative fitness 
reduced by a proportion called the selection coefficient (s). 
The selection coefficient identifies the proportionate differ- 
ence between the fitnesses of organisms with different traits. 


For example, if an organism not having the favored trait 
reproduces 80 percent as well as the organism with the trait, 
the selection coefficient is s = 0.2, and the relative fitness 
of the organism is expressed as w = 1 - s, or 1 - 0.2 = 0.8.
If other organisms experience yet a different level of rela- 
tive fitness, a second selection coefficient, designated t, is 
used. Where an organism with one genotype is most fit and 
organisms with either of two other genotypes experience 
reduced fitness, the relative fitness values for the two less fit 
genotypes are expressed as w = 1 - s and w = 1 - t.


Directional Natural Selection


The pattern of natural selection called directional natural 
selection favors one phenotype with a homozygous geno- 
type. Organisms with this phenotype have higher relative 
fitness than other phenotypes in the population. Natural 
selection favoring one homozygous genotype produces a 
directional change in allele frequencies that increases the 
favored allele frequency and decreases others. 

In the directional selection example that follows, assume
alleles B1 and B2 are codominant. The codominant
relationship of the alleles will result in one genotype that
occurs in organisms with the highest relative fitness and
in reduced fitness in organisms with the other genotypes.
In this example, where the allele frequencies are
f(B1) = 0.6 and f(B2) = 0.4, there are 1000 members of the
population, the favored phenotype has a relative fitness of 
w = 1.0, and the other phenotypes have different relative 
fitness values of w = 0.80 and w = 0.40, the genetic profile 
of the population is as follows. 


Genotype                B1B1    B1B2    B2B2
Frequency               0.36    0.48    0.16
Number                  360     480     160
Relative fitness (w)    1.0     0.80    0.40


In this example, the B1B1 organisms have the highest
relative fitness (w = 1.0). In comparison, B1B2 organisms
have s = 0.20 and w = 1 - s = 0.80, and organisms with
the B2B2 genotype have a selection coefficient of t = 0.60
and a relative fitness of w = 1 - t = 0.40.

The impact of natural selection is computed in two 
steps. First, assuming natural selection has its effect before 
organisms reach reproductive age, the surviving number 
of organisms of each genotype is calculated by multiply- 
ing the original number of each genotype by the relative 
fitness value of the genotype. In this case the numbers of 
survivors of each genotype are B1B1 = (1.0)(360) = 360,
B1B2 = (0.80)(480) = 384, and B2B2 = (0.40)(160) = 64.
In this hypothetical population, 808 organisms of the 
original 1000 remain after natural selection. 

The second step is determination of the allele frequen- 
cies after natural selection and of the genotype frequencies 
in the next generation. In this case, the frequencies are 
most readily calculated using the allele-counting method, 
since we can identify the genotype of each survivor. There 
are a total of 1616 alleles in the 808 survivors, and the allele 
frequencies after natural selection are f(B1) = [(2)(360) +
(384)]/1616 = 1104/1616 = 0.683, and f(B2) = [(2)(64) +
(384)]/[(2)(808)] = 512/1616 = 0.317. If we assume that random
mating takes place among the survivors, the genotype
frequencies in the next generation are f(B1B1) = (0.683)² =
0.467, f(B1B2) = 2(0.683)(0.317) = 0.433, and f(B2B2) =
(0.317)² = 0.100.
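
The two-step calculation just described (survivors, then new frequencies) can be sketched in Python. The function name and tuple layout are assumptions made for illustration; the numbers are those of the 1000-member example population.

```python
def select_one_generation(counts, fitness):
    """One round of viability selection followed by random mating.

    counts and fitness are tuples ordered (B1B1, B1B2, B2B2).
    """
    # Step 1: survivors = original number x relative fitness
    survivors = [n * w for n, w in zip(counts, fitness)]
    # Step 2: allele counting among the survivors
    total_alleles = 2 * sum(survivors)
    p = (2 * survivors[0] + survivors[1]) / total_alleles
    q = (2 * survivors[2] + survivors[1]) / total_alleles
    # Genotype frequencies in the next generation under random mating
    next_gen = (p * p, 2 * p * q, q * q)
    return survivors, p, q, next_gen


survivors, p, q, next_gen = select_one_generation(
    (360, 480, 160), (1.0, 0.80, 0.40)
)
print(sum(survivors), round(p, 3), round(q, 3))
print([round(f, 3) for f in next_gen])
```

Feeding the next-generation frequencies back in as new counts would extend the sketch to multiple generations.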

The changes in allele frequencies are symbolized by
the Greek delta (Δ) and found by taking the absolute value
of the difference between the original allele frequency and
the new allele frequency. For this example in which B1
has increased and B2 has decreased, the values are ΔB1 =
|0.683 - 0.60| = 0.083 and ΔB2 = |0.317 - 0.40| = 0.083.
If this pattern of natural selection continues for enough
generations, the B1 allele will eventually become fixed at
f(B1) = 1.0, and the B2 allele will be eliminated, so that
its final frequency will be f(B2) = 0.0. Once an allele is either fixed
(f = 1.0) or eliminated (f = 0.0), natural selection can no
longer change the frequency. Population allele frequencies
of 0.0 or 1.0 can, however, be changed by migration and
mutation. Figure 22.5 illustrates that directional selection
favoring B1 increases the frequency of that allele at a pace
determined by the intensity of natural selection.

The concept of relative fitness values can be applied
to populations in several ways. Table 22.4 illustrates a
case of natural selection against the homozygous recessive
genotype in which frequencies f(B) = 0.50 and f(b) = 0.50 are
subjected to natural selection against bb, where w_bb = 0.0
and w_BB = w_Bb = 1.0. No bb individuals survive to reproductive
age, thus removing 25 percent of the population.
When the relative genotype frequencies are determined 
using their new proportions in the surviving reproduc- 
tive population, f(B) and f(b) are calculated to be f(B) = 
0.667 and f(b) = 0.333. Among the progeny in generation 
1, genotype frequencies are f(BB) = 0.445, f(Bb) = 0.444, 
and f(bb) = 0.111. 

Table 22.4 A Model of Directional Selection against
a Recessive Lethal Allele

                                          Genotype
                                          BB                   Bb                   bb
Frequency                                 0.25                 0.50                 0.25
Relative fitness (w)                      1.0                  1.0                  0.0
Survivors after selection (total, 0.75)   0.25                 0.50                 0.00
Relative genotype frequencies             0.25/0.75 = 0.333    0.50/0.75 = 0.667    0.00

Estimated allele frequencies after natural selection:
f(B) = (0.333) + (0.5)(0.667) = 0.667
f(b) = (0) + (0.5)(0.667) = 0.333
Estimated genotype frequencies after reproduction:
f(BB) = (0.667)² = 0.445
f(Bb) = 2(0.667)(0.333) = 0.444
f(bb) = (0.333)² = 0.111


Figure 22.5 The consequences of the intensity of natural
selection on allele frequency. (a) The curves illustrate the relationship
between the rate of change in f(B1) and the intensity of
natural selection. (b) Relative fitness values for natural selection
of different intensities.


Directional natural selection against the homozygous
recessive genotype causes the frequency of the dominant
allele to increase and the frequency of the recessive allele
to decrease. Eventually, the recessive allele may be eliminated
from the population gene pool. The recessive allele
is not eliminated quickly, however, and its frequency
changes slowly, especially as the allele gets less frequent.
The slow pace of evolutionary change at low allele fre- 
quencies is due to the smaller number of recessive homo- 
zygotes in the population. 
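
The slowing pace of change can be seen by iterating the recessive lethal model of Table 22.4 over several generations. The following is a minimal sketch assuming w_BB = w_Bb = 1.0, w_bb = 0.0, and random mating each generation; the function name is hypothetical.

```python
def recessive_lethal_trajectory(q0, generations):
    """Track f(b) over generations when bb has relative fitness 0."""
    q = q0
    trajectory = [q]
    for _ in range(generations):
        p = 1.0 - q
        surviving_fraction = p * p + 2 * p * q  # BB and Bb survive; bb does not
        q = (p * q) / surviving_fraction        # b alleles remain only in Bb survivors
        trajectory.append(q)
    return trajectory


traj = recessive_lethal_trajectory(0.50, 10)
for generation, q in enumerate(traj):
    print(generation, round(q, 3))
```

Starting from f(b) = 0.50, the first generation of selection lowers the frequency to 0.333, as in Table 22.4, but each later generation removes a smaller amount because fewer b alleles are exposed in bb homozygotes.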

Numerous directional selection experiments, taking 
place over the last several decades of research, demonstrate 
adherence to the theoretical predictions for populations. A 
1981 study by Douglas Cavener and Michael Clegg exam- 
ined four subpopulations of Drosophila melanogaster for 50 
generations to test the effectiveness of artificial directional 
selection at increasing the frequency of the allele Adh^F
of the alcohol dehydrogenase (Adh) gene. The enzyme
product of Adh^F rapidly breaks down ethanol. An original
population with an Adh^F frequency of 0.38 was divided
into four subpopulations of equal size. Two subpopulations
reared on ethanol-rich food (population 1 and population
2) showed progressive increases in the frequency of Adh^F
over 50 generations (Figure 22.6). In contrast, control populations
(control 1 and control 2), which were reared on food
without ethanol, showed an overall upward (control 1) and
downward (control 2) drift of Adh^F frequency.

A similar effect is seen in the action of strong di- 
rectional natural selection in human populations. Two 
independent reports published in 2010, one by Xin Yi and 
colleagues and the other by Tatum Simonson and col- 
leagues, describe the rapid evolutionary changes that have 
occurred in the last 5000 years in native Tibetans who 
have adapted to low oxygen conditions in the high-altitude 
environment of the Himalayan mountains. Strong directional
natural selection has operated in favor of certain
alleles of multiple genes that increase oxygen utilization
and improve oxygen transport and metabolism.


Figure 22.6 Directional artificial selection favoring the
Adh^F allele in experimental Drosophila populations. The Adh^F
allele increases in frequency in both experimental populations
exposed to an ethanol-rich environment. Allele frequencies
in two control populations (no natural selection) drift up and
down over the generations, ending up higher (control 1) and
lower (control 2) than their starting frequencies.


Natural Selection Favoring Heterozygotes 


A pattern of natural selection that can produce and main- 
tain genetic diversity in populations is seen when the 
heterozygous genotype is favored. We described this type 
of natural selection in Chapter 10 in connection with the 
evolution of the β^S allele for β-globin. The consequence of
natural selection favoring the heterozygote is a balanced
polymorphism, in which alleles reach stable equilibrium
frequencies that are maintained in a steady state, balancing
the selective pressures favoring the β^S allele when
it occurs in a heterozygote and acting against it when it 
occurs in a homozygous genotype. 

Table 22.5 depicts a natural selection scheme favoring 
heterozygotes. In this example, the relative fitness values 
are based on the heterozygous genotype (Cc) being 1.0, 
the relative fitness of CC being 0.80, and the fitness of 
cc being 0.20, indicating that few of these homozygotes 
survive to reproductive age. Beginning in generation 0 
with f(C) = f(c) = 0.50, natural selection changes allele 
frequencies to f(C) = 0.60 and f(c) = 0.40 in the matings 
that produce generation 1. 

Natural selection operating in favor of heterozygotes 
will eventually lead to a balanced polymorphism. Once 
attained, the equilibrium frequencies of the alleles will 
be maintained in a balanced polymorphism as long as 
natural selection remains steady. Population geneticists 
can predict the stable equilibrium frequencies of alleles 
in a balanced polymorphism using the relative intensity 
of natural selection against the homozygous genotypes. 


Table 22.5 A Model of Natural Selection Favoring
the Heterozygous Genotype

                                 Genotype
                                 CC                   Cc                   cc
Frequency                        0.25                 0.50                 0.25
Relative fitness                 0.80                 1.0                  0.20
Survivors after selection        0.20                 0.50                 0.05
Relative genotype frequencies    0.20/0.75 = 0.267    0.50/0.75 = 0.667    0.05/0.75 = 0.067

New allele frequencies after natural selection:
f(C) = 0.600
f(c) = 0.400
Genotype frequencies after reproduction:
f(CC) = (0.60)² = 0.36
f(Cc) = 2[(0.60)(0.40)] = 0.48
f(cc) = (0.40)² = 0.16


Using the variables s and t to represent the natural selection
coefficients operating against the homozygous genotypes,
the relative fitness of CC is 1 - s and the relative
fitness of cc is 1 - t. Solving for the values of s and t,


s = 1.0 - 0.80 = 0.20, and t = 1.0 - 0.20 = 0.80


The stable equilibrium values for p and q, designated p_E and q_E, in
the balanced polymorphism are calculated as ratios of selection
coefficients operating against the homozygous genotypes.
In this example, the equilibrium p_E and q_E values are


p_E = t/(s + t) = 0.80/(0.20 + 0.80) = 0.80 and
q_E = s/(s + t) = 0.20/(0.20 + 0.80) = 0.20
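
As a check on the equilibrium formula, the selection scheme of Table 22.5 can be iterated until the allele frequencies stop changing. This is a sketch under the assumptions of constant fitness values and random mating every generation; the function and variable names are hypothetical.

```python
def next_p(p, w_hom1, w_het, w_hom2):
    """Frequency of allele C after one generation of selection
    on genotypes CC, Cc, and cc (random mating assumed)."""
    q = 1.0 - p
    surviving = p * p * w_hom1 + 2 * p * q * w_het + q * q * w_hom2
    return (p * p * w_hom1 + p * q * w_het) / surviving


s, t = 0.20, 0.80            # selection coefficients against CC and cc
p_eq = t / (s + t)           # predicted equilibrium f(C)
q_eq = s / (s + t)           # predicted equilibrium f(c)

p = 0.50
first_generation = next_p(p, 1 - s, 1.0, 1 - t)
for _ in range(200):
    p = next_p(p, 1 - s, 1.0, 1 - t)

print(round(first_generation, 3), round(p, 3), p_eq, q_eq)
```

The first iteration reproduces the change from f(C) = 0.50 to 0.60 shown in Table 22.5, and continued iteration settles at the predicted equilibrium.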


Convergent Evolution 


Natural selection favors the most fit organism in a given 
environment, and on occasion, mutant alleles that are 
identical or nearly identical can be independently gener- 
ated and can be similarly favored in separate populations. 
When this occurs, the mutant alleles evolve independently
in each population. Such events produce an evolutionary
phenomenon known as convergent evolution.
One well-established example of convergent evolution of 
a morphological trait is the presence of wings in birds and 
wings in bats. Birds and mammals are distantly related, 
sharing the common ancestor of reptiles and mammals, 
but the development of wings is more recent. Wings of 
birds are traced to their reptilian ancestors and have an 
underlying anatomical structure that is distinct from what 
is found in bats. Bats developed wings through a series 
of changes that modified ancestral appendages that had 
five digits—the equivalent of the human hand. Despite 
their distinctive evolutionary histories and anatomical
differences, the wings in bats and birds appear similar in
form and function and are identified as convergent. 

Recent human evolutionary history provides an ex- 
ample of convergent evolution of a molecular characteris- 
tic, the ability to digest the disaccharide milk sugar lactose 
into adulthood. All humans have the ability to break down 
lactose at birth—it is a carbohydrate component of mam- 
malian breast milk—but most individuals lose that ability 
rapidly after weaning. Lactose digestion is made possible 
by production of the enzyme lactase-phlorizin hydrolase,
commonly known as “lactase,” encoded by the LCT
gene on chromosome 2. Lactase is functionally equivalent
to β-galactosidase, the bacterial enzyme that cleaves
lactose that we discussed in Section 14.2. Individuals 
who lose the ability to digest lactose after weaning are 
often identified as “lactose intolerant,” whereas those 
who digest lactose into adulthood are termed “lactase 
persistent.” Lactase persistence is common in individuals 
of European ancestry whose ancestral populations were 
pastoralists. The evolution of lactase persistence in these 
populations is tied to the domestication of cattle 7500 to 
9000 years ago and to the availability of milk and other 
dairy products that provided a readily accessible source 
of protein. Lactase-persistent individuals have the evolu- 
tionary advantage, in the pastoral environment, of being 
able to exploit that protein source. 

Lactase persistence is not limited to Europeans, however.
A 2007 study by Sarah Tishkoff and colleagues
examined the evolution of lactase persistence in pastoral
African populations in Tanzania, Kenya, and the Sudan 
to determine if lactase persistence in Europeans and 
Africans has the same genetic basis or is the result of 
different mutations and a separate evolutionary history. 
The study determined that European and African lactase 
persistence is produced by different mutations of the LCT 
gene and represents an example of convergent evolution 
of a molecular trait. 

A SNP (single nucleotide polymorphism; see Section 
10.2) in the upstream regulatory region of LCT is respon- 
sible for lactase persistence in Europeans. This SNP is a 
base-pair substitution that substitutes a cytosine with a thy- 
mine at nucleotide position 13910. The SNP is designated 
C/T-13910, and it occurs in nearly 100% of Europeans with 
lactase persistence. C/T-13910 is not found in Africans 
with lactase persistence. Instead, three other SNPs, also located
in the upstream LCT regulatory region, are detected:
G/C-14010, T/G-13915, and C/G-13907 are each associated
with lactase persistence in Africans.

Given the molecular genetic data identifying dis- 
tinct SNPs associated with lactase persistence in pasto- 
ral Europeans and Africans, a logical hypothesis is that 
natural selection favored different mutations producing 
lactase persistence in these populations. The demonstra- 
tion of evolution producing convergence of the lactase- 
persistence trait comes from the examination of variants 
of genes that are linked to LCT but that do not play a role
in lactase persistence. Evolutionary theory predicts that
if a particular SNP is favored, its frequency will increase,
but so too will the frequency of alleles of genes that are 
linked to the favored SNP. Natural selection directly 
favors a specific allele and indirectly favors the alleles 
of closely linked genes on the same chromosome. The 
alleles of linked genes on the same chromosome as the 
favored SNP can have their frequencies increased by a 
phenomenon known as genetic hitchhiking. In genetic 
hitchhiking, the alleles of closely linked genes that happen 
to be on the same chromosome as the favored allele are 
taken along for the evolutionary ride, at least temporar- 
ily. Initially, this produces linkage disequilibrium (LD) 
in which the favored allele and the alleles of linked genes 
occur together on chromosomes significantly more often 
than expected by chance (see Section 5.6 for a discussion). 
Genetic hitchhiking produces particular combinations of 
alleles at linked genes on chromosomes carrying the 
favored allele, leading to distinctive haplotypes on those 
chromosomes. 

Over time, homologous recombination will break up 
the haplotype and randomize the combinations of alleles 
among the linked genes. In other words, recombination 
gradually eliminates LD and restores linkage equilibrium. 
In the meantime, however, the detection of distinct hap- 
lotypes on chromosomes carrying favored alleles, such 
as the SNPs associated with lactase persistence, strongly 
suggests the operation of natural selection and the occur- 
rence of convergent evolution. The distinct haplotypes in 
Europeans and Africans with LCT SNPs are thus a hallmark
of convergent evolution. 

Genetic Analysis 22.2 examines another example of 
the consequences of natural selection on a population. 


22.3 Mutation Diversifies Gene Pools 


Mutation is the ultimate source of all new genetic varia- 
tion in populations, and the genetic variation it generates 
is an indispensable component of evolution. By itself, 
however, gene mutation is a very slow evolutionary pro- 
cess because its effect on allele frequencies in populations 
is small and gradual. For example, if mutation converts 
one in every 10,000 A1 alleles to A2 alleles each generation,
a population containing f(A1) = 0.90 and f(A2) = 0.10
in generation 0 will have frequencies f(A1) = 0.81 and
f(A2) = 0.19 after 1000 generations, assuming no effects
from the other evolutionary processes. 
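
The slow effect of mutation alone can be verified with a short calculation. This sketch assumes only forward mutation at a constant rate per generation; the function name is hypothetical.

```python
def forward_mutation_only(p0, mu, generations):
    """f(A1) after repeated forward mutation of A1 to A2 at rate mu,
    with no reversion and no other evolutionary process."""
    return p0 * (1.0 - mu) ** generations


# One in every 10,000 A1 alleles mutates each generation
p_1000 = forward_mutation_only(0.90, 1e-4, 1000)
q_1000 = 1.0 - p_1000
print(round(p_1000, 2), round(q_1000, 2))
```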

An additional reason that mutation alone is a slow 
evolutionary process has to do with the two directions in 
which mutation can affect any given allele. The forward 
mutation rate (μ) pertains to mutations that create a new
A2 allele by mutation of A1, whereas the reverse mutation
rate (ν), also known as the reversion rate, pertains
to mutation of alleles in the opposite direction, A2 to
A1. Forward and reverse mutation can create a balanced
equilibrium, given a sufficient number of generations and
the absence of other evolutionary processes. 


Quantifying the Effects of Forward and Reverse
Mutation Rates


In the absence of other evolutionary effects, the conse- 
quences of forward and reverse mutation (reversion) on 
allele frequencies in a population can be quantified. If 
f(A1) = p and f(A2) = q, the effect of forward mutation
on f(A1) is described by the value μp, and the effect of
reversion on f(A2) is described by νq. These two expressions identify,
respectively, the rate at which A2 alleles are created by
forward mutation and the rate at which A2 alleles are
reverted to A1. In each generation, the change in the frequency
of A2 is quantified by the expression Δq (“delta q”),
which is calculated as Δq = μp - νq. Over an infinite number
of generations in a theoretical population where μ and
ν are constant and no other evolutionary processes are
operating, allele frequency equilibrium is established. 

The equilibrium frequencies of alleles subject only to 
mutation and reversion are a ratio of the frequencies of the 
respective events. Since the equilibrium frequencies are 
purely a function of the ratios of the rate at which new copies 
of an allele are added to and removed from the population
gene pool, they are calculated as p_E = ν/(μ + ν) and
q_E = μ/(μ + ν). In a theoretical population where
f(A1) = 0.99, f(A2) = 0.01, μ = 2 × 10⁻⁶, and ν =
3 × 10⁻⁸, Δq is expressed as Δq = [(2 × 10⁻⁶)(0.99) -
(3 × 10⁻⁸)(0.01)] = 1.98 × 10⁻⁶. This small change
gradually increases f(A2) and decreases f(A1), leading eventually
to equilibrium allele frequencies in this population
that are


p_E = 3 × 10⁻⁸/(2 × 10⁻⁶ + 3 × 10⁻⁸) = 0.015 and
q_E = 2 × 10⁻⁶/(2 × 10⁻⁶ + 3 × 10⁻⁸) = 0.985
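
The per-generation change and the equilibrium frequencies can be computed directly. The sketch below uses hypothetical function names, with mu for the forward mutation rate and nu for the reversion rate, and assumes both rates are constant.

```python
def delta_q(p, q, mu, nu):
    """Change in f(A2) in one generation from forward mutation
    (A1 to A2, rate mu) and reversion (A2 to A1, rate nu)."""
    return mu * p - nu * q


def mutation_equilibrium(mu, nu):
    """Equilibrium frequencies of A1 and A2 under mutation and reversion only."""
    p_eq = nu / (mu + nu)
    q_eq = mu / (mu + nu)
    return p_eq, q_eq


mu, nu = 2e-6, 3e-8
dq = delta_q(0.99, 0.01, mu, nu)
p_eq, q_eq = mutation_equilibrium(mu, nu)
print(dq, round(p_eq, 3), round(q_eq, 3))
```

At the equilibrium frequencies, delta_q returns zero (to within floating-point error), which is what defines the equilibrium: forward mutation and reversion add and remove A2 alleles at the same rate.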


Mutation-Selection Balance 


Unlike the theoretical population just described, mutations 
in the real world are commonly subject to natural selec- 
tion. In cases where the deleterious mutation is recessive, 
the mutant allele is masked by the wild-type dominant 
allele in heterozygous genotypes. Recessive mutant alleles 
are subjected to natural selection only when they occur in 
the homozygous recessive genotype. This results in the per- 
sistence of recessive mutant alleles in most populations at a 
frequency somewhat greater than the mutation frequency. 
Under these circumstances, the frequency of mutant
alleles in a population is a balance between the intensity of
natural selection against the mutant and the frequency of
mutation of the gene. This relationship is called the
mutation-selection balance, and it determines the equilibrium
frequency of the mutant allele ($q_E$) by considering the rate
of elimination of deleterious alleles by natural selection ($s$)
and the rate at which new mutant alleles are generated ($\mu$).


GENETIC ANALYSIS 


PROBLEM In a Drosophila species, a naturally occurring autosomal inversion is
found in two forms, Arrowhead (AR) and Standard (ST). Flies of this species can be
homozygous for either chromosome form (AR/AR or ST/ST), or they can be
heterozygous (AR/ST). In the 1970s, researchers determined that the relative fitness
values for the three genotypes differed with respect to the fruit flies' ability to resist
the now banned insecticide DDT. The relative fitness values are as follows:

Genotype    Relative Fitness
AR/AR       0.65
AR/ST       1.00
ST/ST       0.50

a. Describe the pattern of natural selection operating on these chromosomes, and make
a statement about the eventual fate of the two chromosome forms in this species.
b. Use the information provided to determine the equilibrium frequencies
of AR and ST.

BREAK IT DOWN: Natural selection can eliminate an allele (frequency
0.0), fix an allele (frequency 1.0), or establish equilibrium frequencies for two
or more alleles, depending on the pattern of natural selection (p. 750).

BREAK IT DOWN: The pattern of natural selection is
determined by the relative fitness values that assign a fitness of 1.0
to the most fit genotype and lesser relative fitness values to the other
genotypes (p. 748).


Solution Strategies and Solution Steps


Evaluate
1. Strategy: Identify the topic of this problem and the nature of the required answer.
   Step: This problem is about the effects of natural selection on the frequencies of
   two chromosome forms, AR and ST. The answer requires an explanation of
   the pattern of natural selection and a calculation to determine the ultimate
   frequencies of the chromosome forms.
2. Strategy: Identify the critical information given in the problem.
   Step: The relative fitness values are given, and these can be used to determine the
   final frequencies of AR and ST.

Deduce
3. Strategy: Examine the relative fitness values for each genotype, and calculate the
   selection coefficients ($s$ and $t$) against each genotype.
   TIP: Subtract the relative fitness of a genotype from 1.0 to determine the
   selection coefficients $s$ and $t$.
   Step: The relative fitness value for the heterozygous genotype is 1.0, and the
   relative fitnesses of the homozygous genotypes are lower. The selection
   coefficient $s$ operating against AR/AR is 1.0 - 0.65 = 0.35. The selection
   coefficient $t$ operating against ST/ST is 1.0 - 0.50 = 0.50.
4. Strategy: Consider how the relative fitness values can be used to calculate the final
   frequencies of AR and ST.
   Step: A ratio of the selection coefficients operating against each homozygous
   genotype can be used to calculate the equilibrium frequency of each of the
   chromosome forms, with $p_E = t/(s + t)$ and $q_E = s/(s + t)$.

Solve
5. Strategy: Describe the natural selection pattern operating on these genotypes.
   Step (Answer a): This is an example of heterozygous advantage that is expected to
   retain both chromosome forms in the population at equilibrium values determined by
   the relative strength of natural selection against each form.
6. Strategy: Determine the equilibrium frequencies of each chromosome form.
   PITFALL: Double-check your arithmetic by making sure that
   the sum of the equilibrium frequencies you calculate is 1.0.
   Step (Answer b): If the equilibrium frequency of AR is $p_E$ and of ST is $q_E$, the
   equilibrium frequencies are $p_E = 0.50/(0.35 + 0.50) = 0.588$ and
   $q_E = 0.35/(0.35 + 0.50) = 0.412$.


For additional practice see Problems 4, 11, and 24. Visit the Study Area to access study tools.
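
The calculation in this worked example can be reproduced with a brief Python sketch. The function name is hypothetical, and the code assumes, as the example does, that the heterozygote has a relative fitness of 1.0 and that only the two homozygous fitness values need to be supplied.

```python
def heterozygote_advantage_equilibrium(w_hom1, w_hom2):
    """Equilibrium frequencies under heterozygous advantage.

    w_hom1 and w_hom2 are the relative fitnesses of the two homozygotes;
    the heterozygote is assumed to have relative fitness 1.0.
    """
    s = 1.0 - w_hom1          # selection coefficient against homozygote 1 (AR/AR)
    t = 1.0 - w_hom2          # selection coefficient against homozygote 2 (ST/ST)
    p_eq = t / (s + t)        # equilibrium frequency of form 1 (AR)
    q_eq = s / (s + t)        # equilibrium frequency of form 2 (ST)
    return p_eq, q_eq


p_eq, q_eq = heterozygote_advantage_equilibrium(0.65, 0.50)
print(f"AR: {p_eq:.3f}, ST: {q_eq:.3f}, sum: {p_eq + q_eq:.1f}")
```

The form that is selected against more weakly as a homozygote (here AR) ends up at the higher equilibrium frequency.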




Consider the following situation for a recessive lethal
mutation.


Genotype            $A_1A_1$    $A_1A_2$    $A_2A_2$
Relative fitness    1           1           $1 - s$


Here, the equilibrium frequency of the recessive allele
($q_E$) is calculated as the balance between selection
against a recessive genotype ($s$) and the rate of mutation
($\mu$), i.e.,


$q_E = \sqrt{\mu / s}$


This expression predicts that when selection against the
recessive genotype is complete (i.e., $s = 1.0$), the
equilibrium frequency of the mutant allele is approximately
the square root of the mutation rate. When the selection
coefficient is less than 1.0, the equilibrium frequency is
greater than the square root of the mutation frequency.

In the case of complete selection against a lethal
dominant mutant allele $B_2$, the relative fitness values of
the genotypes are as follows.


Genotype            $B_1B_1$    $B_1B_2$    $B_2B_2$
Relative fitness    1           $1 - s$     $1 - s$


In this case, $q_E = \mu$. In other words, when $s = 1.0$ against a
lethal dominant mutation, the equilibrium frequency of the
mutant allele is equal to the mutation frequency.
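
To see how the recessive-allele expression behaves, the sketch below evaluates $q_E = \sqrt{\mu/s}$ for several selection coefficients. The mutation rate of $1 \times 10^{-6}$ and the function name are assumptions chosen only for illustration; they are not values from the text.

```python
import math


def recessive_equilibrium(mu, s):
    """Mutation-selection balance for a deleterious recessive allele: q_E = sqrt(mu / s)."""
    return math.sqrt(mu / s)


mu = 1e-6  # hypothetical mutation rate, for illustration only
for s in (1.0, 0.5, 0.1):
    q_eq = recessive_equilibrium(mu, s)
    print(f"s = {s:.1f}: q_E = {q_eq:.5f}")
```

With complete selection ($s = 1.0$) the result is the square root of the mutation rate, and weaker selection gives a higher equilibrium frequency, as the text states. Both values remain far above the dominant-lethal equilibrium of $q_E = \mu$.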

Numerous examples of mutation-selection balance 
have been investigated in organisms, including humans. 
Several studies of human hereditary disease alleles reveal 
that recessive mutant alleles are maintained in populations 
at frequencies predicted by calculating the mutation- 
selection balance. 


22.4 Migration Is Movement
of Organisms and Genes between 
Populations 


In population genetics, migration refers to the move- 
ment of organisms and subsequent reproduction in new 
populations. Adding or subtracting alleles from a popu- 
lation can immediately change allele frequencies in the 
population, and these changes can become established 
through reproduction in the new populations. In
evolutionary terms, migration is also known as gene flow, 
and the new mixed population is identified as an admixed 
population. 


Effects of Gene Flow 


Gene flow has two principal effects on populations. First, in 
the short run, gene flow can change the frequency of alleles 
in the admixed population, particularly if the starting allele 
frequencies in one of the participating populations differ 
from those in the other and if the number of immigrants 
constitutes a large proportion of the admixed population. 
Second, in the long run, gene flow acts to equalize frequen- 
cies of alleles between populations that remain in genetic 
contact by the exchange of population members back and 
forth between the populations. This exchange can also slow 
genetic divergence of populations and can block speciation. 
Let’s look at how both of these effects are explained. 

The change in allele frequencies produced in an ad- 
mixed population by gene flow from population 1 into 
population 2 can be described by the island model of 
migration that depicts a one-way process of gene flow, 
that is, from a mainland population to an island population.
In the example illustrated in Figure 22.7a, gene flow
changes allele frequencies by reducing $f(A_1)$ on the island
from 1.0 to 0.60 and increasing $f(A_2)$ from 0.0 to 0.40.


Figure 22.7 The island model of migration.


In the example shown in Figure 22.7, gene flow has 
produced an almost instantaneous evolutionary change 
(Figure 22.7b). The admixed population has allele frequencies
of $f(A_1) = 0.60$ and $f(A_2) = 0.40$, but the genotypes
are not in H-W equilibrium immediately following migration.
A single generation of random mating, however,
will bring the genotype frequencies into ratios consistent
with the H-W equilibrium: $A_1A_1 = 0.36$, $A_1A_2 = 0.48$, and
$A_2A_2 = 0.16$.

The consequence of gene flow on allele frequencies
in an admixed population is expressed by a formula that
calculates $p_N$, the new value of $p$, as the weighted average
of the allele frequency among island residents and
mainland immigrants. The expression uses $p_I$ and $p_C$ to
represent $f(A_1)$ in the original island and mainland
populations, respectively. The formula identifies the fraction of
individuals or alleles from the mainland population as $m$,
and the fraction contributed by island residents as $1 - m$.
The value of $p_N$ as a result of gene flow is
$p_N = (1 - m)(p_I) + (m)(p_C)$. Applying this formula to our
example in Figure 22.7, we find
$p_N = (0.20)(1.0) + (0.80)(0.50) = 0.60$.
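
The weighted-average formula, together with the Hardy-Weinberg genotype frequencies expected after one generation of random mating, can be sketched in Python as follows. The function names are illustrative; the inputs are the values from the Figure 22.7 example.

```python
def admixed_frequency(p_island, p_mainland, m):
    """New island frequency of A1 after one round of one-way gene flow.

    m is the fraction of the admixed population contributed by mainland immigrants.
    """
    return (1 - m) * p_island + m * p_mainland


def hardy_weinberg(p):
    """Genotype frequencies (A1A1, A1A2, A2A2) after one generation of random mating."""
    q = 1 - p
    return p * p, 2 * p * q, q * q


p_new = admixed_frequency(p_island=1.0, p_mainland=0.50, m=0.80)
f11, f12, f22 = hardy_weinberg(p_new)
print(f"p_N = {p_new:.2f}")
print(f"A1A1 = {f11:.2f}, A1A2 = {f12:.2f}, A2A2 = {f22:.2f}")
```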


Allele Frequency Equilibrium and Equalization 


We have just seen that gene flow can produce rapid evo- 
lutionary change in the allele frequencies of populations. 
In the short term, the effect of gene flow is determined by
the change in the frequency of $p$ in the new gene pool of
the island population. This value, $\Delta p_I$, is the difference in
allele frequency before and after migration, and is defined
as $\Delta p_I = p_N - p_I$. Substituting the formula for $p_N$ and
simplifying gives
$\Delta p_I = [(1 - m)(p_I) + (m)(p_C)] - p_I = m(p_C - p_I)$.
Allele frequency equilibrium occurs when
$\Delta p_I = 0$; thus, at equilibrium, $m(p_C - p_I) = 0$, indicating
that $p$ remains constant either when there is no migration
($m = 0$) or when $p$ in the island gene pool equals the allele
frequency in the mainland gene pool ($p_I = p_C$).
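
The equalizing effect of repeated gene flow can be illustrated by applying the per-generation change $m(p_C - p_I)$ over many generations. In the sketch below, the migration fraction of 0.10 per generation and the assumption that the mainland frequency stays fixed are illustrative choices, not values from the text.

```python
def island_trajectory(p_island, p_mainland, m, generations):
    """Island frequency of A1 over time under recurrent one-way gene flow.

    Each generation the island frequency changes by m * (p_mainland - p_island);
    the mainland frequency is assumed to stay constant.
    """
    trajectory = [p_island]
    for _ in range(generations):
        p_island = p_island + m * (p_mainland - p_island)
        trajectory.append(p_island)
    return trajectory


# Hypothetical scenario: island starts fixed for A1, mainland at 0.50, m = 0.10.
traj = island_trajectory(p_island=1.0, p_mainland=0.50, m=0.10, generations=100)
for gen in (0, 1, 10, 50, 100):
    print(f"generation {gen:3d}: p_I = {traj[gen]:.4f}")
```

The difference between island and mainland frequencies shrinks by a factor of $(1 - m)$ each generation, so the island frequency approaches $p_C$ and stops changing only when the two are equal.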

Population and evolutionary biologists use this analy- 
sis to conclude that gene flow has a homogenizing, or 
equalizing, effect on allele frequencies among participat- 
ing populations. By this mechanism, gene flow maintains 
genetic contact between populations and can thus pre- 
vent evolutionary divergence of populations. In broader 
evolutionary terms, gene flow hinders the establishment 
of the reproductive isolation that is an important compo- 
nent of evolutionary divergence between populations and 
of potential speciation. 


22.5 Genetic Drift Causes Allele 
Frequency Change by Sampling Error 


The term genetic drift refers to chance fluctuations of 
allele frequencies that result from “sampling error,” a 
statistical term signifying that a small sample taken from 
a larger population is not likely to contain all alleles in 
exactly the same frequencies as in the larger population. 
Genetic drift affects all populations, but it is especially 
prominent in small populations in which a small number 
of gametes unite to produce each subsequent generation. 

To appreciate the cause and consequences of genetic
drift, picture a gene pool with alleles at frequencies
$f(A_1) = f(A_2) = 0.50$ from which two separate samples
are drawn. In sample one, 20 alleles are drawn at random, 
whereas in the second sample, 1000 alleles are drawn. 
These two separate draws represent the alleles that, in 
the two respective cases, unite to form the next genera- 
tion. In the first sample, containing 20 alleles, each allele 
represents 5 percent (one allele out of 20) of the total for 
the next generation, whereas in the 1000-allele sample, 
each allele only represents 1/1000 of the alleles in the next 
generation. Any deviation from exactly 10 $A_1$ alleles and
10 $A_2$ alleles in the first sample will substantially change
allele frequencies in the next generation. If, for example,
the draw of 20 alleles contains 12 $A_1$ alleles and 8 $A_2$
alleles, the allele frequencies in the next generation will be
$f(A_1) = 12/20 = 0.60$ and $f(A_2) = 8/20 = 0.40$. A change
of such magnitude can easily occur by chance in the small 
sample, but it is very unlikely to occur in the larger sample 
of 1000 alleles. 
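
The contrast between the two sample sizes can be made concrete with a binomial calculation. The sketch below assumes each allele is drawn independently from a gene pool with $f(A_1) = 0.50$ and computes the probability that the sampled frequency of $A_1$ misses 0.50 by at least 0.10 in either direction; the function name is illustrative.

```python
from math import comb


def prob_deviation(n_alleles, min_extra):
    """Probability that the count of A1 differs from n_alleles/2 by at least min_extra.

    Assumes independent draws from a gene pool with f(A1) = f(A2) = 0.50,
    so the count of A1 follows a binomial(n_alleles, 1/2) distribution.
    """
    favorable = sum(
        comb(n_alleles, k)
        for k in range(n_alleles + 1)
        if abs(2 * k - n_alleles) >= 2 * min_extra
    )
    return favorable / 2 ** n_alleles


small = prob_deviation(20, 2)       # 12 or more, or 8 or fewer, A1 alleles out of 20
large = prob_deviation(1000, 100)   # 600 or more, or 400 or fewer, out of 1000
print(f"20 alleles:   {small:.3f}")
print(f"1000 alleles: {large:.2e}")
```

Under these assumptions, a shift of 0.10 or more occurs in about half of all 20-allele samples but is vanishingly rare in a 1000-allele sample.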

Sampling errors of the kind described for the first
sample can randomly raise or lower the frequency of an
allele in a small population each generation. Once the allele
frequencies are changed, the next generation, when it
reproduces, has the new allele frequencies as a starting point.
Over multiple generations, the frequency of an allele in a
small population will randomly fluctuate, or "drift,"
sometimes increasing and sometimes decreasing, due to nothing
more than the chance deviations in small random samples.

Figure 22.8 Genetic drift of an allele frequency. Four
simulated populations each start with a frequency of 0.50 for
a hypothetical allele whose frequency fluctuates randomly
in each population over 30 generations. The allele eventually
becomes fixed in population 1, is eliminated in population 4,
and is still present in populations 2 and 3 at distinct frequencies.

Allele frequency changes due to genetic drift are 
random. In the absence of any other evolutionary influ- 
ence, they may drift for a large number of generations, 
and one allele will ultimately reach fixation at a frequency 
of 1.0 and all other alleles will be eliminated. Figure 22.8 
illustrates four different simulations of genetic drift of 
an allele in experimental populations and shows how the 
result of genetic drift for 30 generations can vary among 
populations that are initially identical. Each experimental 
population begins with 20 organisms and maintains that 
number throughout the 30 generations. The initial start- 
ing frequency of each allele is 0.50 in each population. 
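
A simulation of this kind can be sketched in a few lines of Python. The version below is a minimal illustration, not the procedure used to produce Figure 22.8: it assumes diploid organisms (so 20 organisms carry 40 alleles), random sampling of alleles each generation, and no other evolutionary process. The function name and seed values are arbitrary.

```python
import random


def simulate_drift(n_alleles=40, generations=30, p0=0.5, seed=None):
    """Track the frequency of one allele under random sampling alone.

    Each generation, n_alleles are drawn independently, each being the
    tracked allele with probability equal to its current frequency.
    """
    rng = random.Random(seed)
    p = p0
    freqs = [p]
    for _ in range(generations):
        count = sum(1 for _ in range(n_alleles) if rng.random() < p)
        p = count / n_alleles
        freqs.append(p)
    return freqs


# Four replicate populations that start out identical, as in the figure.
for population in range(1, 5):
    freqs = simulate_drift(seed=population)
    print(f"population {population}: final frequency = {freqs[-1]:.3f}")
```

Once the frequency reaches 0.0 or 1.0 it cannot change again in this model, which is why drift alone eventually fixes one allele. Changing the seed gives a different random trajectory from the same starting point.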


The Founder Effect 


Small population size provides the conditions under which 
sampling errors can produce significant genetic drift of al- 
lele frequency. One mechanism that can produce this 
outcome is called the founder effect, and it occurs when a 
new, small population branches off from a larger popula- 
tion. Since the new population founders are drawn from 
a larger original population, and the number of founders 
is small, the allele frequencies carried by the founders 
may be higher or lower than those in the original population,
and some alleles may be missing altogether. These
changes are due to sampling error. The founder effect can 
create new populations with allele frequencies that differ 
substantially from those found in the original population. 
Small human populations whose origins can be traced 
to religious, social, political, or other distinctions are often 
established by a small number of individuals and contain
few members of reproductive age. Often, the founders
consist of several families. Since the family members are 
related and share alleles, allele frequencies among the 
founders likely will differ from allele frequencies in the 
larger population from which the founders emigrate. 

One consequence of the founder effect and genetic drift
can be high frequencies of autosomal recessive disorders 
in the new population that are rare in the original popu- 
lation. The Old Order Amish are a religious population 
established by about 200 founding members in Lancaster 
County, Pennsylvania, between 1720 and 1770. The 
founding population came from English and European 
populations and consisted of several extended families. 
Other Amish communities were established by differ- 
ent founders in Ohio, Indiana, and elsewhere in North 
America. These populations tend to be small, and mating 
within each population is common. Due to the founder 
effect, Amish populations exhibit high frequencies of 
several autosomal and X-linked recessive disorders that 
are rare in their populations of origin and in surrounding 
non-Amish communities. 

One example of a disorder found in high frequency 
in an Amish community is Ellis-van Creveld syndrome 
(EvC; OMIM 225500), an autosomal recessive disorder 
that produces short stature accompanied by short fore- 
arms and lower legs and by the frequent appearance of 
extra digits on hands or feet. In a survey of nearly 8000 
Old Order Amish in Lancaster County completed several
years ago, 43 cases of EvC were identified. From this
information, the frequency of the allele producing EvC
in the population is estimated by taking the
square root of the frequency of the recessive trait in the
population. The calculation is $q = \sqrt{43/8000} = 0.073$,
or about 7.3 percent. Among other Amish populations,
and in the general (non-Amish) population, the frequency 
of this recessive allele is q < 0.001. 
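
The estimate can be reproduced with the short sketch below. It uses the survey figures quoted above (treating the survey size as exactly 8000) and assumes Hardy-Weinberg genotype proportions, under which the frequency of affected individuals equals $q^2$; the carrier frequency printed at the end is an additional quantity derived under the same assumption, not a figure from the text.

```python
import math


def recessive_allele_frequency(affected, surveyed):
    """Estimate q as the square root of the recessive phenotype frequency.

    Assumes Hardy-Weinberg proportions, so that f(affected) = q**2.
    """
    return math.sqrt(affected / surveyed)


q = recessive_allele_frequency(affected=43, surveyed=8000)
carriers = 2 * (1 - q) * q    # expected heterozygote frequency, 2pq
print(f"q = {q:.3f}")
print(f"expected carrier frequency = {carriers:.3f}")
```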

The genealogical history of the Old Order Amish 
community in Lancaster County identifies that all families 
with EvC trace their genealogies to Mr. and Mrs. Samuel 
King, who immigrated to Lancaster County in 1744. At 
the time, there were about 400 people in the Lancaster 
County population, and the evidence suggests that both 
Mr. King and Mrs. King were carriers of the recessive mu- 
tant allele for EvC. This information establishes the initial 
frequency of the mutant allele in the founding population
at approximately $q = 2/800 = 0.0025$ (two mutant copies among
the 800 alleles carried by 400 people), more than
twice the frequency in the population of origin. Genetic
drift and the tendency for the Amish to mate within the 
Lancaster County community subsequently contributed 
to the rise in the frequency of the allele in the population. 


Genetic Bottlenecks 


A second mechanism producing large allele frequency
sampling errors in small populations is the genetic
bottleneck, in which a relatively large population is substantially
reduced in number by a catastrophic event independent
of natural selection. The survivors of the bottleneck, a
small, random sample of the original population, are
likely to have a very low level of genetic diversity due to
the loss of alleles from the gene pool. They are likely to
carry alleles in frequencies that differ radically from those
in the original population (Figure 22.9). In the statistical
sense, founder effect and genetic bottlenecks are equivalent.
Indeed, the founder effect is effectively one version of
a genetic bottleneck. Both establish a new breeding
population from a small subset of the ancestral population.

Figure 22.9 A genetic bottleneck. Catastrophic population
reduction not due to natural selection can restrict or eliminate
the alleles that pass through the bottleneck and alter allele
frequencies in the surviving population.

The loss of genetic diversity from a genetic bottle- 
neck can be quantified in two ways: first, by determining 
the percentage of polymorphic loci in the population, and 
second, by determining the percentage of loci in an aver- 
age individual that are heterozygous. 

Genetic bottlenecks can affect single populations, 
or they can affect an entire species. An example of the 
latter case would be a near-extinction event such as the 
one that affected the Northern elephant seal (Mirounga 
angustirostris). This animal was historically distributed 
along the western coast of North America, in numbers 
that exceeded 150,000 in the mid-1800s. Extensive hunt- 
ing devastated the rookeries where young elephant seals 
were raised, and by 1884 fewer than 100 elephant seals 
remained. Some biologists have estimated that the sur- 
viving population may have been as small as 20 individu- 
als. The entire remaining population bred at an isolated 
rookery on Guadalupe Island, about 200 miles off the 
western shore of Baja California. Elephant seal protection 
measures put in place by the U.S. and Mexican govern- 
ments in the early 1900s led to population growth and 
the reestablishment of additional rookeries. Today, the 
Northern elephant seal remains a protected species that 
has returned to its historic population size of approximately
150,000 individuals.

In 1974, Robert Selander and his colleagues collected
blood samples from 159 Northern elephant seals from 
five populations and examined 24 blood protein and 
enzyme genes for evidence of genetic variation. All 24 
genes were monomorphic, and the single allele of each 
gene was identical in all five populations! About 20 years 
later, A. Rus Hoelzel and colleagues expanded the genetic 
survey of Northern elephant seals to include 43 genes in 
61 individuals from the five populations. They also found 
no genetic variation. Additionally, Hoelzel and colleagues 
examined variation of mitochondrial DNA in Northern 
elephant seals and found a low level of sequence variation 
in two distinctive mitochondrial DNA haplotypes that 
had frequencies of 0.725 and 0.275. The extremely limited 
genetic variation in Northern elephant seals is wholly 
consistent with the historical genetic bottleneck that left 
very little genetic variation in the surviving population 
members. 


22.6 Inbreeding Alters Genotype 
Frequencies 


Descriptions of population genetic structure based on the 
Hardy-Weinberg principle assume random mating within 
the population. If this assumption is not met, however—if 
mating in the population is nonrandom—the distribution 
of alleles into genotypes occurs in frequencies inconsis- 
tent with the chance predictions of the Hardy-Weinberg 
equilibrium. Inbreeding, mating between related individu- 
als, is a form of nonrandom mating that alters the distri- 
bution of alleles into genotypes. 


The Coefficient of Inbreeding 


Inbreeding, also known as consanguineous mating 
(consanguineous means “with blood”), is mating between 
related individuals who share a greater proportion of al- 
leles with one another than with random members of a 
population. The principal genetic consequences of in- 
breeding are an increase in the frequency of homozygous 
genotypes in a population and a decrease in the frequency 
of heterozygous genotypes relative to the frequencies 
expected from random matings. The likelihood of ho- 
mozygosity is increased because related organisms share 
alleles and are thus more likely to produce homozygotes, 
especially when the alleles involved are rare in the general 
population. 

Inbreeding is a normal reproductive process for
self-fertilizing plants and for some animals that reproduce by
self-fertilization. The effect of self-fertilization on genotype
proportions is shown in Table 22.6, where a heterozygous
organism self-fertilizes and produces genotypes
in generation 1 in a 1:2:1 ratio. Self-fertilization of
generation 1 individuals produces a generation 2 that has
an overall increase in the frequency of both homozygous
genotypes and a decrease of one-half in the frequency of
the heterozygous genotype. The decrease in heterozygous
frequency of one-half occurs each generation. By
generation 4, just over 6 percent of the progeny are heterozygous,
and more than 93 percent are homozygous. Note,
however, that the allele frequencies of $A_1$ and $A_2$ remain
at $f(A_1) = f(A_2) = 0.50$ in each generation.


Table 22.6 Consequences of Self-Fertilization to Genotype Frequencies

P: $A_1A_2$ (self-fertilization)

                      Genotype
Progeny Generation    $A_1A_1$    $A_1A_2$    $A_2A_2$
1                     0.250       0.500       0.250
2                     0.375       0.250       0.375
3                     0.437       0.125       0.437
4                     0.468       0.063       0.468
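
The progression in Table 22.6 follows from a simple rule: homozygotes breed true under self-fertilization, while each heterozygote produces progeny in a 1:2:1 ratio. The sketch below applies that rule repeatedly; the function name is illustrative, and the code reports exact values, which the table gives to three decimal places.

```python
def selfing_generations(n_generations, start=(0.0, 1.0, 0.0)):
    """Genotype frequencies (A1A1, A1A2, A2A2) under repeated self-fertilization.

    Homozygotes produce only their own genotype; heterozygotes produce
    1/4 A1A1, 1/2 A1A2, and 1/4 A2A2 progeny.
    """
    hom1, het, hom2 = start
    rows = []
    for generation in range(1, n_generations + 1):
        hom1, het, hom2 = hom1 + het / 4, het / 2, hom2 + het / 4
        rows.append((generation, hom1, het, hom2))
    return rows


for generation, hom1, het, hom2 in selfing_generations(4):
    p = hom1 + het / 2    # allele frequency of A1
    print(f"gen {generation}: {hom1:.5f}  {het:.5f}  {hom2:.5f}  f(A1) = {p:.2f}")
```

The heterozygote frequency halves each generation while the allele frequency stays at 0.50, which is the central point of the table: inbreeding redistributes alleles among genotypes without changing allele frequencies.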

Among sexually reproducing organisms, the effect 
of inbreeding is similar, but it takes place over a larger 
number of generations since the proportion of organisms 
in a population participating in consanguineous matings 
is generally low. The population geneticist Sewall Wright 
investigated the consequences of inbreeding in sexually 
reproducing populations and devised the coefficient of 
inbreeding (F) as an arithmetic measure of the prob- 
ability of homozygosity for an allele obtained in identical 
copies from an ancestor. The coefficient of inbreeding 
quantifies the probability that two alleles in a homozy- 
gous individual are identical by descent (IBD), having 
descended from the same copy of the allele carried by a 
common ancestor of the inbred individual. A common an- 
cestor is an ancestor shared by two inbreeding organisms, 
and potentially the source of identical alleles that could be 
carried by the inbreeding organisms. If inbreeding takes 
place, all genes in the genome are susceptible to the same 
inbreeding effects. Thus F can also be used to estimate the 
proportion of loci that will be homozygous IBD. 

The quantification of $F$ as a measure of the probability that a
particular allele is IBD is most readily accomplished through
pedigree analysis. The three key elements to determining $F$
from pedigrees are (1) the number of alleles of a gene
carried by common ancestors, (2) the number of transmission
events required to produce a genotype that is
homozygous IBD, and (3) the probability of transmission
for each event linking the allele in a common ancestor
to the inbred individual. Figures 22.10a and 22.10b show
a mating between half-siblings having the same mother
(I-2) as the common ancestor. The general solution for $F$
is $(1/2)^n$, where 1/2 is the probability of transmission of an
allele and $n$ is the number of transmission events required
to produce identity by descent. In this example, either
the allele $A_1$ or $A_2$ of the mother could be transmitted to
both II-1 and II-2 and then to their offspring III-1, so the
general solution for $F$ is $(1/2)^n + (1/2)^n$. The arrows in the
figure show the four transmission steps that are required
for either allele to end up in III-1 IBD. Each required
transmission event has a probability of 50 percent. Thus,
the probability that either allele is found in III-1 in a
homozygous IBD genotype is $(1/2)^4 = 1/16$. For this case,
the inbreeding coefficient is the probability that any allele
of a locus is homozygous IBD; thus, for each gene,
$F = (1/2)^4 + (1/2)^4 = 1/8$. Notice that the arrows in the
figure indicating transmission of alleles from I-2 to III-1
trace the two sides of a loop. This visual representation
indicates the movement of the allele from generation to
generation. If this loop were incomplete, identity by descent
could not occur.

Figure 22.10 Calculation of the inbreeding coefficient ($F$).
(a) The probability of $A_1$ IBD equals the likelihood of four
transmission events, each with a probability of 1/2. (b) The
probability of $A_2$ IBD also requires four transmission events,
each with a probability of 1/2. The likelihood of either allele IBD
is $F = 2(1/2)^4$. (c) With two common ancestors, there are four
alleles ($A_1$, $A_2$, $A_3$, and $A_4$) that can be IBD. For this
first-cousin mating, the probability for each allele IBD is the same:
$F = (1/2)^6$. For all shared alleles combined, $F = 4(1/2)^6 = 1/16$.

Figure 22.10c shows a first-cousin mating in a pedigree 
in which alleles from either I-1 or I-2 could make their way 
to IV-1. Here there are four alleles, any of which could be 
IBD in IV-1. Each allele must complete six transmission 
steps (indicated by arrows in the figure) to be identical 
by descent in IV-1, and the transmission probability for 
each step is 1/2. For each allele carried by I-1 and each
allele carried by I-2, the probability the allele is IBD in IV-1
is $(1/2)^6 = 1/64$. For this pedigree, there are four alleles
for each gene, two per common ancestor, and $F$ is determined
by adding the probability of the four complete loops
(one for each allele) that could link an allele in a common
ancestor to an inbred homozygous IBD descendant. In
this case, $F = (1/2)^6 + (1/2)^6 + (1/2)^6 + (1/2)^6 = 1/16$.
The value can also be determined as $F = 4(1/2)^6 = 1/16$.
Genetic Analysis 22.3 demonstrates another computation 
of an inbreeding coefficient. 
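
The pedigree calculations described in this section share one pattern: $F$ is the number of common-ancestor alleles multiplied by $(1/2)^n$. A minimal Python sketch of that pattern follows. The function name is illustrative, and the code assumes that every allele requires the same number of transmission steps, as in the pedigrees discussed here.

```python
from fractions import Fraction


def inbreeding_coefficient(n_ancestor_alleles, n_transmissions):
    """F = (number of common-ancestor alleles) * (1/2) ** (transmission steps).

    Assumes each allele of each common ancestor needs the same number of
    transmission events to become identical by descent in the inbred individual.
    """
    return n_ancestor_alleles * Fraction(1, 2) ** n_transmissions


half_sibs = inbreeding_coefficient(2, 4)       # one common ancestor, four steps
first_cousins = inbreeding_coefficient(4, 6)   # two common ancestors, six steps
print(f"half-sibling mating: F = {half_sibs}")
print(f"first-cousin mating: F = {first_cousins}")
```

Using `Fraction` keeps the results as exact fractions (1/8 and 1/16) rather than rounded decimals. The same call with four alleles and seven transmission steps gives 1/32.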

First-cousin mating is a form of inbreeding that is 
relatively common in many human societies and is com- 
mon in mammals in general. Negative genetic outcomes 
in the form of infants with recessive conditions due to 
inbreeding occur when a recessive allele is very rare in a 
population (i.e., q = 0.005 or less). In such cases there can 
be a 20- to 30-fold increase in the likelihood that a first- 
cousin mating will produce a child with a recessive phe- 
notype compared to the risk by random mating. However, 
when the recessive allele frequency is as common as 
q = 0.01, for example, the chance of producing a recessive 
homozygote from a first-cousin mating is only a few times 
more likely than the chance of producing a recessive ho- 
mozygote by random mating. The effect disappears as the 
frequency of q in the population increases further. 


Inbreeding Depression 


The genetic consequences of inbreeding for populations 
are an increase in the frequency of homozygous geno- 
types and a decrease in the frequency of heterozygous 
genotypes. One immediate impact of these consequences 
is seen in conservation genetics, where small, captive 
populations of individual organisms are bred to per- 
petuate a nearly extinct species. The increased frequency 
of homozygosity can lead to a phenomenon known as 
inbreeding depression, the reduction in fitness of inbred 
organisms, often as a result of the reduced level of genetic
heterozygosity. The reduced fitness associated with
inbreeding depression can be due either to an increase in
the proportion of deleterious homozygous genotypes or
to the higher fitness of heterozygotes.


GENETIC ANALYSIS


PROBLEM The pedigree shown here depicts crosses performed as part
of an antelope captive-breeding program. Use the pedigree information to
calculate the coefficient of inbreeding ($F$) for the mating of IV-1 and III-3 that
produces the animal identified as V-1.

BREAK IT DOWN: Each allele transmission probability is 1/2. Individual
V-1 has two common ancestors, either of whom could be the source of an
allele that is IBD (p. 759).


Solution Strategies and Solution Steps


Evaluate
1. Strategy: Identify the topic of this problem and the nature of the required answer.
   Step: This problem concerns determination of the coefficient of inbreeding ($F$) for a
   specific mating.
2. Strategy: Identify the critical information given in the problem.
   Step: The pedigree depicting the common ancestry of the related animals is given.

Deduce
3. Strategy: Count the number of transmission events that must occur for an allele to
   be identical by descent (IBD) in V-1.
   Step: Counting from a common ancestor to individual V-1, there are seven transmission
   steps required to produce an allele that is IBD.
4. Strategy: Identify the transmission probability for each step of transmission.
   Step: For an autosomal allele, the transmission probability is 1/2.
5. Strategy: Identify the total number of alleles of an autosomal gene in the common
   ancestors of V-1.
   Step: There are two common ancestors (I-1 and I-2) for the inbred individual (V-1).
   There are two alleles per gene in each common ancestor, for a total of four alleles
   at each locus.

Solve
6. Strategy: Calculate the coefficient of inbreeding for this pedigree.
   Step: The coefficient of inbreeding is $F = 4(1/2)^7 = 1/32$.


For more practice, see Problems 33, 34, 35, and 36.
Visit the Study Area to access study tools.

The magnitude of inbreeding depression depends on 
the organism. Among plants that naturally reproduce by 
self-fertilization, the amount of inbreeding depression 
is small. Many bird species also experience only rela- 
tively minor inbreeding depression. This lack of negative 
consequence has been particularly beneficial in captive 
breeding programs that have bred bird species such as the 
California condor and then reintroduced the birds into 
their natural environment. In contrast to birds and plants, 
however, mammals experience severe inbreeding depression.
Consequently, captive breeding programs for mammals
must be managed far more carefully, from a genetic
perspective. Multiple studies indicate that breeding programs
designed to avoid inbreeding reduce the mortality
of inbred mammals caused by inbreeding depression.


22.7 Species and Higher Taxonomic 
Groups Evolve by the Interplay of Four 
Evolutionary Processes 


Our discussion to this point has focused on microevolu- 
tion, that is, evolution operating at the population level. 
In this section, we turn our attention to evolution at the 
species level and above. 

This evolutionary change is driven by reproduc- 
tive isolation that can result from any morphological, 
behavioral, or geographic condition or set of conditions
that prevents one population from breeding with others. 
Reproductively isolated populations adapt separately to 
their particular circumstances, and divergence is a likely 
consequence. In each environment, differential reproduc- 
tive success driven by natural selection allows the better- 
adapted organisms to leave more progeny. 


Processes of Speciation 


Charles Darwin was the first to describe the concept that 
existing species evolve from preexisting species. In his 
famous 1859 book, On the Origin of Species by Means of 
Natural Selection, he laid out two guiding principles of 
species formation that are still considered fundamental 
aspects of macroevolution. First, Darwin proposed that 
hereditary variation is present in all species and controls 
the phenotypic variability in each species. Second, Darwin 
proposed that natural selection allows species members 
with favored phenotypic attributes to survive and repro- 
duce in greater numbers than species members with other 
phenotypes. Darwin described his model combining these 
principles as “the theory of descent with modification 
through variation and natural selection.” In other words, 
Darwin viewed inherited variation and the operation of 
natural selection as the elements essential to the transfor- 
mation of one species into another. 

Innumerable biological investigations in the last 150 
years have verified and elaborated upon Darwin’s origi- 
nal proposals as well as quantified the effects and the 
interplay of each of the four evolutionary processes (natu- 
ral selection, mutation, migration, and genetic drift) on 
speciation. The clear picture of speciation that emerges 
from these studies is that the evolutionary lineages lead- 
ing from ancestral organisms to descendant forms are 
almost never simple, straight lines of descent. Instead, 
the evolutionary history of modern species is filled with 
side branches that died out because a species, once de- 
veloped, could not adapt to new environments or was 
displaced by competing species. It can be tempting to look 
backward into the evolutionary past and identify a linear 
step-by-step process leading to modern species, but this 
perspective minimizes the occurrence of adaptive changes 
that led to evolutionary “dead ends.” More important, 
the backward-looking approach ignores a major reality of 
evolution: Evolutionary history is far more like a multibranched
bush than like a tree with a long, straight
trunk connecting past and present. 

The evolutionary history of modern horses and their 
relatives zebras and donkeys (all three being members of 
the genus Equus) is an example of the typical complexity of 
evolutionary history (Figure 22.11). One can trace a lineage 
leading more or less directly from Hyracotherium in the 
early Eocene (about 54 million years ago) to modern Equus, 
but this would ignore the many other branches of the evo- 
lutionary tree that did not produce modern-day organisms. 


The evolutionary tree of Equus also illustrates two 
patterns by which new species evolve from preexisting 
species. Anagenesis is the divergence of a lineage from 
a common ancestor that produces new forms or new 
species over time. During anagenesis, a preexisting spe- 
cies experiences natural selection that leads to adaptive 
change of that species into a new species. In contrast, the 
pattern of species evolution known as cladogenesis is 
one of branching in which an ancestral species gives rise 
to two or more new forms or new species. 


Reproductive Isolation and Speciation 


Although anagenesis and cladogenesis are distinct patterns 
of species formation, they share two essential features: 
(1) inherited variation controlling critical phenotypic 
variation and (2) adaptation through natural selection. 
Reproductive isolation is also an important component 
for both cladogenesis and anagenesis, although the precise 
mechanisms of isolation may differ. 

The concept of cladogenesis and reproductive 
isolation of species derives from work by Theodosius 
Dobzhansky, Ernst Mayr, and other evolutionary biolo- 
gists who recognized that new species can form when 
reproductive barriers prevent the exchange of genes be- 
tween populations. In describing the necessity of repro- 
ductive isolation in this context, two mechanisms are 
identified (Table 22.7). Prezygotic mechanisms of repro- 
ductive isolation are those that prevent mating between 
members of different species or prevent the formation 
of a zygote following interspecies mating. On the other 
hand, postzygotic mechanisms of reproductive isolation 
result in the failure of a fertilized zygote to survive, or 
result in sterile offspring of an interspecies mating. These 
mechanisms of reproductive and genetic isolation lead to 
patterns of speciation that are most frequently allopatric 
or sympatric, two patterns defined below. 


Allopatric Speciation In allopatric speciation, populations 
are separated by a physical barrier. New species can 
develop in separate geographic locations as a consequence 
of their reproductive isolation. Two principal mechanisms 
create the separations that lead to reproductive isolation: 
(1) physical separation of a segment of a large population 
by a physical barrier that prevents gene flow and (2) 
colonization of new territory (Figure 22.12). Geographic 
events such as the advance of a glacier, the emergence of a 
mountain range, a change in the flow pattern of a river, or erosion
of a canyon are typical of the kinds of physical changes that 
lead to reproductive isolation and species diversification. 
An example of this kind of geographic separation and 
species development is found in the American Southwest, 
where the formation of the Grand Canyon beginning 
5 to 6 million years ago split an ancestral species of 
ground squirrel and led to its eventual diversification into 
two distinct species. Today, Ammospermophilus leucurus
is a gray-colored ground squirrel found on the north rim
of the Grand Canyon, whereas squirrels on the south
rim of the canyon are members of the chestnut-colored
Ammospermophilus harrisii.


Figure 22.11 Evolution of the genus Equus. This multibranched evolutionary tree includes
examples of anagenesis and cladogenesis as well as numerous examples of evolutionary branches that
did not lead to modern species.


Figure 22.12 Processes leading to allopatric speciation.


Table 22.7 Mechanisms of Reproductive Isolation


Prezygotic Mechanisms


Behavioral isolation: Patterns of sexual behavior in different species are incompatible, or sexual attraction is lacking between them.

Gametic isolation: Mating takes place between different species, but the gametes fail to unite with one another due to
differences in gamete compatibility or to failure of male gametes to survive until fertilization of female gametes.

Geographic isolation: Species reside in separate geographic locations or are separated by geographic features that prevent
their contact.

Habitat isolation: Species inhabit different ecosystems that prevent them from coming into contact.

Mechanical isolation: Male and female genitalia or reproductive structures of different species are anatomically incompatible.

Temporal isolation: Timing of reproductive ability or receptivity in different species is incompatible.


Postzygotic Mechanisms


Hybrid breakdown: Viable and fertile interspecies hybrids form, but after the F1 generation the fitness of the progeny of
hybrids is less than that of progeny from non-hybrids.

Hybrid inviability: The fertilized zygote of an interspecies mating fails to survive gestation.

Hybrid sterility: Interspecies hybrids are viable but infertile.

The colonization model of allopatric speciation pre- 
dicts that new species diversify following colonization of 
new habitats. The diversification of Drosophila species 
on the Hawaiian Islands is a case study of this mecha- 
nism (Figure 22.13). The Hawaiian Islands are part of a 
long chain of landmasses and submarine structures that 
stretch in a northwest-to-southeast direction and are pro- 
duced by the movement of the Pacific tectonic plate over 
a volcanic hotspot that lies in the earth’s mantle beneath 
it. As the plate slides toward the northwest, new islands are
produced by volcanic activity of the hotspot. The oldest
of the islands are Niihau and Kauai to the northwest; the
youngest island is Hawaii, which is still growing by volca- 
nic eruptions of Mauna Loa and Kilauea. 

In 2005, James Bonacum and his colleagues examined 
genetic and morphologic data in numerous Hawaiian 
Drosophila species to test the allopatric speciation model. 
They found that the most closely related species occur on 
adjacent islands and that the phylogenetic pattern of spe- 
cies formation corresponds to the pattern of emergence of 
islands. These results provide support and documentation 
for the model of allopatric speciation by colonization. 


Sympatric Speciation In sympatric speciation, populations
share a single habitat but are isolated by genetic, behavioral,
seasonal, or ecosystem-based mechanisms that
prevent gene flow. Species that diverge while occupying the
same geographic area are sympatric species.




One clear example of sympatric speciation occurs in 
plant species that diversify from one another through the 
development of polyploidy. Mating between a polyploid 
species and one that is not polyploid can result in reduced 
fertility of hybrid individuals. Section 13.2 discusses the
development of polyploidy through nondisjunction and
highlights the evolution of the modern bread wheat species
(Triticum aestivum) from a wild diploid grass to
its contemporary allohexaploid form (see Figure 13.9).
Animals that develop nocturnal or diurnal patterns of
activity that make them more likely to encounter only
those other members of the population that are active at
the same time are another example of potential sympatric
speciation. Similarly, changes in the seasonality of reproduction
can restrict organisms to reproducing
only during certain times of the year. Organisms living
in the same geographic area that do not have the same
reproductive seasonality will be unable to mate.


Figure 22.13 Phylogenetic relationships among Hawaiian
Drosophila species. Evolutionary evidence supports the colonization
of younger islands and the formation of new species
following migration from older islands.


Contemporary Evolution in Darwin’s Finches 


As the Hawaiian Islands study of Drosophila clearly il- 
lustrates, the examination of living organisms can reveal 
enlightening details of evolutionary relationships. Another 
example comes from long-term studies of Darwin’s 
finches on Daphne Major in the Galapagos Island chain. 
The studies are by Peter and Rosemary Grant, a husband-and-wife
team who have studied finches on Daphne
Major since the 1970s and have provided powerful
support for Darwin’s concepts of evolution and speciation. 

In 2004, the Grants were part of one of two research 
teams that identified bone morphogenic protein (Bmp) 
as the protein likely to be responsible for beak shape 
variation in Darwin’s finches and other birds. One study 
found that beak shape in chickens could be altered by 
changing the timing and level of expression of Bmp4, the
gene that produces Bmp, in chicken embryos. Increased 
Bmp production strongly correlated with larger beak 
size. The research group including the Grants studied 
Bmp in Darwin’s finches and found that finches with 
larger beaks produced Bmp earlier in development and 
produced more Bmp than did finches with smaller 
beaks. 

In 2006, the Grants reported on a shift in the shape 
of the beak of Daphne Major finches that was brought 
about by a drought and decrease in the availability of an 
important seed resource. They observed that finches with 
medium beak size survived and that those with a large 
beak did not. This is precisely what Darwin predicted to 
be the mechanism for evolution of differing character- 
istics among the finches he observed in the Galapagos 
Islands. The Grants’ observation traces back to the early 
1980s when a species of large ground finches with a large 
beak arrived on Daphne Major from a neighboring island. 
The newly arrived large-beaked finches were able to eas- 
ily break open the large seeds of a particular herbaceous 
plant that the medium-beaked finches native to Daphne 
Major rarely utilized because they could break them open only
with great effort. These large seeds became the exclusive 
food source for the large-beaked finches. The numbers of 
large-beaked finches grew, but for reasons of differences 
in mate preference between the two types of finches, they 
did not interbreed. When Daphne Major was hit with
a drought in 2003 and 2004, the herbaceous plant that
produced the large seeds was particularly hard hit. The 
medium-beaked finches were able to exploit the smaller 
seeds available from other plants, but the large-beaked 
finches died out almost entirely due to their inability to 
switch to eating smaller seeds of other plants. 

In 2004 and 2005, the Grants documented that the 
distribution of beak sizes on Daphne Major had changed 
dramatically. The Grants attributed these changes to the 
preferential survival of native medium ground finches 
that ate small seeds over finches that preferred large 
seeds. 

In 2009, the Grants described evidence of a poten- 
tial new speciation event under way on Daphne Major. 
Following the drought in 2003-2004, only a single mat- 
ing pair of the large ground finches survived on Daphne 
Major. This pair mated and produced offspring, and in 
the ensuing years, the number of large-beaked finches
increased. Mating still does not occur between the two
types of finches; thus, the two forms are reproductively
isolated, suggesting the possibility that the two closely re- 
lated finch populations on Daphne Major could be in the 
early stages of a new speciation event. 


22.8 Molecular Evolution Changes 
Genes and Genomes through Time 


The heritable variation that provides the raw material 
of evolution begins at the molecular level, with altera- 
tions in DNA sequence and proteins. These molecular 
changes are part and parcel of the evolutionary process 
and they can be examined at several levels, from the 
evolution of individual genes and gene families to the 
evolution of entire genomes. In this final section, we 
describe two avenues of molecular evolutionary analysis. 
The first is the study of the evolution of gene families. 
We look specifically at genes that are members of the 
vertebrate steroid receptor (SR) family and discuss the 
evolutionary process that has generated multiple new 
molecular functions from an ancestral gene of limited 
function. Our second examination concerns the evolu- 
tion of the human genome and the consequences of the 
introgression of Neandertal DNA by human-Neandertal
interbreeding. 


Vertebrate Steroid Receptor Evolution 


Evolutionary theory predicts that novel molecular functions
arise as a consequence of the action of natural selection
on favorable mutations. For complex systems with multiple 
active elements, however, the challenge for evolutionary 
biology is to identify how new protein functions arise when 
all the components of the complex are not initially present. 
The evolution of vertebrate steroid receptors (SRs) illus- 
trates one way in which this has occurred. Dissection of 
this process shows how an ancestral receptor with a single 
original function duplicated and diversified to produce
new genes and proteins with the ability to bind new com- 
pounds. The basic scenario of duplication of the ancestral 
gene, followed by diversification of function, is one com- 
monly encountered in evolutionary biology. 

Contemporary vertebrates possess several closely re- 
lated genes that are responsible for producing SR proteins. 
Functionally, SR proteins are a family of intracellular receptor
proteins that have a ligand (hormone)-binding domain that
binds steroid hormones. Hormone binding changes
the SR protein conformation so as to initiate the transcrip- 
tion of particular genes. In this way, the hormones act 
as signaling molecules that work through SRs to initiate 
transcription. The contemporary vertebrate SR protein 
family includes two estrogen receptors (ERα and ERβ) and
one receptor each for androgens (AR), progesterones (PR), 
mineralocorticoids (MR), and glucocorticoids (GR). 

The SR proteins have highly conserved DNA-binding 
domains (DBDs) that recognize specific DNA sequences 
called response elements in the promoter regions of spe- 
cific target genes. Hormone binding to the ligand-binding 
domains (LBDs) triggers conformational change of the 
protein into its transcription-activating forms that are ca- 
pable of DNA binding to response elements. Activated ER 
proteins recognize the response element sequence AGGTCA; 
the other activated SR proteins (AR, PR, MR, and GR) rec- 
ognize the response element sequence AGAACA. Differences 
in response element recognition enable the vertebrate SR 
proteins to mediate hormone-induced transcription of a 
range of different genes. SR proteins that are closely related 
to the vertebrate SRs have been found in mollusks, anne- 
lids, and the invertebrate cephalochordates. How did this 
closely related yet diversified family of proteins evolve? 
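
The two response element sequences named above can be used in a small illustration of how recognition differences separate target genes. The following Python sketch is a simplification under stated assumptions: it scans only the given strand for exact matches, the promoter fragment is invented for the example, and the function and variable names are hypothetical.

```python
# Response element sequences and the receptors that recognize them, as given in the text.
RESPONSE_ELEMENTS = {
    "AGGTCA": ("ER",),
    "AGAACA": ("AR", "PR", "MR", "GR"),
}

def find_response_elements(promoter):
    """Return sorted (position, motif, receptors) tuples for exact motif matches."""
    promoter = promoter.upper()
    hits = []
    for motif, receptors in RESPONSE_ELEMENTS.items():
        start = promoter.find(motif)
        while start != -1:
            hits.append((start, motif, receptors))
            start = promoter.find(motif, start + 1)
    return sorted(hits)

# Made-up promoter fragment containing one copy of each motif.
example = "ttgAGGTCAcctgAGAACAtt"
for position, motif, receptors in find_response_elements(example):
    print(position, motif, "/".join(receptors))
```

In this toy fragment the first motif would be bound by an activated ER and the second by any of the other four activated SR proteins, which is the distinction the paragraph above describes.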


Novel Functions from the Ancestral Steroid 
Receptor The first step in tracing the evolutionary 
pathway of SR protein diversification is to identify the 
clades to which contemporary SRs belong. Based on their 
sequences, two major SR clades have been identified. 
One contains the ERs, and the other contains the other 
SR proteins (ARs, PRs, MRs, and GRs) (Figure 22.14). 
The SR proteins of all organisms have the capacity to 
bind estrogen as a ligand. This is one of several clues 
indicating that the ancestral SR protein, called AncSR1, 
was an estrogen-binding protein. The strong sequence 
similarities among the other SR genes and proteins 
indicate that the diversification of SRs began when 
AncSR1 underwent a gene-duplication event. This gave 
rise to the two major SR protein clades. In the ER clade, 
the proteins diversified to produce estrogen-binding 
capability in mollusks, annelids, and cephalochordates. 
Later gene-duplication events in this clade gave rise to the 
vertebrate ERα and ERβ proteins. In the clade of the other
SRs, the original duplication of AncSR1 was followed by 
additional gene duplication and diversification to produce 
the four new vertebrate SR proteins (see Section 18.2 for 
additional discussion of genome duplication). 

Figure 22.14 Evolution of the vertebrate SR gene family.
Multiple duplications of the ancestral estrogen receptor gene
AncSR1 were followed by diversification to produce two vertebrate
estrogen receptors (ERs) and other ERs. Other vertebrate steroid
receptors (MR, GR, PR, and AR) evolved to use intermediates in
the estrogen biosynthetic pathway as ligands.


Ancestral Gene and Protein Reconstruction Phylogenetic 
reconstruction based on comparisons of gene and protein 
sequence similarities is one way to identify the probable 
evolutionary history and function of the ancestral SR 
protein. Over more than a decade of such analysis, Joseph 
Thornton and his colleagues have developed several lines 
of evidence to demonstrate that AncSR1 was an ER and 
have deciphered the process that led to the evolution 
of new SR functions in vertebrates. Part of Thornton’s 
identification of AncSR1 function is based on statistical 
analysis of the phylogenetic information to identify the most 
likely nucleotide at each location in the ancestral gene. The 
researchers used these data to recreate multiple versions of 
the inferred ancestral protein by placing a synthesized DNA 
copy of each putative ancestral gene into an expression 
system capable of transcription and translation. Each 
recreated version of AncSR1 produced slightly different 
results, but all versions functioned as estrogen receptors. 


The Evolution of Novelty Results by Thornton and 
others show that the inferred AncSR1 sequence is highly 
similar to vertebrate ERs in both its LBD and DBD 
domains. This provides additional evidence that the 
function of AncSR1 was estrogen binding and indicates 
that the ancestral protein most likely recognized an 
AGGTCA-containing response element as do contemporary 
ERs. The subsequent diversification of the SR protein 
occurred with a switch of the DBD to recognize AGAACA 
response-element sequences and with changes in the 
LBDs to facilitate binding of the new ligands. 

Changing the LBDs to recognize new hormone li- 
gands may seem to be a more complex evolutionary 
problem than switching DBD recognition. In reality, 
however, this may be a simple and common occur- 
rence. Research indicates that just two nucleotide base 
pair changes are required to switch DBD recognition 
from one response element sequence to the other. 
Furthermore, estrogen is the end product of a multistep 
biochemical pathway that begins with cholesterol and 
has progesterone, corticosteroids, and testosterone as
intermediates (Figure 22.15). The development of new ligand
targets for the diversified SR proteins is an example
of an evolutionary mechanism in which the ancestral
molecule recognized the end product of a multistep
biosynthetic pathway, estrogen in this case, and duplication
and subsequent diversification eventually produced
proteins that recognize and interact with intermediate
compounds in the same pathway. Computer simulation
studies have found that changing just two to four amino
acids in the LBD of a modern human ER is sufficient to
change its LBD to one binding progesterone or testosterone.
Derived copies of AncSR1 may have undergone
similar changes in ligand recognition by equally simple
alterations.


Figure 22.15 The estrogen biosynthesis pathway.


Human Genetic Diversity and Evolution 


The Human Genome Sequencing Project (HGSP) com- 
pleted in the early 2000s was a significant milestone in 
biology that has paved the way for an unprecedented wave 
of new information about the composition, evolution, 
and genetic diversity of the human genome. As expected 
based on our known evolutionary relationships, many hu- 
man genes were shown to be directly related to those of 
our primate and mammalian relatives. More surprising 
was the identification of many human genes that are homologous
to genes of distantly related organisms, including bacteria.
An additional surprise was the finding that a sizeable
percentage of the human genome consists of transposable 
genetic elements, many active, many others inactivated. 

The HGSP was just the beginning, and by the start 
of 2014, entire genome sequences of thousands of indi- 
viduals were available, representing the full array of human 
genetic diversity. In addition, many much older human 
genome samples from Neandertals and the lesser known 
Denisovans have recently been sequenced using DNA from 
bones recovered in ancient human habitation sites. These 
ancient samples date from 12,000 years ago to almost 
45,000 years ago and, collectively, these modern and an- 
cient human genomes provide an unprecedented view of 
human genetic diversity and lead to powerful insights into 
past events that contributed to the evolution of the modern 
human genome. 


SNPs and Indels A sampling of single nucleotide 
polymorphism (SNP) variation between the genomes of 
two randomly chosen individuals today reveals differences 
at about 1 in 1000 DNA base pairs, an approximately 
3 million base-pair difference in the 3 × 10^9 bases in
the genome. Variation accumulates over time, and the
greatest variation is expected in the oldest population. 
The genomes of Africans contain the most variation, 
consistent with Africa being the place where our species 
originated. Mutational studies comparing SNP variation 
in parental genomes with that in their offspring find 
variation accumulates at a rate of about 30 new SNPs in 
each individual’s germ cells in each generation. 
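
The figures in this paragraph are related by simple arithmetic, sketched below in Python. The constants are the approximate values quoted in the text; the function and variable names are hypothetical and are used only for illustration.

```python
GENOME_SIZE_BP = 3_000_000_000   # approximate genome size used in the text (3 x 10^9)
BASES_PER_SNP = 1000             # about one SNP difference per 1000 base pairs

def expected_snp_differences(genome_size_bp, bases_per_snp):
    """Expected number of SNP differences between two randomly chosen genomes."""
    return genome_size_bp // bases_per_snp

differences = expected_snp_differences(GENOME_SIZE_BP, BASES_PER_SNP)
print(f"{differences:,}")  # 3,000,000
```

Integer division is used so that the result is an exact count rather than a rounded floating-point value.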

In addition to SNP variation, human genome analysis 
has revealed a high frequency of insertions or deletions, 
called indels, along with small inversions. These indels and 
inversions are similar to those described in Sections 13.3 and 
13.4, but they usually occur in noncoding regions of chro- 
mosomes and they do not cause phenotypic abnormalities. 

Comparisons of the genomes of four donors of 
African ancestry, two Southeast Asian donors, and two 
Caucasian donors revealed 1565 indels and other small 
chromosome variants. These findings suggest that human 
genomes can differ by hundreds to thousands of small 
chromosome structural variants. As with SNP variation, 
African donors possessed much greater genetic diversity 
than did non-Africans. 


Human Genetic History The distribution of polymorphic 
alleles among populations provides insight into the 
evolutionary history of humans as a species (Figure 22.16). 
A number of these are associated with dietary or 
environmental adaptation. We previously discussed SNPs 
that confer a dietary advantage: the SNPs affecting the 
LCT gene that lead to lactase persistence in European 
and African pastoral populations. The selective advantage 
conferred by these SNPs is to allow milk and dairy product 
consumption in adults. 

Another example of polymorphism involves the genes that
determine skin color. Alleles conferring darker skin pig- 
mentation arose in environments with high levels of UV 
irradiation. In such environments, dark skin pigmenta- 
tion helps shield skin from UV damage. In lower UV 
environments, however, dark pigmentation can reduce 
UV penetration and interfere with the synthesis of the 
essential compound vitamin D3. Thus, natural selection 
may have favored alleles that lightened skin pigmentation 
in ancient human populations that migrated to environ- 
ments with low UV irradiation to promote easier vitamin 
D3 production, particularly in Europe and Asia. One par- 
ticular allele of interest in this regard is a mutation of the 
melanocortin-1 receptor (MC1R) gene that is particularly 
common in northern Europe. Specific mutant alleles of 
MC1R are associated with red hair and light skin pigmentation.
Contemporary population genetic estimates
indicate that approximately 40% of people whose ancestry
is traced to the United Kingdom carry at least one mutant
MC1R allele. MC1R mutations are found in most human
populations, but the frequency is usually quite low. One 
hypothesis is that MC1R mutations are at a selective disadvantage
in environments with high UV irradiation; but,
in environments where UV irradiation is low, the selective
disadvantage is no longer present, and the mutant allele
can increase in frequency.


Figure 22.16 Cladogram showing genetic distances and relationships between human
populations. This phylogenetic tree is based on 1327 polymorphic markers, 848 repetitive sequence
[Tree image not reproduced; © AAAS/Sarah A. Tishkoff. Populations are grouped by region: India,
Central Asia, Europe, East Asia, Middle East, Americas, Oceania, Saharan Africa, Eastern Africa,
Western/Central Africa, and Southern Africa/Pygmies. Figure annotations: (1) Genetic diversity among
European, Asian, and Australian populations is a subset of that found in Africa, consistent with
these populations being derived from an ancestral African population. (2) Genetic distances are
positively correlated with geography and language, both globally and within Africa. (3) African
populations exhibit the most genetic variation, consistent with Africa being the source of all
modern humans.]


What It Means to Be Human While analysis and 
comparisons among modern human genomes provide 
insight into what makes us individuals, biologists and 
geneticists look to our closest primate relatives and to 
our recently extinct human relatives to understand what 
makes us human. 

Humans and chimpanzees last shared a common 
ancestor about 6 million years ago. Both lineages have 
diverged since that time. Many phenotypic and behav- 
ioral differences between humans and chimpanzees are 
obvious, but what about genetic differences? Genetic and 
genomic analysis indicates that about 5% of each genome 
is lineage-specific, that is, found in one lineage exclusively 
but not in the other. Stated another way, the genomes of 
humans and chimpanzees are about 95% identical. There 
are about 30 million SNPs differentiating the human and 
chimpanzee genomes, or roughly 10 times the number 
that differentiate one human from another (the vast ma- 
jority in noncoding regions). In addition, there are about 
5 million indels accounting for an additional portion of 
the difference. Comparing orthologous proteins, 29% of 
human and chimpanzee proteins have identical amino 
acid sequences, and the average protein differs by about 
two amino acids between the two lineages. Beyond these 
differences are gains and losses of genes in each lineage. 
Complete genome tabulations show that the number of 
genes differs by several hundred. And, notably, humans 
and chimpanzees also differ in chromosome number— 
humans have 46 chromosomes, whereas chimpanzees 
have 48 chromosomes. This difference is due to the fusion 
(Robertsonian translocation) of two autosomes carried 
by the common ancestor and by chimpanzees that forms 
human chromosome 2 (see the Case Study in Chapter 13). 

How can biologists determine which changes were 
functionally important in the evolution of humans as they 
diversified from their shared common ancestor with the 
chimpanzee? The first step is to identify those changes 
that occurred exclusively in the human lineage. This is 
done using the genome sequence of a third, more dis- 
tantly related, species, such as Gorilla, for comparisons 
that allow researchers to separate human and chimpan- 
zee alleles into those that are ancestral and those that 
are derived (Figure 22.17). Human alleles are considered 
ancestral if they are shared by humans and gorillas but 
differ in chimps. Conversely, the human allele is consid- 
ered derived if it differs from the allele shared by chimps 
and gorillas. Once uniquely human alleles are identified, 
they can be investigated to determine what, if any, dif- 
ference in phenotype is attributable to the allele. The 
functional and evolutionary significance of identified phe- 
notypic variation can then be investigated. 
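
The outgroup comparison described above can be sketched in a few lines of Python. This is a minimal illustration rather than a method taken from the text: the function name `classify_human_allele`, the labels it returns, and the example nucleotides are all hypothetical, and real analyses work from aligned genome sequences and must also handle missing data and recurrent mutation.

```python
def classify_human_allele(human: str, chimp: str, gorilla: str) -> str:
    """Classify the human allele at one aligned site, using gorilla as the outgroup.

    Hypothetical helper for illustration only.
    """
    if human == chimp:
        return "shared"       # humans and chimps do not differ at this site
    if human == gorilla:
        return "ancestral"    # human matches the outgroup; the chimp allele changed
    if chimp == gorilla:
        return "derived"      # chimp matches the outgroup; the human allele changed
    return "unresolved"       # all three differ; the outgroup cannot polarize the site


# Made-up example sites, given as (human, chimp, gorilla) nucleotides.
sites = [("A", "A", "A"), ("G", "T", "G"), ("C", "T", "T"), ("A", "G", "C")]
labels = [classify_human_allele(*site) for site in sites]
print(labels)
```

Only the sites labeled "derived" are candidates for uniquely human changes; they would then be examined for any effect on phenotype.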

Figure 22.17 Identifying ancestral versus derived alleles.

A companion approach to exploring important human alleles
has developed within the past decade with the availability
of high-quality genome sequences from Neandertals and a
previously unknown human lineage, the Denisovans.
Neandertals and Denisovans are descendants of a human
lineage that diverged about 400,000 years ago from what
became the modern human lineage. These ancient humans
migrated out of Africa and into the Middle East and
Eurasia, where their descendants lived until about 30,000
years ago. Modern humans stayed in Africa until about
75,000 to 85,000 years ago, when they migrated to Eurasia,
where they coexisted and interbred with Neandertals (see
the Case Study in Chapter 1).

Since modern humans and Neandertals cohabited 
in Eurasia for millennia, questions have persisted about 
whether interbreeding occurred between the lineages. The 
first comparisons of genome sequences revealed that about 
1% to 3% of the modern human genome is of Neandertal 
origin. Initially, the introgression of Neandertal DNA into 
the modern human genome was thought to be limited 
to non-Africans, but recently Neandertal DNA has been 
identified in the genomes of the Masai who currently 
reside in Kenya and Tanzania. Additional analysis of high- 
quality Neandertal genomic sequence has determined that 
while the average modern human carries a few percent of 
Neandertal DNA in the genome, it is not the same DNA 
in each person. The current estimate is that 10% to 20% of 
the Neandertal genome is present today if all modern hu- 
man genomes are considered collectively. 

This analysis has also revealed that Neandertal DNA is
unevenly distributed in the human genome. For example,
there is virtually no Neandertal DNA on the X chromosome.
Each autosome, on the other hand, carries Neandertal DNA,
with the precise distribution differing among human
populations (Figure 22.18). Benjamin Vernot and Joshua
Akey reported in early 2014 on the distribution of
Neandertal DNA in the human genome. They speculated that
the absence of Neandertal DNA from the human X chromosome
indicates that the descendants of human-Neandertal hybrids
bearing Neandertal X chromosome DNA became less fertile
over time and that the Neandertal DNA was eventually lost.
This may have been particularly the case for males, whose
X chromosome genes are present in a single copy. Without a
second copy of a gene to compensate, the less-fit X-linked
Neandertal alleles were selected against, so that
eventually most of them were lost from the human
X chromosome.

Figure 22.18 The distribution of Neandertal DNA in the modern human genome. The distribution
of Neandertal DNA in European and East Asian genomes. Neandertal DNA has been detected in
all 22 human autosomes. Chromosome 7, shown here, is typical. Neandertal DNA in European genomes
is indicated in blue, and Neandertal DNA in East Asian genomes is indicated in red. For the region
shown in gray, the data are insufficient to identify its origin.

Notwithstanding the fate of Neandertal genes on 
the X chromosome, two independent research groups 
reported in 2013 and 2014 on the identification of a 
Neandertal version of a gene that affects the skin, hair, 
and nail protein keratin that is found in about 60% of the 
genomes of Europeans and East Asians. Keratin thickens 
and toughens the skin, giving it elasticity and protection 
against water, heat, cold, and pathogens. The researchers 
speculate that the additional protection afforded to the 
skin was advantageous in the colder and wetter climates 
of Europe and Asia. A Neandertal gene that affects the 
size of the optic disc in the eye has also been identi- 
fied. So have genes influencing several human disorders. 
Neandertal genes that make humans more susceptible 
to Crohn’s disease, lupus, and type 2 diabetes and even
a gene influencing potential addiction to smoking have
been identified. Researchers speculate that these alleles
did not harm Neandertals or the humans who carried the
Neandertal alleles, at least until very recently. It may be
that prior to the last century or so, the average human life
span was not long enough for the effects of these alleles to
manifest themselves, or they may have provided some as
yet unidentified advantage.

Several questions regarding the evolution of the
modern human genome remain to be answered: Did
the Denisovans contribute any genes or DNA sequences
to the human genome? Was there another ancient human
lineage that contributed to the makeup of the
human genome? What does interpopulation patterning
of human genetic variation tell us about the migration
of modern humans out of Africa and their subsequent
spread around the globe? There is much more to be told
in the story of what in our genetic and evolutionary history
makes us human, and undoubtedly some surprises
will be revealed as researchers study the evolution of the
human genome.


CASE STUDY


CODIS—Using Population Genetics to Solve Crime and Identify Paternity 


Each of us, with the exception of monozygotic multiple births,
has a unique genome. Moreover, given the amount of genetic
variability uncovered by human genome analysis, each of us
may be genetically different from any other person who has
ever lived. The concept of genetic uniqueness has practical
applications in individual identification through the
genotyping of DNA marker genes. This analysis is
commonly known as DNA fingerprinting, or DNA profiling.
Individual identification uses laboratory analyses of selected
genetic markers in combination with statistical analysis based
on the Hardy-Weinberg (H-W) equilibrium to determine the
probability that a particular individual is the source of a
specific DNA sample.

The genetic markers used in DNA profiling are variable
number tandem repeats (VNTRs) that contain different numbers
of copies of short, repeating DNA sequences. The DNA
repeats for a particular VNTR marker range in length from 2
base pairs (bp) to about 20 bp. VNTR analysis is carried out
using polymerase chain reaction (PCR) amplification of targeted
DNA sequences followed by gel electrophoresis. The DNA
fragment-length variation observed for VNTRs is essentially
identical to that seen for RFLPs (see Section 10.2). These
methods are highly automated, and the results are highly
reproducible and reliable in the hands of trained laboratory
technicians.

In 1997, the U.S. Federal Bureau of Investigation (FBI) 
selected 13 independently assorting human STRP markers to 
form the core of the bureau’s Combined DNA Index System 
(CODIS; Figure 22.19). Extensive analysis determined the 
number and frequencies of alleles for the 13 original CODIS 
VNTRs in most human populations. Studies also precisely 
defined laboratory methods for the analysis of CODIS VNTRs. 
More VNTRs were added to the original CODIS markers in later 
years, and more than 20 CODIS markers are in use today. 

Figure 22.19 The original 13 independently assorting
CODIS VNTR genetic markers.

The statistical power of CODIS-based identification rests
on the Hardy-Weinberg equilibrium and the product rule
of probability for independently assorting genes. The VNTR
allele frequencies in each population are used to predict
population genotype frequencies. Table 22.8 lists the
frequencies of alleles for three example VNTR loci, D3S1358,
VWA, and FGA. Using these frequencies, let’s determine the
probability that a person selected at random from the
population is homozygous for the 14 allele of D3S1358 (i.e., has
the 14/14 genotype), is heterozygous 15/19 for VWA, and is
heterozygous 20/25 for FGA. The frequency of the 14 allele
of D3S1358 is f(14) = 0.134, and the homozygous frequency
is f(14/14) = (0.134)(0.134) = 0.018 (1.8%). The genotype
frequency for VWA, f(15/19), is 2[(0.119)(0.088)] = 0.0209 (2.09%),
and for FGA, f(20/25), is 2[(0.125)(0.094)] = 0.0235 (2.35%).
Based on independent assortment of the three markers, the
joint probability of the genotypes is determined by the
product rule: (0.0180)(0.0209)(0.0235) = 8.84 × 10⁻⁶, or
approximately 8.84 in 1 million (about 1 in 113,000). Most
often, all 13 CODIS markers are used, and researchers
estimate that the likelihood of two unrelated people having
the same genotype is very small. According to some estimates,
the theoretical probability of a random match of two
unrelated people is about 10⁻¹⁵, or about one in a quadrillion!
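
The product-rule calculation above can be reproduced with a short Python sketch. The allele frequencies are the ones quoted from Table 22.8; the function and variable names are illustrative assumptions, and the calculation assumes Hardy-Weinberg proportions and independent assortment of the loci, as the text does.

```python
from math import prod


def genotype_frequency(allele_freqs: dict, a: int, b: int) -> float:
    """Expected Hardy-Weinberg frequency of genotype a/b at one locus."""
    p, q = allele_freqs[a], allele_freqs[b]
    # Homozygotes occur at p^2, heterozygotes at 2pq.
    return p * p if a == b else 2 * p * q


# Allele frequencies quoted in the text (subset of Table 22.8).
D3S1358 = {14: 0.134}
VWA = {15: 0.119, 19: 0.088}
FGA = {20: 0.125, 25: 0.094}

# The profile from the worked example: 14/14, 15/19, and 20/25.
profile = [(D3S1358, 14, 14), (VWA, 15, 19), (FGA, 20, 25)]
per_locus = [genotype_frequency(freqs, a, b) for freqs, a, b in profile]

# Product rule: multiply the per-locus frequencies of independent loci.
joint = prod(per_locus)
print([round(f, 4) for f in per_locus], joint)
```

With unrounded per-locus values the joint frequency comes to roughly 8.8 × 10⁻⁶. Adding more independent loci to `profile` multiplies in further small factors, which is why a full marker set yields such small random-match probabilities.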

CODIS markers have been used in countless criminal
and paternity cases since 1997. In criminal cases where the
DNA fingerprint of a suspect is compared to a sample from
a crime scene, the genotypes for all the markers are compared
to determine if any mismatches exist, that is, to see
whether the suspect carries an allele not found in the crime
scene sample, or vice versa. The detection of such a mismatch
results in the exclusion of the suspect as the source of the
genetic material from the crime scene. Figure 22.20 shows an
example analysis of the genes D3S1358, VWA, and FGA from
a crime scene sample, the crime victim, and three suspects.
Mismatches between the crime scene sample and Suspects
1 and 3 exclude these two suspects as sources of the crime
scene sample. On the other hand, Suspect 2 is not excluded
on the basis of these three genes. Additional CODIS genes
can be analyzed, and if they do not exclude Suspect 2 as the
source of the crime scene sample, the probability that another
person is the source of the crime scene genotype can then be
calculated using the methods described.

Figure 22.20 DNA marker analysis. CODIS VNTR data
produced by PCR analysis of three loci, FGA, VWA, and D3S1358.
Suspects 1 and 3 are excluded as the source of crime scene
DNA by mismatches at each gene. Suspect 2 is not excluded by
this analysis.


Table 22.8 Allele Frequencies for Three VNTR Loci Used in CODIS

D3S1358              VWA                  FGA
Allele  Frequency    Allele  Frequency    Allele  Frequency
12      0.015        12      0.015        18      0.015
13      0.015        14      0.131        19      0.061
14      0.134        15      0.119        20      0.125
15      0.270        16      0.186        21      0.180
16      0.229        17      0.257        22      0.209
17      0.162        18      0.189        23      0.131
18      0.162        19      0.088        24      0.146
19      0.015        20      0.015        25      0.094
                                          26      0.018
                                          27      0.015


SUMMARY

For activities, animations, and review quizzes, go to the study area.


22.1 The Hardy-Weinberg Equilibrium Describes
the Relationship of Allele and Genotype
Frequencies in Populations

■ A population is a group of interbreeding organisms that
share a collection of genes known as a gene pool.

■ If there are two alleles at a locus, with frequencies
represented by p and q, the sum of allele frequencies p + q = 1.0.

■ The Hardy-Weinberg equilibrium predicts that two alleles
will be distributed into genotypes of frequencies p², 2pq, and
q². The sum of genotype frequencies p² + 2pq + q² = 1.0.

■ The Hardy-Weinberg equilibrium assumes that the members
of a population mate at random and that the population
is not altered by any of the four evolutionary processes.

■ Allele frequencies in populations can be determined by the
genotype proportion method, allele counting, or the square
root method.

■ The Hardy-Weinberg equilibrium can be used even when
more than two alleles occur for a gene.

■ Chi-square analysis compares the number of observed
genotypes to the number expected under assumptions of the
Hardy-Weinberg equilibrium.


22.2 Natural Selection Operates through Differential
Reproductive Fitness within a Population

■ Relative fitness is the comparative capacity of individuals
with different phenotypes to make genetic contributions
to the next generation due to the influence of natural
selection.

■ A selection coefficient is the percentage decrease in
reproductive success experienced by an organism possessing a
relative fitness that is less than 1.0.

■ Directional selection drives the frequency of the favored
allele toward fixation in the population and the disfavored
allele toward elimination.

■ Balanced polymorphism is a stable allele frequency equilibrium
resulting from natural selection that favors heterozygotes.

■ Convergent evolution leads to similar or identical phenotypes
in populations that have separate evolutionary histories.


22.3 Mutation Diversifies Gene Pools

■ Forward and reverse mutation slowly change the frequencies
of alleles in populations.

■ Deleterious mutations are removed by natural selection,
striking an equilibrium frequency by balancing mutation and
selection rates.


22.4 Migration Is Movement of Organisms
and Genes between Populations

■ Gene flow is the transfer of alleles by the migration of
individuals between populations.

■ Gene flow can produce new allele frequencies in admixed
populations.

■ Gene flow homogenizes allele frequency differences among
populations that exchange members.


22.5 Genetic Drift Causes Allele Frequency
Change by Sampling Error

■ Genetic drift is the random fluctuation of allele frequencies
caused by errors in sampling.

■ Genetic drift leads ultimately to allele fixation and
elimination, but small populations are particularly
susceptible to its effects.

■ Founder effect is a special form of genetic drift that occurs
when a small number of individuals from a larger population
establish a new, small population.

■ A genetic bottleneck is a random and substantial reduction
in population size that significantly changes allele
frequencies among survivors.


22.6 Inbreeding Alters Genotype Frequencies

■ Inbreeding is nonrandom mating based on genotype that
occurs between relatives who are more closely related to one
another than to a random member of the population.

■ The coefficient of inbreeding (F) is the probability that
an allele is homozygous identical by descent in an inbred
individual.

■ Inbreeding increases the frequency of homozygosity and
decreases heterozygosity.

■ Inbreeding depression often develops in inbred populations
due to the cumulative effects of numerous homozygous loci.
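
The summary points on inbreeding can be made concrete with the standard population-genetic relationship between the inbreeding coefficient F and genotype frequencies: f(AA) = p² + Fpq, f(Aa) = 2pq(1 − F), and f(aa) = q² + Fpq. The Python sketch below assumes that relationship, which is not stated in this summary, and uses hypothetical names and example values; F = 1/16 is the value for the offspring of first cousins.

```python
def inbred_genotype_frequencies(p: float, F: float) -> dict:
    """Genotype frequencies for a two-allele locus with inbreeding coefficient F.

    Assumes the standard formulas; F = 0 recovers Hardy-Weinberg proportions.
    """
    q = 1.0 - p
    return {
        "AA": p * p + F * p * q,      # extra homozygotes due to identity by descent
        "Aa": 2 * p * q * (1.0 - F),  # heterozygosity reduced by the fraction F
        "aa": q * q + F * p * q,
    }


# Example values (assumed): f(A) = 0.9, f(a) = 0.1.
random_mating = inbred_genotype_frequencies(0.9, 0.0)
cousins = inbred_genotype_frequencies(0.9, 1 / 16)
print(random_mating)
print(cousins)
```

In this sketch the allele frequencies are the same in both cases; only the distribution of alleles into genotypes shifts toward homozygotes, which is why rare recessive phenotypes appear more often among inbred individuals.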


22.7 Species and Higher Taxonomic Groups
Evolve by the Interplay of Four Evolutionary
Processes

■ New species emerge in reproductive isolation through
adaptive change in response to conditions.

■ Prezygotic reproductive isolation prevents mating between
individuals in different populations. Postzygotic reproductive
isolation reduces the ability of individuals from different
populations to produce living and fertile offspring when they mate.

■ In allopatric speciation, new species develop as a result of the
physical separation of populations into different geographic
areas.

■ Sympatric speciation results from genetic differences that
prevent reproduction among organisms that occupy the
same habitat.


22.8 Molecular Evolution Changes Genes
and Genomes through Time

■ The evolution of gene families often occurs by one or more
duplications of an ancestral gene, followed by diversification
of sequence and function of the new gene copies.

■ In keeping with the relatively short evolutionary span of
modern humans, the level of genetic diversity in the modern
human genome is modest in comparison to genetic diversity
in other genomes.

■ The modern human genome has undergone introgression
of genes and DNA sequences from Neandertals as the result
of interbreeding in regions of Eurasia where the species
co-existed.


KEYWORDS 


admixed population (p. 755)
allele-counting method (p. 747)
allopatric speciation (p. 761)
anagenesis (p. 761)
balanced polymorphism (p. 751)
cladogenesis (p. 761)
coefficient of inbreeding (F) (p. 758)
convergent evolution (p. 752)
differential reproduction (p. 748)
directional natural selection (p. 750)
forward mutation rate (μ) (p. 753)
founder effect (p. 756)
gene pool (p. 743)
genetic bottleneck (p. 757)
genetic drift (p. 756)
genetic hitchhiking (p. 753)
genotype proportion method (p. 747)
Hardy-Weinberg (H-W) equilibrium (p. 743)
identical by descent (IBD) (p. 758)
inbreeding (consanguineous mating) (p. 758)
inbreeding depression (p. 759)
indels (p. 766)
island model (p. 755)
migration (gene flow) (p. 755)
mutation (p. 753)
mutation-selection balance (p. 753)
population (p. 743)
population genetics (p. 743)
postzygotic mechanism (p. 761)
prezygotic mechanism (p. 761)
relative fitness (w) (p. 748)
reproductive isolation (p. 760)
reverse mutation rate (ν; reversion rate) (p. 753)
selection coefficient (s; t) (p. 749)
square root method (p. 747)
sympatric speciation (p. 763)


PROBLEMS

Visit MasteringGenetics™ for instructor-assigned tutorials and problems.

Chapter Concepts

For answers to selected even-numbered problems, see Appendix: Answers.

1. Compare and contrast the terms in each of the following pairs:
a. population and gene pool
b. random mating and inbreeding
c. natural selection and genetic drift
d. a polymorphic trait and a polymorphic gene
e. founder effect and genetic bottleneck

2. In a population, what is the consequence of inbreeding?
Does inbreeding change allele frequencies? What is the
effect of inbreeding with regard to rare recessive alleles
in a population?

3. Identify and describe the evolutionary forces that can cause
allele frequencies to change from one generation to the next.

4. Describe how natural selection can produce balanced
polymorphism of allele frequencies through selection that
favors heterozygotes.

5. Thinking creatively about evolutionary mechanisms,
identify at least two schemes that could generate allelic
polymorphism in a population. Do not include the
processes described in the answer to Problem 4.

6. Genetic drift, an evolutionary factor affecting all populations,
can have a significant effect in small populations,
even though its effect is negligible in large populations.
Explain why this is the case.

7. Over the course of many generations in a small population,
what effect does random genetic drift have on allele
frequencies?

8. Catastrophic events such as loss of habitat, famine, or
overhunting can push species to the brink of extinction
and result in a genetic bottleneck. What happens to allele
frequencies in a species that experiences a near-extinction
event, and what is expected to happen to allele frequencies
if the species recovers from near extinction?

9. George Udny Yule was wrong in suggesting that an autosomal
dominant trait like brachydactyly will increase in
frequency in populations. Explain why Yule was incorrect.


10. The ability to taste the bitter compound phenylthiocarbamide
(PTC) is an autosomal dominant trait. The inability
to taste PTC is a recessive condition. In a sample of 500
people, 360 have the ability to taste PTC and 140 do not.
Calculate the frequency of
a. the recessive allele
b. the dominant allele
c. each genotype

11. Figure 22.6 (page 751) illustrates the effect of an ethanol-rich
and an ethanol-free environment on the frequency of the
Drosophila Adh^F allele in four populations in a 50-generation
laboratory experiment. Population 1 and population 2 were
reared for 50 generations in a high-ethanol environment,
while control 1 and control 2 populations were reared for
50 generations in a zero-ethanol environment. Describe the
effect of each environment on the populations, and state any
conclusions you can reach about the role of any of the
evolutionary processes in producing these effects.

12. Biologists have proposed that the use of antibiotics to treat
human infectious disease has played a role in the evolution
of widespread antibiotic resistance in several bacterial
species, including Staphylococcus aureus and the bacteria
causing gonorrhea, tuberculosis, and other infectious
diseases. Explain how the evolutionary mechanisms mutation
and natural selection may have contributed to the
development of antibiotic resistance.

13. Two populations of deer, a large one living in a mainland
forest and a small one inhabiting a forest on an island,
regularly exchange members who migrate across a land
bridge that connects the island to the mainland.
a. If you compared the allele frequencies in the two
populations, what would you expect to find?
b. An earthquake destroys the bridge between the island
and the mainland, making migration impossible for
the deer. What do you expect will happen to allele
frequencies in the two populations over the following
10 generations?
c. In which population do you expect to see the greatest
allele frequency change? Why?

14. Directional selection presents an apparent paradox. By
favoring one allele and disfavoring others, directional selection
can lead to fixation (a frequency of 1.0) of the favored allele,
after which there is no genetic variation at the locus, and its
evolution stops. Explain why directional selection no longer
operates in populations after the favored allele reaches
fixation.

15. What is inbreeding depression? Why is inbreeding depression
a serious concern for animal biologists involved in
species-conservation breeding programs?

16. Certain animal species, such as the black-footed ferret, are
nearly extinct and currently exist only in captive populations.
Other species, such as the panda, are also threatened
but exist in the wild thanks to intensive captive-breeding
programs. What strategies would you suggest in the case of
black-footed ferrets and in the case of pandas to monitor
and minimize inbreeding depression?


Application and Integration

For answers to selected even-numbered problems, see Appendix: Answers.

17. Genetic Analysis 22.1 (page 749) predicts the number of
individuals expected to have the blood group genotypes
MM, MN, and NN. Perform a chi-square analysis using
the number of people observed and expected in each
blood-type category, and state whether the sample is in
Hardy-Weinberg equilibrium (see pages 50 and 51 for the
chi-square formula and table).

18. In a population of rabbits, f(C1) = 0.70 and f(C2) = 0.30.
The alleles exhibit an incomplete dominance relationship
in which C1C1 produces black rabbits, C1C2 tan-colored
rabbits, and C2C2 rabbits with white fur. If the assumptions
of the Hardy-Weinberg principle apply to the rabbit
population, what are the expected frequencies of black, tan, and
white rabbits?

19. Sickle cell disease (SCD) is found in numerous populations
whose ancestral homes are in the malaria belt of Africa and
Asia. SCD is an autosomal recessive disorder that results
from homozygosity for a mutant β-globin gene allele. Data
on one affected population indicate that approximately
8 in 100 newborn infants have SCD.
a. What are the frequencies of the wild-type (β^A) and
mutant (β^S) alleles in this population?
b. What is the frequency of carriers of SCD in the population?

20. Epidemiologic data on the population in the previous
problem reveal that before the application of modern
medical treatment, natural selection played a major role in
shaping the frequencies of alleles. Heterozygous individuals
have the highest relative fitness, and in comparison to
heterozygotes, those who are β^Aβ^A have a relative fitness
of 82 percent, but only about 32 percent of those with SCD
survived to reproduce. What are the estimated equilibrium
frequencies of β^A and β^S in this population?

21. The frequency of tasters and nontasters of PTC (see
Problem 10) varies among populations. In population A,
64 percent of people are tasters (an autosomal dominant
trait) and 36 percent are nontasters. In population B,
tasters are 75 percent and nontasters 25 percent. In population
C, tasters are 91 percent and nontasters are 9 percent.
a. Calculate the frequency of the dominant (T) allele for
PTC tasting and the recessive (t) allele for nontasting in
each population.
b. Assuming that Hardy-Weinberg conditions apply,
determine the genotype frequencies in each population.

22. Tay-Sachs disease is an autosomal recessive neurological
disorder that is fatal in infancy. Despite its invariably lethal
effect, Tay-Sachs disease occurs at very high frequency in
some Central and Eastern European (Ashkenazi) Jewish
populations. In certain Ashkenazi populations, 1 in 750
infants has Tay-Sachs disease. Population biologists believe
the high frequency is a consequence of genetic bottlenecks
caused by pogroms (genocide) that have reduced the
population multiple times in the last several hundred years.
a. What is a genetic bottleneck?
b. Explain how a genetic bottleneck and its aftermath
could result in a population that carries a lethal allele in
high frequency.
c. In the population described, what is the frequency of
the recessive allele that produces Tay-Sachs disease?
d. Assuming mating occurs at random in this population,
what is the probability a couple are both carriers of
Tay-Sachs disease?


23. Cystic fibrosis (CF) is the most common autosomal recessive
disorder in certain Caucasian populations. In some
populations, approximately 1 in 2000 children have CF.
Determine the frequency of CF carriers in this population.

24. In the mouse, Mus musculus, survival in agricultural fields
that are regularly sprayed with a herbicide is determined
by the genotype for a detoxification enzyme encoded by a
gene with two alleles, F and S. The relative fitness values
for the genotypes are:

Genotype    Relative fitness
FF          0.72
FS          1.00
SS          0.45

a. Why will this pattern of natural selection result in a
stable equilibrium of frequencies of F and S?
b. Calculate the equilibrium frequencies of the alleles.

25. In a population of flowers growing in a meadow, C1 and C2
are autosomal codominant alleles that control flower color.
The alleles are polymorphic in the population, with
f(C1) = 0.80 and f(C2) = 0.20. Flowers that are C1C1 are
yellow, orange flowers are C1C2, and C2C2 flowers are red.
A storm blows a new species of hungry insects into the
meadow, and they begin to eat yellow and orange flowers
but not red flowers. The predation exerts strong natural
selection on the flower population, resulting in relative
fitness values of C1C1 = 0.30, C1C2 = 0.60, and C2C2 = 1.0.
a. Assuming the population begins in H-W equilibrium,
what are the allele frequencies after one generation of
natural selection?
b. Assuming random mating takes place among survivors,
what are the genotype frequencies in the second
generation?
c. If predation continues, what are the allele frequencies
when the second generation mates?
d. What are the equilibrium frequencies of C1 and C2 if
predation continues?

26. Assume that the flower population described in the previous
problem undergoes a different pattern of predation.
Flower color determination and the starting frequencies
of C1 and C2 are as described above, but the new insects
attack yellow and red flowers, not orange flowers. As a
result of the predation pattern, the relative fitness values
are C1C1 = 0.40, C1C2 = 1.0, and C2C2 = 0.80.
a. What are the allele frequencies after one generation of
natural selection?
b. What are the genotype frequencies among the progeny
of predation survivors?
c. What are the equilibrium allele frequencies in the
predation environment?

27. ABO blood type is examined in a Taiwanese population,
and allele frequencies are determined. In the population,
f(I^A) = 0.30, f(I^B) = 0.15, and f(i) = 0.55. Assuming
Hardy-Weinberg conditions apply, what are the frequencies
of genotypes, and what are the blood group frequencies
in this population?

28. A total of 1000 members of a Central American population
are typed for the ABO blood group. In the sample, 421 have
blood type A, 168 have blood type B, 336 have blood type O,
and 75 have blood type AB. Use this information to determine
the frequency of ABO blood group alleles in the sample.

29. A sample of 500 field mice contains 225 individuals that
are D1D1, 175 that are D1D2, and 100 that are D2D2.
a. What are the frequencies of D1 and D2 in this sample?
b. Is this population in Hardy-Weinberg equilibrium? Use
the chi-square test to justify your answer.
c. Is inbreeding a possible genetic explanation for the
observed distribution of genotypes? Why or why not?

30. In humans the presence of chin and cheek dimples is
dominant to the absence of dimples, and the ability to taste
the compound PTC is dominant to the inability to taste
the compound. Both traits are autosomal, and they are
unlinked. The frequencies of alleles for dimples are
D = 0.62 and d = 0.38. For tasting, the allele frequencies
are T = 0.76 and t = 0.24.
a. Determine the frequency of genotypes for each gene
and the frequency of each phenotype.
b. What are the expected frequencies of the four possible
phenotype combinations: dimpled tasters, undimpled
tasters, dimpled nontasters, and undimpled nontasters?

31. Albinism, an autosomal recessive trait characterized by an
absence of skin pigmentation, is found in 1 in 4000 people
in populations at equilibrium. Brachydactyly, an autosomal
dominant trait producing shortened fingers and toes, is
found in 1 in 6000 people in populations at equilibrium.
For each of these traits, calculate the frequency of
a. the recessive allele at the locus
b. the dominant allele at the locus
c. heterozygotes in the population
d. For albinism only, what is the frequency of mating
between heterozygotes?

32. Using the population data in Table 22.8,
a. Calculate the population frequency of individuals with
the 16/18 genotype at D3S1358, the 14/18 genotype at
VWA, and the 23/26 genotype at FGA.
b. Explain how the Hardy-Weinberg principles are used in
the analysis of CODIS genotypes and other STRP-locus
comparisons.


Evaluate the following pedigree, and answer the questions
below for individual IV-1.

[Pedigree figure not recoverable from the source.]

a. Is IV-1 an inbred individual? If so, who is/are the
common ancestor(s)?
b. What is F for this individual?

34. Evaluate the following pedigree, and answer the questions
below.

[Pedigree figure, generations I through V, not recoverable from the source.]

a. Which individual(s) in this family is/are inbred?
b. Who is/are the common ancestor(s) of the inbred
individual(s)?
c. Calculate F for any inbred members of this family.

35. The following is a partial pedigree of the British royal family.
The family contains several inbred individuals and a number
of inbreeding pathways. Carefully evaluate the pedigree, and
identify the pathways and common ancestors that produce
inbred individuals A (Alice in generation IV), B (George VI
in generation VI), and C (Charles in generation VIII).

[Pedigree figure, generations I through VIII, not recoverable from the
source. Legible labels: George III, Victoria, Albert, George V, Victoria,
Mary of Teck, George VI, Philip, Elizabeth II, Diana, Charles, Anne,
Andrew, Edward; individuals marked A, B, and C.]

36. Draw a separate pedigree, identifying the inbred individuals
and the inbreeding pathways, for each of the following
inbreeding coefficients:

a. F = 4(1/2)⁶
b. F = 2(1/2)⁵
c. F = 4(1/2)⁸
d. F = 2(1/2)⁷

37. The human melanocortin 1 receptor (MC1R) gene plays
a major role in producing eumelanin, a black-brown
pigment that helps determine hair color and skin color.
Jonathan Rees and several colleagues (J. L. Rees et al.,
Am. J. Human Genet. 66(2000): 1351–1361) studied multiple
MC1R alleles in African and European populations.
Although this research found several MC1R alleles in
African populations, MC1R alleles that decrease the production
of eumelanin were rare. In contrast, several alleles
decreasing eumelanin production were found in European
populations. How can these results be explained by natural
selection?

38. Achromatopsia is a rare autosomal recessive form
of complete color blindness that affects about 1 in
20,000 people in most populations. People with this
disorder see only in black and white and have extreme
sensitivity to light and poor visual acuity. On Pingelap
Island, one of a cluster of coral atoll islands in the
Federated States of Micronesia, approximately
10 percent of the 3000 indigenous Pingelapese inhabitants
have achromatopsia.

Achromatopsia was first recorded on Pingelap in
the mid-1800s, about four generations after a typhoon
devastated Pingelap and reduced the island population to
about 20 people. All Pingelapese with achromatopsia
trace their ancestry to one male who was one of the
20 typhoon survivors. Provide a genetic explanation for
the origin of achromatopsia on Pingelap, and explain the
most likely evolutionary model for the high frequency
of achromatopsia.

39. New allopolyploid plant species can arise by hybridization
between two species. If hybridization occurs between
a diploid plant species with 2n = 14 and a second diploid
species with 2n = 22, the new allopolyploid would have 36
chromosomes.

a. Is it likely that sexual reproduction between the
allopolyploid species and either of its diploid ancestors
would yield fertile progeny? Why or why not?
b. What type of isolation mechanism is most likely to
prevent hybridization between the allopolyploid and
the diploid species?
c. What pattern of speciation is illustrated by the development
of the allopolyploid species?


Appendix: Answers


Chapter 1 


2. Protein, not DNA, was the focus of efforts to understand heredity 
before this discovery. Only after this discovery was serious attention 
turned to understanding DNA structure and function. Whereas the 
complexity of protein structure confounded thinking about mecha- 
nisms of inheritance, DNA structure provided profound insight into 
the mechanism of heredity, suggesting a simple, elegant mechanism for 
duplication (inheritance), change (mutation and evolution), and pheno- 
type specification (coding). The finding that DNA is universal facilitated 
rapid progress because study results from all organisms were now di- 
rectly related. This also fostered the development of recombinant DNA 
technologies in bacteria and bacteriophage, which led to the explosion 
of biological information that excites and confounds us today. 


4. Evolution states that all life descended from a common ancestor, 
which passed its genes and the mechanisms by which those genes were 
used to its descendants. These mechanisms would include the struc- 
ture of nucleotides, the structure of DNA, the enzymes that replicate 
and read DNA, the enzymes that translate mRNA into amino acid 
sequences, and many more. Any change to one of these components 
would be harmful, slowing or preventing reproduction, and would be 
removed by natural selection (mountains of experimental evidence 
demonstrate that mutations in genes encoding basic genetic machinery 
are lethal). Thus, once established as the genetic material, DNA would 
be maintained as the genetic material by natural selection and therefore 
would be expected to be found as the genetic material in all existing 
organisms. 


6. Genotype refers to the genetic makeup of a cell or organism, whereas 
phenotype refers to observable characteristics of the cell or organism, 
such as appearance, physiology and behavior. The genotype of an organ- 
ism is part of what determines the phenotype of the organism; however, 
environment also plays a role in the organism’s phenotype. The geno- 
type is heritable, and, therefore, its contribution to phenotype will be 
inherited. The aspects of phenotype that are due to environment are 
not heritable. 


8. The modern synthesis of evolution is the reconciliation of Darwin's 
evolutionary theory with the findings of modern genetics. Darwin's 
theory proposed that all evolution was adaptive. Genetic studies on 
mutation and genetic recombination initially argued against the im- 
portance of natural selection as an agent for change because mutation 
and recombination were nonadaptive. The modern synthesis stated 
that evolution is due to the combined action of adaptive and nonadap- 
tive evolutionary forces. In particular, the modern synthesis explained 
how mutation and recombination could provide the raw material (new 
genotypes and phenotypes) on which natural selection acts. 


10a. Transcription is the synthesis of RNA by RNA polymerase. The 
RNA is complementary to the strand of DNA that was used as the 
template for transcription. 

10b. An allele is a specific form of a gene or genetic locus. 

10c. The central dogma of biology originally stated that genetic infor- 
mation flows from DNA to RNA (by transcription) and from RNA to 
protein (by translation). A point of emphasis of this dogma was that
information does not flow in the reverse direction; the dogma has since
been modified to account for reverse transcription.

10d. Translation is the synthesis of a polypeptide using the informa- 
tion in an mRNA. Translation and protein synthesis are synonyms. 
10e. DNA replication is the process by which DNA is copied by DNA 
polymerase. 


10f. A gene is a segment of DNA that contains all the information nec- 
essary for its proper transcription, including the promoter, transcribed 
region, and termination signals. 


10g. A chromosome is a heritable molecule composed of DNA and 
protein that typically contains genes. 


10h. The term antiparallel refers to the orientation of the two strands of 
nucleic acid in a double-stranded nucleic acid (RNA or DNA). Each end 
of the double-stranded nucleic acid will contain the 5’ end of one strand 
and the 3’ end of the other. 


10i. Phenotype refers to the observable characteristics of an organism, 
which include morphology, physiology, and molecular composition. 
The phenotype of an organism is a product of the interaction between 
its genotype and its environment.

10j. Complementary base pair refers to the two nucleotides on
opposite, antiparallel strands of a double-stranded nucleic acid, which are
hydrogen bonded to each other. One nucleotide contains a purine base
that hydrogen bonds to the pyrimidine base that is part of the
other nucleotide.


10k. Nucleic acid strand polarity refers to the orientation of the 
nucleotides along a single strand of nucleic acid. One end of the 
strand terminates at the 3’ hydroxyl group of a ribose (or deoxyribose) 
sugar, whereas the other end terminates at the 5’ phosphate group
on the sugar. These are commonly referred to as the 3’ and 5’ ends of 
the strand. 


10l. Genotype refers to the genetic makeup of an organism. The genotype
can refer either to the organism’s entire genetic makeup or to the
genetic information at only one or a few loci. 


10m. Natural selection is the process by which populations and species 
evolve and diverge through differential rates of survival and reproduc- 
tion of members that are due to their inherited differences. 


10n. Mutation is the process that generates new genetic variety 
through change to existing alleles. 


10o. Modern synthesis of evolution is the term applied to the reconciliation
of modern genetic analysis with Darwin’s theory of evolution by
natural selection. 


12. The template DNA strand is complementary and antiparallel to 
the RNA transcript. The coding strand is parallel and identical to the 
mRNA transcript, except that thymidine is in place of uridine. 

14. The 5’ end is a phosphate group. The 3’ end is a hydroxyl group. 

A phosphodiester bond is a covalent bond that joins nucleotides in a 
strand of nucleic acid. 

16. The central dogma is a description of the flow of genetic informa- 
tion. The flow is unidirectional, from DNA to RNA to protein. The 
process by which information flows from DNA to RNA is called tran- 
scription. Transcription is the synthesis of RNA by RNA polymerase. 
The RNA is complementary to the strand of DNA that was used as 
the template for transcription. The flow of information from RNA to 
protein is called translation. Translation is the synthesis of a polypep- 
tide using the information in an mRNA. 

18a. The mRNA sequence is 5’- UUCCAUGUC- 3’. 

18b. The amino acid sequence is Phe-His-Val. 
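
The reading of the mRNA in 18a into the peptide in 18b can be checked with a short sketch. The function name `translate` and the reduced codon table are illustrative assumptions: only a handful of codons from the standard genetic code are included, and the message is read from its first base, as the answer does, rather than from a start codon.

```python
# Partial standard genetic code: only the codons needed for this example.
CODON_TABLE = {
    "UUC": "Phe", "UUU": "Phe",
    "CAU": "His", "CAC": "His",
    "GUC": "Val", "GUA": "Val", "GUG": "Val", "GUU": "Val",
    "AUG": "Met",
    "UAA": "Stop", "UAG": "Stop", "UGA": "Stop",
}

def translate(mrna):
    """Read an mRNA 5'->3' in non-overlapping triplets from its first base."""
    peptide = []
    usable = len(mrna) - len(mrna) % 3   # ignore a trailing partial codon
    for i in range(0, usable, 3):
        residue = CODON_TABLE[mrna[i:i + 3]]
        if residue == "Stop":
            break
        peptide.append(residue)
    return "-".join(peptide)

print(translate("UUCCAUGUC"))  # Phe-His-Val
```

A full implementation would carry all 64 codons; the reduced table keeps the sketch short and raises a `KeyError` for any codon it does not know.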

20a. 6 clades 

20b. A backbone (They are all vertebrates.) 


20c. The mammalian and human clades share these characteristics: 
have backbones and four legs, have fur and produce milk. 




22. The samples are (1) double-stranded DNA, (2) single-stranded 
RNA, (3) double-stranded RNA, and (4) single-stranded DNA. 


Evaluate: DNA will contain thymine (T) but not uracil (U), whereas 
RNA will contain U but not T. Double-stranded DNA will contain equal 
proportions of adenine (A) and T and equal proportions of guanine 
(G) and cytosine (C), whereas single-stranded DNA typically does not. 
Double-stranded RNA will contain equal proportions of A and U and 
equal proportions of G and C, whereas single-stranded RNA typically 
does not. Deduce: Sample 1 contains T but not U, and the percent- 
age of A is the same as T; therefore, it is probably double-stranded 
DNA. Sample 2 contains U but not T, and the percentage of A is not 
the same as U; therefore, it is single-stranded RNA. Sample 3 contains 
U but not T, and the percentage of A is the same as U; therefore, it is 
probably double-stranded RNA. Sample 4 contains T but not U, and 
the percentage of A is not the same as T; therefore, it is single-stranded 
DNA. Solve: Samples 1 and 3 are double-stranded, whereas samples 
2 and 4 are single-stranded. 
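
The Evaluate and Deduce steps above amount to two checks: which of T or U is present, and whether the complementary bases occur in equal proportions. A minimal sketch of that logic follows. The function name, the tolerance `tol`, and the example percentages are assumptions made for illustration; they are not the values from the problem.

```python
def classify_sample(a, c, g, t, u, tol=1.0):
    """Classify a nucleic acid from its base percentages.

    tol is an assumed tolerance, in percentage points, for treating two
    base proportions as equal.
    """
    # T without U indicates DNA; anything else is treated here as RNA.
    kind = "DNA" if t > 0 and u == 0 else "RNA"
    partner = t if kind == "DNA" else u
    # Equal A/T (or A/U) and equal G/C suggest a base-paired duplex.
    paired = abs(a - partner) <= tol and abs(g - c) <= tol
    strands = "double-stranded" if paired else "single-stranded"
    return f"{strands} {kind}"

# Hypothetical compositions (percent), not the values from the problem.
print(classify_sample(a=30, c=20, g=20, t=30, u=0))
print(classify_sample(a=25, c=18, g=32, t=0, u=25))
```

Equal proportions are consistent with a duplex but do not prove it, which is why the answer says "probably" for samples 1 and 3.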


24. Mammals 


Chapter 2 


2. The genotype ratio will be 1/2 BB and 1/2 Bb. The phenotype will be 
all B. 


8a. False. The expected phenotype ratio is 9/16 : 3/16 : 3/16 : 1/16, assuming
simple dominance and independent assortment of genetic loci
(AaBb × AaBb). There are nine different genotypes, and the genotype
ratio is 1/16 (AABB) : 2/16 (AABb) : 1/16 (AAbb) : 2/16 (AaBB) : 4/16
(AaBb) : 2/16 (Aabb) : 1/16 (aaBB) : 2/16 (aaBb) : 1/16 (aabb).

8b. True 

8c. True 


8d. False. The law of independent assortment is of primary importance 
in predicting the outcome of dihybrid and trihybrid crosses. The law of 
segregation is also necessary but not sufficient. 


8e. False. Reciprocal crosses that produce identical results indicate that 
the traits being studied are autosomal. 


8f. False. The law of segregation predicts that she will produce two 
gamete genotypes with respect to her albinism gene at equal frequency. 


8g. True 
8h(1). True 


8h(2). False. There will be 1/16 AABB, 1/16 AAbb, 1/16 aaBB, and 1/16 
aabb. All four genotypes will be true-breeding; therefore, 1/4 of the 
progeny will be true-breeding. 


8h(3). False. Being “heterozygous at one or both loci” excludes only the 
progeny that are homozygous at both loci. In part (b) of this question, it 
was calculated that 1/4 will be homozygous at both loci; therefore, 3/4 
will be heterozygous at one or both loci (2/16 AABb, 2/16 AaBB, 4/16
AaBb, 2/16 Aabb, and 2/16 aaBb).
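
The genotype counts used in answers 8a and 8h can be verified by enumerating the 16 equally likely gamete combinations of a dihybrid cross. The sketch below assumes independent assortment and two-locus genotypes written as four-letter strings; the helper names `gametes` and `cross` are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def gametes(genotype):
    """All equally likely gametes of a two-locus genotype such as 'AaBb'."""
    return [a + b for a, b in product(genotype[:2], genotype[2:])]

def cross(parent1, parent2):
    """Genotype frequencies among progeny, assuming independent assortment."""
    counts = Counter()
    for g1, g2 in product(gametes(parent1), gametes(parent2)):
        # Sort each locus so that 'aA' and 'Aa' are counted together.
        locus_a = "".join(sorted(g1[0] + g2[0]))
        locus_b = "".join(sorted(g1[1] + g2[1]))
        counts[locus_a + locus_b] += 1
    total = sum(counts.values())
    return {g: Fraction(n, total) for g, n in counts.items()}

progeny = cross("AaBb", "AaBb")
# True-breeding progeny are homozygous at both loci.
true_breeding = sum(f for g, f in progeny.items()
                    if g[0] == g[1] and g[2] == g[3])
print(len(progeny), true_breeding, 1 - true_breeding)  # 9 1/4 3/4
```

Exact fractions are used instead of floats so the results can be compared directly with the sixteenths quoted in the answers.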


10a. Mottled is the dominant phenotype. 
10b. The results are consistent with autosomal inheritance. 


10c. 1/2 of the F, of both crosses are expected to be homozygous, and 
1/2 are expected to be heterozygous. 


10d. One cross would be a test cross, and the other would be a back- 
cross. The test cross would be the mottled F, to a true-breeding leopard 
individual. The backcross would be the mottled F; to one of its parents 
(both are heterozygotes). 


12a. The mode of fur color inheritance in these crosses is likely to be 
a single gene with two alleles controlling fur color, and black will be 
dominant to brown. 


12b. Assign the letter B for the black allele and b for the brown allele. 
The brown male must be bb. The black female in the first cross must be 
Bb. The black female in the second cross must be BB. 


14a. The results suggest that the inheritances of color and fin shape are 
each due to segregation of two alleles at a single gene. 


14b. The results indicate that gold is dominant to black and that split 
fin is dominant to single fin. 


14c. The chi-square value for color is 0.061, which corresponds to a P 
value between 0.7 and 0.9. The chi-square value for fin shape is 0.05, 
which corresponds to a P value between 0.7 and 0.9. Neither P value is 
less than 0.05, which indicates that the hypothesis of one gene with two 
alleles for color and fin shape cannot be rejected. 


16b. Of the F₂ progeny, 3/4 will have yellow seeds, 1/4 will have green
seeds, 3/4 will have round seeds, and 1/4 will have wrinkled seeds. 


16c. 9/16 yellow, round; 3/16 yellow, wrinkled; 3/16 green, round; 
1/16 green, wrinkled 


20. The χ² value is 4.81. There is one degree of freedom (df = 1) in this
calculation, and the P value is less than 0.05. Therefore, the hypothesis
that bicolor corn-kernel color is the result of segregation of two alleles
at a single genetic locus should be rejected for this experiment.


22a. 0.132 
22b. 0.178 
22c. 0.822 
24a. 0.422 
24b. 0.0313 
24c. 0.4219 
24d. 0.0469 


26. 1800 full wings and gray bodies, 600 full wings and ebony bodies, 
600 vestigial wings and gray bodies, 200 vestigial wings and ebony bodies 


28a. Evaluate: The Blue Persian and Spanish Dwarf varieties are assumed
to be true-breeding for seed color and plant height; therefore,
the resulting F₁ generation will be heterozygotes. Deduce: Since tall
and white are expressed in the heterozygotes, they are the dominant
traits. The reappearance of short and blue in the F₂ confirms that they
are the recessive traits. Solve: Tall and white are the dominant traits,
whereas short and blue are recessive.


28b. The expected phenotypic distribution in the F₂ is 9/16 tall, white;
3/16 tall, blue; 3/16 short, white; and 1/16 short, blue.


28c. The hypothesis being tested in this experiment is that the two pea 
plant varieties differ at two independently assorting genetic loci, and 
two alleles show simple dominance at each locus. 


28d. The chi-square value is 0.78, which corresponds to a P value 
between 0.9 and 0.7, indicating that the hypothesis cannot be rejected 
(the results are consistent with the hypothesis). 


30a. 9/16 
30b. 3/8 
30c. 3/4 


32a. Evaluate: The total number of children in these families was 480.
The number of children with CF was (52 × 1) + (32 × 2) + (18 × 3) +
(2 × 4) = 178; therefore, 480 − 178 = 302 were normal. Deduce: The
expected number of children with CF is 480 × 1/4 = 120; therefore,
480 − 120 = 360 are expected to be normal. Solve: The chi-square
value using these numbers is 37.4. The number of degrees of freedom
is (2 classes) − 1 = 1. Using Table 3.4, the chi-square of 37.4 for
1 degree of freedom corresponds to a P value less than 0.001, which
is much lower than 0.05, indicating that observed results are inconsistent
with those expected and that you must reject the hypothesis
that CF is inherited as an autosomal recessive trait based on these
results. Solve: The observed results for the total number of children
with CF in families in which both parents are carriers are inconsistent
with expectations (P value < 0.001).
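
The chi-square arithmetic in answer 32a can be reproduced with a few lines of code. This is a sketch only: `chi_square` is an illustrative helper, and the P value uses the closed form that holds for exactly 1 degree of freedom instead of a lookup in Table 3.4.

```python
import math

def chi_square(observed, expected):
    """Pearson chi-square statistic: sum of (O - E)^2 / E over classes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [178, 302]                  # children with CF, unaffected children
expected = [480 * 0.25, 480 * 0.75]    # 1/4 affected expected for carrier parents
stat = chi_square(observed, expected)

# For exactly 1 degree of freedom, the upper-tail P value is erfc(sqrt(x/2)).
p_value = math.erfc(math.sqrt(stat / 2))
print(round(stat, 1), p_value < 0.001)  # 37.4 True
```

For more than one degree of freedom the closed form no longer applies, and a table or a statistics library would be needed.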


32b. One expects 38 families to have no CF children, 50.6 to have 
1 child with CF, 25.3 to have 2 children with CF, 5.6 to have 3 children 
with CF, and 0.47 to have 4 children with CF. 


32c. Evaluate: The observed values are given in the problem, and the
expected values were calculated in the answer to part b. Deduce: The
chi-square using these numbers is 47. The number of degrees of freedom
is (5 classes) − 1 = 4. Using Table 3.4, the chi-square value of 47 for
4 degrees of freedom corresponds to a P value less than 0.001. This indicates
that the difference between the observed and expected results is
highly statistically significant and therefore is not consistent with that
expected under binomial probability. Solve: The results are not consistent
with expectations based on a binomial distribution (P value < 0.001).
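
The expected family counts in 32b and the chi-square value in 32c follow from the binomial distribution. The sketch below makes two assumptions that are inferred from the numbers in the answers and are not stated there outright: the 480 children come from 120 families of four, and the number of families with no affected child is 120 minus the 104 families listed in 32a.

```python
from math import comb

def binomial(n, k, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

families = 120                      # assumed: 480 children in families of 4
# Expected number of families with k = 0..4 affected children (p = 1/4).
expected = [families * binomial(4, k, 0.25) for k in range(5)]
observed = [16, 52, 32, 18, 2]      # 16 is inferred as 120 - (52 + 32 + 18 + 2)

stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print([round(e, 1) for e in expected])
print(round(stat))                  # 47
```

The largest contribution comes from the families with three affected children, where 18 are observed against fewer than 6 expected.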


34. 0.988 


36. The probability of 5 unaffected children is 0.237, of 4 unaffected
and 1 affected is 0.396, of 3 unaffected and 2 affected is 0.264, of 2 unaffected
and 3 affected is 0.0879, of 1 unaffected and 4 affected is 0.0146,
and of all 5 affected is 0.000977.


38. Experiment One: (1) Cross the true-breeding short, brown-furred
and long, white-furred guinea pigs to create an F₁ population; (2) test
that they are dihybrids by crossing male F₁ with female F₁ to produce
F₂ guinea pigs; (3) cross both sets of pure-breeding parents to create
24 F₁ progeny and intercross all the F₁ (12 crosses) to produce 144 F₂;
(4) compare observed phenotypic distribution with expected results.
Experiment Two: (1) Cross the F₁ with their long, white-furred parent
to produce backcrossed guinea pigs; (2) cross two of the F₁ males
with their long, white-haired mother, thus producing 24 progeny; and
(3) cross the long, white-haired male guinea pig with all of the short,
brown-haired female F₁, which would produce 144 progeny.


40. Cross the parents to produce (FfRrTt) trihybrids, and self-fertilize
the F₁ to create an F₂ population that will include all possible phenotypes
and genotypes. Among the F₂, 3/64 will produce yellow, pear-shaped
tomatoes and have axial flowers (ffrrT_). To determine which of
these plants are TT, self-fertilize them and identify the plants that breed
true for axial flower position.


42a. All four adults are heterozygous carriers of alkaptonuria (Aa).


42b. For Sarah and James, the chance that their next child will have 
alkaptonuria is 1/4, not very low; for Mary and Frank, the chance 
is 1/6000, not 0. 


42c. 1/4 
42d. 3/4 


42e. The probability that one of their children with alkaptonuria will 
have a child with alkaptonuria is dependent on the genotype of their 
child’s mate. Deduce: If their child’s mate has no family history of 
alkaptonuria, then there is a 4/1000 chance that they will be a carrier. 
If the mate is a carrier, then there will be a 1/2 chance that a grandchild 
will have alkaptonuria. The overall probability is therefore (4/1000) ×
1/2 = 1/500. The probability that their affected child will have a child
with alkaptonuria will increase dramatically if their child’s mate has
a close relative with the disorder. For example, if their child’s mate’s
grandmother had alkaptonuria, then the mate will have at least a 1/2
chance of being a carrier, and the overall probability that their first child
will be affected is at least 1/2 × 1/2 = 1/4.


44. The genotypic ratios will be 4/9 FFpp, 4/9 Ffpp, and 1/9 ffpp. The
phenotypic ratio will be 8/9 feathered legs and single comb, and 1/9 no 
leg feathers and single comb. 


Chapter 3 

2a. 48 chromosomes 

2b. 48 chromosomes 

2c. 24 chromosomes (23 autosomes and 1 sex chromosome) 
2d. 48 chromosomes 

2e. 48 chromosomes 

2f. 48 chromosomes 


4. Cohesion opposes the pulling forces attempting to separate sister 
chromatids until all pairs of sister chromatids are attached to micro- 
tubules from opposite poles of the spindle (i.e., bipolar attachment). 
Premature, as well as delayed, sister chromatid separation can cause 
sister chromatids to partition together instead of separating during 
anaphase and can lead to errors in chromosome segregation. Sister 
chromatid cohesion is due to cohesin, a protein complex that binds
to sister chromatids and attaches them to each other. When all pairs
of sister chromatids are under the tension generated by bipolar at- 
tachment and sister chromatid cohesion, the protease separase is 
activated. Separase cleaves a component of cohesin, which simultane- 
ously ends cohesion on all pairs of sister chromatids and allows sister 
chromatids to be pulled toward opposite poles of the spindle. 


8. Anaphase II 


10. A normal human female nucleus contains one Barr body, whereas a 
normal male nucleus contains no Barr bodies. 


12a. Dd 
12b. 50% 


12c. There is no chance that her daughter will have OTD, but there is a 
1/2 or 50% chance that her daughter will be a carrier for OTD. 


12d. dY 

12e. 1/2 of the daughters and 1/2 of the sons will have OTD. 

14a. The male parent was M⁺Y and P⁺p and the female was M⁺m
and P⁺p.

14b. Among the females with purple eyes, half are M⁺m and pp and
half are M⁺M⁺ and pp. Males with purple eyes and miniature wings
are mY and pp.

16. The unusual number of Barr bodies seen in the rare male and
female infants is the result of nondisjunction during meiosis, which 
creates abnormal gametes containing an additional X chromosome or 
lacking an X chromosome. 


18. Evaluate: The syndrome is an X-linked, recessive trait. 
Therefore, males inheriting this mutation must express it (hemizy- 
gous), whereas it is possible for females to be heterozygous for the 
mutation (carrier females). Deduce: Because of random X inac- 
tivation, the tissue of heterozygous females will contain some cells 
that express the wild-type allele and some that express the mutant 
allele. The problem states that most female carriers do not show 
symptoms; therefore, the pattern of X-inactivation normally results 
in sufficient wild-type gene expression to promote normal develop- 
ment. Solve: In female carriers that show symptoms, X-inactivation 
must have occurred such that wild-type gene expression was insuffi- 
cient for normal development. The symptoms are less severe in these 
symptomatic female carriers because they express some level of the 
wild-type allele. 


22. For cross A, all males are barred-feathered, 1/2 of the females
are barred-feathered, and 1/2 are nonbarred. For cross B, 1/2 of the males
are barred, 1/2 of the males are nonbarred, 1/2 of the females are barred,
and 1/2 of the females are nonbarred.


24a. The reciprocal crosses produced different results, which is diag- 
nostic for sex-linked traits. 


24b. The female is the heterogametic sex (ZW), whereas males are ho- 
mogametic (ZZ). Black spot is dominant, and nonspotted is recessive. 
Using Z-linked alleles designated B for black-spot and b for nonspotted, 
the parents of cross I are Bb male and bW female. Their progeny are 
1/4 black-spot males (Bb), 1/4 nonspotted males (bb), 1/4 black-spot 
females (BW), and 1/4 nonspotted females (bW). For cross II, the par- 
ents are a bb male and a BW female. Their progeny are approximately 
1/2 Bb males and 1/2 bW females. 
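
The progeny classes given in 24b can be checked by pairing each sperm with each egg under ZW sex determination. The helper `zw_cross` and its string conventions (a two-letter male genotype, and a female written as her single Z-linked allele followed by `W`) are assumptions made for this sketch.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def zw_cross(male, female):
    """Progeny classes for a Z-linked gene; males are ZZ, females are ZW.

    male is a two-letter string of Z-linked alleles (e.g. 'Bb');
    female is her single Z-linked allele followed by 'W' (e.g. 'bW').
    """
    counts = Counter()
    for sperm, egg in product(male, female):
        if egg == "W":
            counts[sperm + "W"] += 1                    # daughter, hemizygous
        else:
            counts["".join(sorted(sperm + egg))] += 1   # son, two Z alleles
    total = sum(counts.values())
    return {g: Fraction(n, total) for g, n in counts.items()}

print(zw_cross("Bb", "bW"))   # cross I: four classes, 1/4 each
print(zw_cross("bb", "BW"))   # cross II: 1/2 Bb sons, 1/2 bW daughters
```

Because daughters receive their only Z chromosome from the father, the reciprocal crosses give different results, which is the diagnostic feature noted in 24a.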

26. Rare sex-reversed males carry an altered X chromosome that con- 
tains a fragment of the Y chromosome including the SRY gene. Thus, 
they develop male sex characteristics yet lack a Y chromosome. Rare 
sex-reversed females carry an altered Y chromosome that lacks the SRY
gene. Thus, they develop female sex characteristics even though they 
have a Y chromosome. 

28a. Female is ECec; Vv and the male is ecY; Vv. 

28b. The parents are ECec; Vv Ee female and ECY; Vv Ee male. 

28c. The parents are ECec; Vv ee female and ecY; Vv Ee male. 


30. Evaluate: The children of the first mating, Cc × cY, are expected
to be 1/2 color blind and 1/2 normal, with equal proportions of male
and female progeny in each class. The children of the second mating,
cc × CY, are expected to be 1/2 color blind and 1/2 normal, but all the
color-blind individuals will be male and all the normal individuals will be 
female. Deduce: The reciprocal matings give different results, which is 
indicative of X-linked inheritance. Solve: The results suggest recessive 
inheritance because the mother in the first mating is not affected but has 
affected children. The results differ from those predicted for autosomal 
recessive inheritance because the second mating yields all males with 
one phenotype and all females with the other. 


Chapter 4 


2. Evaluate: Epistasis indicates that two or more genes interact to 
contribute to a particular phenotype. Pleiotropy indicates that one 
mutant allele contributes to two or more mutant phenotypes. Deduce 
and Solve: Epistasis and pleiotropy can be distinguished by inheritance 
patterns in pedigrees or from crosses. If the inheritance pattern of one 
phenotype indicates that more than one gene is segregating, then epis- 
tasis is occurring. If two or more phenotypes are inherited together in a 
pattern that indicates segregation of alleles of a single gene, then pleiot- 
ropy is occurring. 


4a. The bacteria in the 12 colonies that grew on minimal medium are 
prototrophs, whereas those in the 3 colonies that could not grow on 
minimal medium were auxotrophs. 


4b. The bacteria in the three colonies that could not grow on minimal 
medium but could grow on minimal medium plus serine are serine- 
requiring auxotrophs. They carry mutations in one or more genes 
required for the biosynthesis of serine. 


4c. Mutant 1 carries a mutation in the gene coding for a component of 
enzyme C, mutant 2 carries a mutation in the gene coding for a compo- 
nent of enzyme A, and mutant 3 carries a mutation in the gene coding 
for enzyme B. 


6. Child b’s parents must be 1. Child a’s parents could be 1 or 3; how- 
ever, since child b’s parents are 1, child a’s parents must be 3. Child c’s 
parents could be 3 or 4; however, since child a’s parents are 3, child c’s 
parents must be 4. Child d’s parents could be 2, 3, or 4; however, since 
child a’s parents are 3 and child c’s parents are 4, child d’s parents must 
be 2. 


10a. At the B locus, one parent was Bb and the other was bb. At the D 
locus, one parent was Dd and the other was dd. 

10b. One parent was BbDd (brown) and the other was bbDd (yellow).
10c. At the B locus, one parent was BB and the other parent could have 
had any genotype (BB, Bb, or bb). At the D locus, one parent was Dd 
and the other was dd. 


12a. One parent was BBDdCc and the other was DdCc. Any genotype is 
possible at the B locus. 

12b. The parents were BbddCc X bbddCc. 

12c. Both parents were BbDdCc. 

12d. The parents were both Bb at the B locus. Between the two par- 
ents, there was a DD and a CC locus, but these need not have been in 
the same parent. The other D and C loci could have any genotype. 

14. The F₂ will be 1/4 red, 1/2 pink, and 1/4 ivory.

16. 2/3 short stature and short limbs and 1/3 normal stature and limbs 
18a. Pure-breeding red petunias are aaBB. 

18b. Pure-breeding blue petunias are AAbb.

18c. The phenotypic distribution in the F₂ will be 9/16 purple, 3/16
red, 3/16 blue, and 1/16 white.

20a. Evaluate: Recall that variable expressivity indicates that individuals
with the same mutant genotype vary in severity of mutant
phenotype. Deduce: Evidence of variable expressivity would be the appearance
of some individuals with two thumbs affected and others with
only one thumb affected. The pedigree shows both types of affected
individuals. Solve: Yes


20b. Evaluate: Recall that incomplete penetrance indicates that
not all individuals with a mutant genotype show the mutant phenotype.
Deduce: Evidence of reduced penetrance would be the appearance
of an affected child that had unaffected parents. IV-4 and IV-5 are
affected children of unaffected parents. The nonpenetrant individuals
are III-5 and III-10. Solve: Yes


22a. Genetic heterogeneity 


22b. There are five complementation groups: Group 1 is defined by
mutations 1, 3, and 7; Group 2 is defined by mutations 4 and 8; Group 3
is defined by mutations 5, 6, and 10; and Group 4 is defined by mutation
2. Mutation 9 fails to complement any of the other mutants and may
represent complementation group 5.


24a. The parental strains are AAbb and aaBB. 


24b. The phenotypic distribution will be 9/16 blue (A_B_), 6/16 purple 
(A_bb + aaB_), and 1/16 red (aabb). 


24c. The progeny of the backcross of the F₁ to the AAbb parent will be
1/2 blue (AABb and AaBb) and 1/2 purple (AAbb and Aabb) progeny.
The progeny of the backcross of the F₁ to the aaBB parent will be
1/2 blue (AaBB and AaBb) and 1/2 purple (aaBB and aaBb).


26a. Y₁ × G₁ produces all yellow F₁, which produce 3/4 yellow and
1/4 green F₂. Green appears to be recessive, and the 3:1 ratio suggests
segregation of alleles of a single gene. Y₂ × G₁ produces all green F₁,
which produce 3/4 green and 1/4 yellow. Here, yellow appears recessive
and, again, the 3:1 ratio suggests segregation of alleles at a single
gene. Y₁ × Y₂ produces all yellow F₁, which produce 13/16 yellow and
3/16 green. The sixteenths in the F₂ suggest that alleles at 2 genes are
segregating. The two genes segregating in the third cross correspond to the
genes segregating in the first two crosses. Solve: The results of these
crosses indicate that two genes control squash fruit color.


26b. In cross I, Y₁ is AABB, and G₁ is aaBB. Their yellow F₁ progeny
are AaBB, and their F₂ are 3/4 yellow (A_BB) and 1/4 green (aaBB). In
cross II, Y₂ is aabb, G₁ is aaBB, their green F₁ are aaBb, and their F₂
are 3/4 green (aaB_) and 1/4 yellow (aabb). The F₁ of Y₁ × Y₂ resulting
from cross III are AaBb, and their F₂ are 9/16 yellow (A_B_), 3/16 yellow
(A_bb), 3/16 green (aaB_), and 1/16 yellow (aabb).


26c. The progeny will be 1/2 yellow (AaBB and AaBb) and 1/2 green 
(aaBB and aaBb). 
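
The modified dihybrid ratios in answers 24b (9:6:1) and 26b (13:3) arise from the same 16 gamete combinations; only the rule that maps genotype to phenotype differs. The sketch below makes that explicit. The function names are illustrative, and the two phenotype rules are transcribed from the genotype classes listed in those answers.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def f2_phenotypes(phenotype_of):
    """Phenotype frequencies among progeny of AaBb x AaBb (unlinked genes)."""
    counts = Counter()
    gametes = ["AB", "Ab", "aB", "ab"]
    for g1, g2 in product(gametes, repeat=2):
        has_A = "A" in (g1[0], g2[0])   # at least one dominant A allele
        has_B = "B" in (g1[1], g2[1])   # at least one dominant B allele
        counts[phenotype_of(has_A, has_B)] += 1
    return {p: Fraction(n, 16) for p, n in counts.items()}

def squash(has_A, has_B):
    """Rule from answer 26b: only aaB_ plants are green."""
    return "green" if (not has_A and has_B) else "yellow"

def flower(has_A, has_B):
    """Rule from answer 24b: A_B_ blue, A_bb or aaB_ purple, aabb red."""
    if has_A and has_B:
        return "blue"
    return "purple" if (has_A or has_B) else "red"

print(f2_phenotypes(squash))   # yellow 13/16, green 3/16
print(f2_phenotypes(flower))   # blue 9/16, purple 3/8, red 1/16
```

Passing the phenotype rule as a function keeps the enumeration separate from the biology, so other epistatic ratios can be tested by writing a different rule.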


28a. All the mutants are temperature-sensitive mutants, carrying a 
mutant allele of a gene required for normal growth at 37°C. The typical 
mechanistic explanation of this is that the gene is required for growth 
at all temperatures, and the mutant allele is functional at 25°C but not 
37°C (an alternative explanation is that the gene is required only at 
37°C). This explains the five mutants that cannot grow at 37°C. For the 
two mutants that grow slowly at 37°C, the alleles probably retain partial 
function at 37°C. 


28b. This study identifies three complementation groups: Group 1 is 
defined by A, D and F, Group 2 is defined by B and G, and Group 3 is 
defined by C and E. 


30a. The chi-square value is

$$\chi^2 = \frac{(22 - 25)^2}{25} + \frac{(23 - 25)^2}{25} + \frac{(55 - 50)^2}{50} = 1.02.$$


There are three phenotypic classes; therefore, this calculation has two 
degrees of freedom. A chi-square value of 1.02 with 2 df gives a P value 
between 0.5 and 0.7; therefore, the results are consistent with a 1:2:1 
hypothesis, and that hypothesis cannot be rejected. 
30b. The chi-square value is

$$\chi^2 = \frac{(22 - 18.75)^2}{18.75} + \frac{(55 - 56.25)^2}{56.25} + \frac{(23 - 25)^2}{25} = 0.75.$$


There are three phenotypic classes; this calculation has two degrees of 
freedom. A chi-square value of 0.75 with 2 df gives a P value between 
0.5 and 0.7; therefore, the results are consistent with a 9:4:3 hypothesis, 
and that hypothesis cannot be rejected. 
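
Both chi-square values can be reproduced with a few lines of arithmetic. This is a minimal sketch: the observed counts (22, 55, 23) come from the answers above, the expected counts are derived from the stated ratios for a total of 100 progeny, and the helper names are illustrative.

```python
def expected_counts(ratio, total):
    """Split a total into expected class counts for a given ratio."""
    parts = sum(ratio)
    return [total * r / parts for r in ratio]

def chi_square(observed, expected):
    """Sum of (observed - expected)^2 / expected over all classes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

total = 22 + 55 + 23

# 1:2:1 hypothesis: classes ordered 22, 55, 23 against 25, 50, 25
chi_121 = chi_square([22, 55, 23], expected_counts([1, 2, 1], total))

# 9:4:3 hypothesis: classes ordered 55, 23, 22 against 56.25, 25, 18.75
chi_943 = chi_square([55, 23, 22], expected_counts([9, 4, 3], total))

print(round(chi_121, 2), round(chi_943, 2))  # 1.02 0.75
```

With three classes there are two degrees of freedom in each test, so both values fall well below the 5% critical value of 5.99.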


30c. Neither hypothesis can be rejected based on the chi-square 
analysis. 


30d. Self-fertilize all the purple progeny and determine the proportion 
that breed true. If 1:2:1 is correct, all purple plants will breed true. If 
9:4:3 is correct, then 2/3 of the purple plants will not breed true
(i.e., they will produce some whites).


32. Strains 1 and 2 are homozygous for mutations in the same gene, A,
that cause albinism. Strain 3 is homozygous for a mutation in a different
gene, B, which causes albinism. Strains 1 and 2 are aaBB, as are the
F₁ and F₂ of cross A. Strain 3 is AAbb. The F₁ of cross B and cross C are
AaBb. The F₂ of cross B and cross C are 9/16 A_B_ (pigmented), 3/16
A_bb (albino), 3/16 aaB_ (albino), and 1/16 aabb (albino).


Chapter 5 
2a. Parental: 41% DR, 41% dr; recombinant: 9% Dr, 9% dR 
2b. Parental: 41% Dr, 41% dR; recombinant: 9% DR, 9% dr 
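
The percentages in 2a and 2b follow from splitting the recombination frequency evenly between the two recombinant gametes and the remainder evenly between the two parental gametes. The sketch below assumes a recombination frequency of 0.18, which is what the 9% + 9% recombinant classes imply; the function name is illustrative.

```python
def gamete_frequencies(parental, recombinant, rf):
    """Return expected gamete frequencies for two linked genes."""
    freqs = {g: (1 - rf) / 2 for g in parental}        # parental classes
    freqs.update({g: rf / 2 for g in recombinant})     # recombinant classes
    return freqs

# 2a: dominant alleles on the same chromosome (DR/dr)
coupling = gamete_frequencies(["DR", "dr"], ["Dr", "dR"], rf=0.18)
# 2b: dominant alleles on opposite chromosomes (Dr/dR)
repulsion = gamete_frequencies(["Dr", "dR"], ["DR", "dr"], rf=0.18)

for name, freqs in (("2a", coupling), ("2b", repulsion)):
    print(name, {g: f"{p:.0%}" for g, p in freqs.items()})
```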


4. E and H are not genetically linked, because a single crossover in
every meiosis results in production of equal proportions of EH, Eh, eH, 
and eh gametes. EH and eh are parental gametes, whereas Eh and eH are 
recombinant gametes. The percentage of parental gametes is the same 
as the percentage of recombinant gametes, which is 50%. 


6a. Yes, the y and w genes are expected to show linkage because they 
are less than 50 map units (m.u.) apart. 


6b. Yes, y is expected to assort independently of f because y and f are
more than 50 m.u. apart. The same applies to w and f. 


6c. There will be 24.625% of each of the following progeny types: gray 
with red eyes and forked bristles, gray with red eyes and normal bristles, 
yellow with white eyes and forked bristles, and yellow with white eyes 
and normal bristles. There will be 0.375% of each of the following types: 
gray with white eyes and forked bristles, gray with white eyes and nor- 
mal bristles, yellow with red eyes and forked bristles, and yellow with 
red eyes and normal bristles. 


6d. The female is heterozygous at y, w, and f, and the male is hemizygous
recessive. Consider the linked y and w loci first. The y and w genes
are 1.5 m.u. apart, so the females will make 0.4925 of each parental
gamete type, which are y⁺w⁺ and y⁻w⁻. They will make 0.0075 of each
recombinant gamete type, which are y⁺w⁻ and y⁻w⁺. The f gene is unlinked
to y and w, so its alleles assort independently of y and w; 0.50 of
each y w genotype receives an f⁺ allele and 0.50 receives an f⁻ allele. The
fraction of y⁺w⁺f⁺ is 0.4925 × 0.5 = 0.24625. The same is true for y⁺w⁺f⁻,
y⁻w⁻f⁺, and y⁻w⁻f⁻. The fraction of y⁺w⁻f⁺ is 0.0075 × 0.50 = 0.00375.
The same is true for y⁺w⁻f⁻, y⁻w⁺f⁺, and y⁻w⁺f⁻.
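
The eight gamete fractions can be generated by combining the linked y-w classes with an independently assorting f locus. This is a sketch under the assumptions stated in the answer (1.5 m.u. between y and w, f unlinked, coupling phase in the female); the allele labels and function name are illustrative.

```python
def three_locus_gametes(rf_yw=0.015):
    """Gamete fractions for a y+ w+ / y- w- female, with f unlinked."""
    linked = {
        ("y+", "w+"): (1 - rf_yw) / 2,  # parental
        ("y-", "w-"): (1 - rf_yw) / 2,  # parental
        ("y+", "w-"): rf_yw / 2,        # recombinant
        ("y-", "w+"): rf_yw / 2,        # recombinant
    }
    gametes = {}
    for (y, w), p in linked.items():
        for f in ("f+", "f-"):
            # f assorts independently, so each allele is received half the time
            gametes[(y, w, f)] = p * 0.5
    return gametes

gametes = three_locus_gametes()
for genotype, fraction in gametes.items():
    print(" ".join(genotype), round(fraction, 5))
```

The four parental y-w combinations each come out at 0.24625 and the four recombinant combinations at 0.00375, matching the percentages given in 6c.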


8a. See figure. 


[Figure: genetic map with loci labeled T, G, and R]
8b. A trihybrid organism with dominant alleles on one chromosome
and recessive alleles on the homologous chromosome (GRT/grt) could 
be test crossed to a pure-breeding recessive (grt/grt). The number of 
test-cross progeny in each outcome category can be used to determine 
which genetic map is correct. 


10. Syntenic genes that are separated by 50 map units or more will 
assort independently because there will be one or more crossovers 
between them per meiosis. 

12a. To measure the distance between Y and Lz, cross a yl/y⁺l⁺ female to a
yl/Y male. The progeny should be 36% (360/1000) yellow lozenge, 36%
(360/1000) gray normal eyes, 14% (140/1000) yellow normal eyes,
and 14% gray lozenge eyes. To measure the distance between Lz and F,
cross an lf/l⁺f⁺ female to an lf/Y male. The progeny will be 34%
(340/1000) lozenge forked bristles, 34% (340/1000) normal eyes and
normal bristles, 16% (160/1000) normal eyes with forked bristles, and
16% (160/1000) with lozenge eyes and normal bristles.


12b. No cross can demonstrate genetic linkage between genes Y and F
because the recombination frequency between genes Y and Lz plus that
between Lz and F is greater than 50%; therefore, the percent recombination
between Y and F in any cross will be 50%.


12c. Syntenic genes (genes on the same chromosome) that are sepa- 
rated by more than 50 map units do not display genetic linkage and, 
therefore, assort independently. 


14a. Nail-patella syndrome is a dominant trait.

14b. Yes, NPS appears to segregate with blood type A in this pedigree, 
indicating genetic linkage between these traits. 

14c. 1-1 is 1°n/1n, 1-2 is AN/1°n, Il-2 is 1°n/Tn, 1-4 is AN/1°n, Il-6 
is I4N/I°n, IL-7 is 1°n/In, and U-9 is IAN/T°n. 

14d. III-6 is I^O N/I^O n and III-8 is I^O n/I^O n. Even though III-6 and III-8
are both O blood type, III-6 has nail-patella syndrome because he inherited
the recombinant chromosome, I^O N, from his mother, whereas
III-8 inherited the nonrecombinant I^O n from her mother.
14e. The genotypes of III-11 and III-12 cannot be unambiguously determined
because they have type A blood and both of their parents are
I^A I^O. Thus, either or both could be I^A I^A or I^A I^O. For this reason, it is
not clear whether III-11 and III-12 are parental-type or recombinant-type
progeny.

16a. The order is G T L. 

16b. The recombination frequencies are 0.139 between G and T, 0.088 
between T and L, and 0.214 between G and L. 


16c. The recombination frequency for G and L is less than for G and T 
plus T and L because the double-crossover progeny do not appear to be 
recombinant for G and L and, therefore, are not counted. 


16d. The interference value (I) is 0.49.


16e. The meaning of I = 0.49 is that only about half of the expected number
of double-crossover progeny were observed. This indicates that a meiotic
cell undergoing a crossover between G and T is about half as likely
to also have a crossover occur between T and L. Similarly, a crossover
between T and L reduces the likelihood of a crossover between G
and T.
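
Interference is one minus the coefficient of coincidence, which is the observed double-crossover frequency divided by the frequency expected if the two crossovers were independent. The sketch below uses hypothetical input values chosen only to illustrate the calculation; they are not the data from this problem.

```python
def interference(rf_first, rf_second, observed_dco):
    """Return I = 1 - (observed DCO frequency / expected DCO frequency)."""
    expected_dco = rf_first * rf_second  # product rule for independent crossovers
    coefficient_of_coincidence = observed_dco / expected_dco
    return 1 - coefficient_of_coincidence

# Hypothetical example: intervals of 10% and 20% recombination,
# with double crossovers observed in 1% of progeny.
example_i = interference(0.10, 0.20, 0.01)
print(round(example_i, 2))  # 0.5
```

A value of 0 means double crossovers occur as often as expected, and a value of 1 means none were observed.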


18a. Yes, the data provide strong support for linkage between Rh and
elliptocytosis because the maximum lod score supporting linkage is
above 3 (it is about 5.5 for linkage at a θ value just over 0.1).


18b. The maximum lod score is about 5.5 for linkage at a θ value just
over 0.1.

18c. The results support linkage at θ values from just under 0.05 to
about 0.30.

22. The chi-square value is 168.3. For 3 degrees of freedom, this 
corresponds to a P value of less than 0.001. 

24a. The pure-breeding brown-eyed fly line is ccd⁺d⁺, the pure-breeding
short-bristled line is c⁺c⁺dd, and the F₁ is cc⁺dd⁺.

24b. The cross should be a test cross of the F₁ dihybrid. The test-cross
strain would be ccdd. The test cross will yield four progeny categories
whose phenotypes will be determined by the dominant or recessive
alleles contributed by the F₁ dihybrid.


24c. The progeny will be 36% cd⁺, 36% c⁺d, 14% cd, and 14% c⁺d⁺.
24d. The progeny will be 25% cd⁺, 25% c⁺d, 25% cd, and 25% c⁺d⁺.


28a. I-1 is either N1/n2 or N2/n1. I-2 is n2/n2. II-1 is N1/n2. II-2 is n2/n2.
III-1 is N1/n2. III-2 is N1/n2. III-3 is n2/n2. III-4 is N1/n2. III-5 is
n2/n2. III-6 is N2/n2. III-7 is N1/n2. III-8 is n2/n2.

28b. III-6 is a recombinant. Her genotype indicates that the marker 
allele 2 is on the same chromosome as the NF allele, unlike the allele 
arrangement in her mother (II-1). 




28c. 1/8 


30a. The gene order is scute, echinus, crossveinless. The allelic phase 
in the trihybrid is +e+/s+c. 


30b. The recombination frequency between scute and echinus is 0.067.
The recombination frequency between echinus and crossveinless is
0.095. The recombination frequency between scute and crossveinless
is 0.162.


30c. There are no discrepancies across this genetic interval. 


30d. The chi-square value is 38,555, which corresponds to a P value 
well below 0.01, which indicates that the results of this experiment are 
not due to independent assortment. 


32a. The chi-square values for both sets of data correspond to P values 
well below 0.01 and therefore indicate that the results significantly devi- 
ate from expectation based on independent assortment. This supports 
linkage of the colorless and waxy genes. 


32b. The recombination frequency from cross 1 was 0.27, and the 
recombination frequency from cross 2 was 0.24. 


32c. Yes, both sets of data are compatible with the hypothesis of 
genetic linkage, although the recombination frequencies of the two 
sets differed slightly. 


32d. The recombination frequency using combined data is 0.27.


Chapter 6 


2. Link 1: Transfer of an entire F⁺ plasmid from an F⁺ cell converts an
F⁻ cell to an F⁺ cell. Link 2: Integration of the F plasmid into the host
chromosome converts an F⁺ cell to an Hfr cell. Link 3: Precise excision
of the F plasmid from the chromosome of an Hfr cell converts an Hfr
cell to an F⁺ cell. Link 4: Excision of the F plasmid plus some host DNA
from the chromosome of an Hfr cell converts an Hfr cell into an F′ cell.


4. Evaluate: All three mechanisms can involve homologous recom- 
bination of the transferred DNA into the recipient chromosome. In all 
three mechanisms, if the DNA entering the cell is linear or does not 
contain an origin of replication, then recombination of the DNA into 
the recipient cell chromosome or episome is required for the DNA to 
be stably maintained. If the DNA entering the recipient is circular and 
contains sequences required for replication and maintenance, then 
recombination into the recipient cell chromosome or an episome is not 
required. 


These three mechanisms differ in how DNA is transferred from one cell 
to another. Only conjugation requires genetic information for transfer 
in the donor cell (F plasmid DNA) and physical contact between the do- 
nor and recipient cell. Transduction is characterized by infection of the 
donor cell by a bacteriophage. On the other hand, transformation does 
not require particular genes in the donor or the help of a phage: DNA is 
released from the “donor” cell due to cell lysis and enters the “recipient” 
cell via DNA transporters. 


6. Lysis of an infected bacterial host cell, and the release of progeny 
phage particles, is the end result of the lytic cycle of bacteriophage. 
Lysogeny involves the integration of the phage chromosome (known 
as a prophage once integrated) by site-specific recombination into a 
specific site (DNA sequence) in the bacterial chromosome. Once inte- 
grated, the prophage can replicate along with the rest of the bacterial 
chromosome until conditions induce excision of the prophage and 
resumption of the lytic cycle. 


8. A prophage is a bacteriophage genome that is part of the host cell 
chromosome. It is formed by integration of a bacteriophage chromo- 
some into the host cell chromosome by site-specific recombination. 


10. In genetic complementation, bacterial lysis occurs because the two 
viruses have mutations in different genes. This is analogous to comple- 
mentation analysis in eukaryotes. In recombination, bacterial lysis 
occurs because, although the viruses have mutations in the same gene, 
rare homologous recombination events produce recombinant wild-type 
viruses. Complementation and recombination can be differentiated by
the frequency of bacterial lysis after simultaneous infection with two
mutant viruses: lysis is frequent in the case of complementation (many 
plaques are formed), whereas it is rare in the case of recombination. 


20a. Selection for met⁺ was done on minimal medium containing
glucose and phenylalanine. The met⁺ transductants were assayed for
cotransduction of phe⁺ using minimal medium containing glucose.
The met⁺ transductants were assayed for cotransduction of ara⁺ using
minimal medium containing arabinose. Selection of phe⁺ transductants
was done on minimal medium containing glucose and methionine. The
phe⁺ transductants were assayed for cotransduction of met⁺ on minimal
medium containing glucose. The phe⁺ transductants were assayed for
cotransduction of ara⁺ on minimal medium containing arabinose. The
met⁺ phe⁺ transductants were selected on minimal medium containing
glucose. The met⁺ phe⁺ transductants were assayed for cotransduction
of ara⁺ on minimal medium containing arabinose. The ara⁺ transductants
were selected on minimal medium containing arabinose, phenylalanine,
and methionine. The ara⁺ transductants were assayed for
cotransduction of met⁺ on minimal medium containing arabinose and
phenylalanine. The ara⁺ transductants were assayed for cotransduction
of phe⁺ on minimal medium containing arabinose and methionine.


20b. The gene order is phe-ara-met or met-ara-phe. 
22a. 4 genes 


22b. Mutations 1, 5, and 8 are in one gene. Mutation 2 is in a second 
gene. Mutations 3 and 7 are in a third gene. Mutations 4 and 6 are in a
fourth gene. 


22c. Complementation resulted in the formation of many plaques on 
each plate (the lysis of many different bacteria) due to coinfection by 
bacteriophage with mutations in different genes. The vast majority
of these phage are mutants that cannot by themselves infect and lyse
bacteria. Recombination between mutations in the same gene results 
in rare plaques (very few bacteria lyse); however, all the virus particles 
produced are wild type and can infect and lyse bacteria. 


22d. Mutation 9 is a deletion that inactivates two genes. It overlaps 
mutations 1 and 7 but not mutation 3, 5, or 8. 


22e. Mutation 10 is a deletion mutation that inactivates two genes. It 
overlaps mutations 4 and 8 but not mutation 1, 5, 6, or 9. 


22f. The mutation order is 3, 7, 1, 5, 8, 4, 6, and 2. 


Chapter 7 


2. Evaluate: The key results were those showing that enzymes that 
destroyed RNA and protein did not destroy the transforming principle, 
whereas enzymes that destroyed DNA did. Solve: The transforming 
principle was considered to be genetic material. The most reasonable 
interpretation of those results was that DNA was the only essential com- 
ponent of the transforming principle, and therefore, the genetic material. 


4. Evaluate: Hershey and Chase prepared T2 particles whose protein
was labeled with radioactive sulfur, ³⁵S, and whose DNA was labeled
with radioactive phosphorus, ³²P. They used the labeled T2 to infect
bacteria and then separated the infected bacteria from the empty phage
shells (phage ghosts) using a blender. They found that essentially all the
³²P-labeled T2 DNA but little to none of the ³⁵S-labeled T2 protein was
in the infected bacterial cells. Solve: Since T2 genetic material must be
inside the infected cells in order to direct new virus particle synthesis,
these results pointed to DNA as the genetic material of phage T2.


6. Evaluate: The problem concerns the chemical bonds that form 
base pairs in double-stranded DNA. Recall that hydrogen bonds are 
weak, non-covalent bonds involving two atoms sharing a hydrogen 
nucleus. The distance between the atoms sharing the hydrogen nucleus 
is critical for hydrogen bonds to form. Deduce: The bases in the two 
complementary antiparallel DNA strands are aligned such that each
of the atoms that share hydrogen nuclei (N and O or N and N) in each
base are positioned next to each other at a distance that allows all possible
hydrogen bonds to form. Solve: The bases in the complementary
but parallel strands are not aligned in this manner; therefore, the atoms
that could form hydrogen bonds do not align and are not close enough
together to allow hydrogen bonding between all possible and necessary 
chemical groups. 


8a. Phosphodiester bonds 

8b. Hydrogen bonds 

8c. There are 12 phosphodiester bonds in the molecule. 
8d. There are 17 hydrogen bonds in the DNA molecule. 
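
The two counts in 8c and 8d follow simple rules: a linear duplex of n base pairs has n - 1 phosphodiester bonds per strand, and each A-T pair contributes 2 hydrogen bonds while each G-C pair contributes 3. The sketch below uses a hypothetical 7-bp sequence, not the sequence from the problem, chosen because it has four A-T pairs and three G-C pairs and therefore gives the same totals.

```python
def count_bonds(top_strand):
    """Count bonds in a linear duplex given one strand's base sequence."""
    n = len(top_strand)
    # n - 1 internucleotide linkages on each of the two strands
    phosphodiester = 2 * (n - 1)
    # G-C pairs form 3 hydrogen bonds, A-T pairs form 2
    hydrogen = sum(3 if base in "GC" else 2 for base in top_strand)
    return phosphodiester, hydrogen

# Hypothetical sequence for illustration only
bonds = count_bonds("ATGCATG")
print(bonds)  # (12, 17)
```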


10. DNA polymerase III determines which free nucleotide triphos- 
phate is complementary to the base being copied. DNA polymerase III 
catalyzes phosphodiester bond formation between the α-phosphate of
the incoming nucleotide triphosphate and the 3’ hydroxyl group of the 
last nucleotide added to the strand. 


12. RNA is synthesized and serves as a primer for elongation by DNA 
polymerase. 


14a. DNA polymerase I is required to remove the RNA primer and fill 
in the gap with DNA. DNA polymerase III is responsible for the bulk of 
synthesis of DNA on the leading and lagging strands. 


14b. The absence of DNA pol I will not prevent the bulk of DNA rep- 
lication but will result in newly replicated DNA containing small seg- 
ments of RNA and nicks at the junctions of polymerase III synthesized 
DNA and the 5’ end of the RNA primers. 


14c. An E. coli mutant without a functional DNA polymerase III will be 
unable to replicate its DNA because it lacks the enzyme responsible for 
the bulk of DNA synthesis during replication. 


16a. True 
16b. False 
16c. True 
16d. False 
16e. True 
18. Helicase, SSB, primase, DNA pol III, DNA pol I, ligase 


20. Evaluate: Recall the Meselson-Stahl experiment and consider 
how the results excluded the alternatives to the semiconservative 
model for DNA replication. Meselson and Stahl initially cultured 

E. coli in medium containing only N15 (heavy nitrogen) until all cells 
contained only N15/N15 DNA. They then cultured the N15/N15 

E. coli in normal medium (N14) and collected samples after one, two, 
and three rounds of DNA replication. The results showed that before 
transfer to N14 medium, only N15/N15 DNA was present. After one 
round of replication in N14 medium, all of the DNA was N15/N14; 
after two rounds of replication, half the DNA was N15/N14 and half 
was N14/N14; and after the third round of replication, 1/4 of the DNA 
was N15/N14 and 3/4 was N14/N14. Deduce: The conservative 
model predicted that the original N15/N15 DNA would remain 
throughout, and therefore the results ruled out the conservative model. 
The dispersive model predicted that after each round of replication 
there would be only one form of DNA, which would become less and 
less dense. Solve: Although this model was not ruled out after one 
round of replication, the persistence of the N15/N14 DNA and the 
presence of two classes of DNA (N15/N14 and N14/N14) after rounds 
two and three ruled out the dispersive model. 
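
The fractions reported for each round are what the semiconservative model predicts, which can be confirmed by tracking duplex types across generations. This is a minimal sketch of that bookkeeping, starting from a single N15/N15 duplex; the function name is illustrative.

```python
from fractions import Fraction

def semiconservative_fractions(rounds):
    """Return fractions of (N15/N15, N15/N14, N14/N14) duplexes after replication in N14."""
    heavy, hybrid, light = 1, 0, 0
    for _ in range(rounds):
        # heavy duplex -> 2 hybrids; hybrid -> 1 hybrid + 1 light; light -> 2 light
        heavy, hybrid, light = 0, 2 * heavy + hybrid, hybrid + 2 * light
    total = heavy + hybrid + light
    return Fraction(heavy, total), Fraction(hybrid, total), Fraction(light, total)

for n in range(4):
    print(n, semiconservative_fractions(n))
```

After one round all DNA is hybrid, after two rounds half is hybrid, and after three rounds one quarter is hybrid, in agreement with the results described above.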


22. Evaluate: Cells were incubated in medium containing 3H-thymine 
for a short period of time (a “pulse”) and then transferred to medium 
containing an excess of unlabeled thymine (the “chase”). The cells were 
then collected and their DNA was prepared for electron microscopy, 
which can detect replication bubbles in DNA, and for autoradiog- 
raphy, which reveals the location of 3H-thymine incorporation into 
DNA. Deduce: Recall that bidirectional replication from a replication 
origin produces a replication bubble with DNA synthesis occurring at 
both ends, whereas unidirectional replication results in a replication 
bubble with DNA synthesis occurring at one end. The results showed
that DNA replication bubbles contained regions of label on both sides
of the midpoint of the bubble, which corresponds to the replication
origin. Solve: If DNA synthesis were unidirectional, then label would be
present on only one side of the origin.




24. Evaluate: DNA helicases unwind dsDNA during DNA replica- 
tion and repair. Bloom syndrome is characterized by chromosome 
instability and an increased rate of cancer. Chromosome instability is 
evident when chromosomes are lost from cells, typically because of a 
failure during mitosis. Mitosis fails to occur properly if chromosomes 
are not completely replicated. Cancer is a disease caused by accumula- 
tion of somatic mutations, which will accumulate at an elevated rate if 
DNA repair by DNA replication is defective. Solve: Based on the in- 
formation provided, it is reasonable to speculate that lack of the DNA 
helicase encoded by the Bloom syndrome gene results in incomplete 
replication during S phase and during repair of DNA damage. Failure 
to completely replicate chromosomes could result in a failure to pass 
chromosomes on to progeny cells during mitosis, which would result
in chromosome instability. Failure to repair DNA damage would
also lead to an increased rate of somatic mutation, which would lead
to cancer.


26a. Telomeric DNA is composed of a repetitive, short DNA sequence. 
In many organisms, the repeated sequence is 5′-TTAGGG-3′ or a
variant thereof.


26b. Telomerase uses a segment of its RNA as the template to add mul- 
tiple copies of a simple sequence to the 3’ end of each strand of DNA on 
a linear chromosome. This strand, which corresponds to the template 
for lagging strand synthesis, is copied by the normal mechanism of lag- 
ging strand synthesis after it is extended by telomerase. 


26c. Telomeres are thought to provide two functions, one in chromo- 
some replication and the other in chromosome protection. Telomeres 
provide a mechanism for replication of the ends of linear chromo- 
somes. Without telomeres, lagging strand synthesis would fail to extend 
to the chromosome ends, leaving a gap at each end after each round 

of replication. This would shorten the chromosome and, after many 
rounds of replication, would result in loss of important DNA sequences 
(genes). Telomeres are repetitive DNA, which prevents loss of im- 
portant DNA sequences if shortening occurs. Telomeres are also the 
binding site for telomerase, which extends the lagging strand template 
to compensate for sequences lost during incomplete lagging strand 
synthesis. Telomeres also provide a protective “cap” on the ends of 
linear chromosomes; this cap distinguishes normal chromosome ends 
from ends generated by double-stranded chromosome breaks (DNA 
damage). Without telomeric DNA and the proteins that bind telomeric 
DNA, the ends of chromosomes are recognized as broken chromo- 
somes and are fused together by DNA repair enzymes. Such breakage 
can create chromosome end-to-end fusions, which then create dicen- 
tric chromosomes that can be broken during the next cell division, 
creating new breaks and new fusions in an endless cycle known as the 
bridge-break-fusion cycle. 


26d. Evaluate: Germ-line cells divide many times, whereas many 
somatic cells are capable of a limited number of cell divisions (some are 
unable to divide at all). Solve: Telomerase is required to ensure com- 
plete chromosome replication in germ cells, ensuring that every mitosis 
produces two daughter cells with complete chromosomes. Telomerase 
is not required in somatic cells because they cannot divide enough 
times to result in loss of important DNA at chromosome ends. It is also 
thought that the lack of telomerase in somatic cells prevents indefinite 
cell division because loss of DNA at chromosome ends will activate 
DNA damage responses that stop division and lead to cell death. This 
response would help protect the organism from the spread of 
cancerous cells. 


28a. The reaction would have equal concentrations of deoxycytidine
triphosphate, deoxythymidine triphosphate, and deoxyguanosine
triphosphate. It would also have a mixture of deoxyadenosine triphosphate
and dideoxyadenosine triphosphate.


28b. Dideoxysequencing uses DNA synthesis to generate labeled DNA 
fragments of different lengths, which are then resolved by gel electro- 
phoresis or column chromatography. To visualize the products of DNA 
synthesis in traditional dideoxysequencing, relatively high levels of tem- 
plate were necessary. The use of PCR allows detectable levels of DNA 
synthesis from much lower levels of template DNA. 
28c. Dideoxynucleotides contain a hydrogen atom instead of a hydroxyl
group on their 3’ carbon. When a dideoxynucleotide is incorporated into 
a growing DNA strand, there is no 3’ hydroxyl group present to allow 
phosphodiester bond formation with the next nucleotide to be added; 
therefore, no additional nucleotides are added to this DNA strand, and 
thus synthesis of this strand is terminated. 
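
Chain termination in a reaction containing dideoxyadenosine triphosphate can be illustrated by listing the product lengths at which a dideoxy-A could be incorporated. The sketch below assumes a hypothetical template written 3′ to 5′ and ignores the primer unless a length is supplied; the names and the sequence are illustrative only.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def dda_fragment_lengths(template_3_to_5, primer_length=0):
    """Lengths of products terminated by ddA during copying of the template."""
    # The new strand is synthesized 5'->3' as the complement of the template
    new_strand = "".join(COMPLEMENT[base] for base in template_3_to_5)
    # Termination is possible wherever an A is added to the new strand
    return [primer_length + i + 1 for i, base in enumerate(new_strand) if base == "A"]

# Hypothetical template; the new strand would read ATGCAATC
lengths = dda_fragment_lengths("TACGTTAG")
print(lengths)  # [1, 5, 6]
```

Because normal deoxyadenosine triphosphate is also present, termination at any one A occurs only some of the time, so the reaction yields a mixture of all these lengths.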


32. Approximately 7500 origins of replication. Five minutes = 300
seconds. Working bidirectionally, each origin generates (300 sec)
(40 nt/sec)(2) = 24,000 nucleotides in 5 minutes, requiring
(1.8 × 10^8)/(2.4 × 10^4) = 0.75 × 10^4 origins.
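
The estimate can be checked directly. The sketch below takes the values used in this answer (a genome of 1.8 × 10^8 nucleotides, 300 seconds, 40 nucleotides per second per fork, and two forks per origin); the function and parameter names are illustrative.

```python
import math

def origins_needed(genome_nt, seconds, nt_per_sec_per_fork, forks_per_origin=2):
    """Minimum number of origins needed to copy the genome in the given time."""
    nt_per_origin = seconds * nt_per_sec_per_fork * forks_per_origin
    return math.ceil(genome_nt / nt_per_origin)

n_origins = origins_needed(180_000_000, 300, 40)
print(n_origins)  # 7500
```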


Chapter 8 


2. The three major modifications of mRNA are 5’ capping, intron 
splicing, and 3’ polyadenylation. The process of 5’ capping involves 
addition of a guanosine monophosphate by guanylyl transferase to the 
5' end of a pre-mRNA via a 5’-to-5’ triphosphate linkage and the sub- 
sequent methylation of the guanine and sometimes additional nucleo- 
tides on the pre-mRNA. Intron splicing involves the removal of introns 
from the pre-mRNA and the joining of adjacent exons by the spliceo- 
some. The process of 3’ polyadenylation involves the cleavage of the 
pre-mRNA downstream of the polyadenylation sequence by cleavage 
factors and addition of 20 to 200 adenine nucleotides by polyadenylate 
polymerase. 


6. DNA and RNA polymerases are similar in that both (1) catalyze 
phosphodiester bond formation to polymerize nucleotides into nucleic 
acids, (2) polymerize in a 5’-to-3' direction, and (3) are dependent on 

a DNA sequence template. DNA and RNA polymerases differ in that 
(1) RNA polymerase can initiate strand synthesis whereas DNA poly- 
merase can only extend an existing strand, (2) most DNA polymerases 
can proofread using a 3’-to-5’ exonuclease activity whereas RNA 
polymerases cannot, and (3) DNA polymerases use deoxyribonucleotide
triphosphates as substrates whereas RNA polymerases use ribonucleo- 
tide triphosphates as substrates. 


8. The primary transcripts of bacterial and eukaryotic genes differ 

in that bacterial transcripts often contain more than one coding 
sequence (they are polycistronic) whereas eukaryotic transcripts do 
not. Polycistronic mRNAs allow for coordinate regulation of produc- 
tion of several proteins by controlling initiation of transcription of 

only one gene. Eukaryotes accomplish this by coordinate regulation of 
transcription of multiple genes by gene-specific transcription factors. 
Prokaryotic and eukaryotic primary transcripts differ in that eukaryotic 
transcripts are extensively modified before translation whereas prokary- 
otic transcripts are not. The modification of eukaryotic transcripts 
includes 5’ capping and 3’ polyadenylation, which generate structures 
that are critical for regulation of the initiation of translation and for 
controlling the half-life of the mRNA. Mechanisms controlling trans- 
lation initiation and mRNA half-life in bacteria do not involve these 
structures. The modification of eukaryotic transcripts also includes 
intron splicing, which is required for generating the complete open 
reading frame used in translation and allows for the generation of mul- 
tiple, different (but related) mRNAs from a single primary transcript. 
This last mechanism increases the number of different proteins that are 
coded by a genome without increasing the number of genes present. 


10. Recall that enhancers are DNA sequences that increase the level 
(rate) of transcription of genes in a position- and orientation-independent 
manner. Enhancers are binding sites for transcription factors that stimu- 
late transcription of one or more genes. Since the expression of the tran- 
scription factors is often specific to the cell type or tissue, enhancers 
often provide for a mechanism to stimulate transcription of genes in a 
manner specific to the cell type or tissue. Possible rationales for the lack 
of enhancers in bacteria include (1) the lack of differentiated cell types in 
most bacteria; (2) little to no intergenic space on bacterial chromosomes, 
which makes long-range-acting enhancer sequences unnecessary; and 
(3) bacterial operons make coordinate regulation of protein synthesis by 
enhancers unnecessary. 


18. Evaluate: This problem requires application of your understand- 
ing of the band shift assay to match each condition listed to the result 
shown on a gel. Deduce: The bands in lanes 2 and 4 have migrated 
the most rapidly; therefore, they correspond to naked DNA molecules. 
Lanes 1, 3, and 5 have migrated more slowly than naked DNA; there- 
fore, they are DNA + protein complexes. Lane 1 showed slightly higher 
mobility than lane 5, which showed higher mobility than lane 3. Higher 
mobility indicates less protein is bound to the DNA. Solve: Conditions 
c and d should result in naked DNA because c contains DNA only, and
d contains DNA plus RNA pol II, which cannot bind to DNA in the 
absence of general transcription factors. Therefore, c and d correspond 
to lanes 2 and 4 (either lane is equally possible for either condition). 
Condition e contains the lowest number of transcription factors, fol- 
lowed by condition a, and condition b has the most transcription fac- 
tors. Thus, lane 1 corresponds to condition e, lane 5 corresponds to 
condition a, and lane 3 corresponds to condition b. 


20a. The organism transcribes as a wild type. 

20b. The organism transcribes slowly (i.e., is leaky). 
20c. The organism does not transcribe genes. 

20d. Temperature-sensitive mutant 


22a. Evaluate: Recall that AG is the consensus sequence found at 

the 3’ end of introns. Deduce and Solve: Since this mutation is in an 
intron but causes a defect in β-globin, it must affect splicing efficiency.
The mutation replaces A with U, changing the 3’ splice site sequence 

AG to UG. This change is likely to affect the efficiency with which the 
spliceosome recognizes the end of intron 2 and leads to either inclusion 
of intron 2 in the mRNA—which results in an insertion or premature 
termination—or causes a change in the location of the 3’ splice junction, 
which leads to an insertion, deletion, or frameshift mutation. 


22b. This problem requires you to consider the structure of genes and 
identify important DNA sequences that are not part of exons. Non- 
exon-located mutations that could prevent gene function include mu- 
tations in the promoter or terminator sequences as well as in enhancer 
or silencer sequences. Mutations in the promoter would diminish or 
prevent transcription, which would reduce or eliminate the mRNA. 
Mutations in the terminator could prevent or alter termination, which 
would elongate the mRNA. Mutations in an enhancer would dimin- 
ish transcription, which would reduce mRNA abundance. Mutations 
in the silencer would enhance transcription, which would increase 
mRNA abundance. 


24a. First, the eukaryotic promoter is unlikely to be recognized by 
bacterial RNA polymerase holoenzyme. Second, the introns will not 
be removed from the pre-mRNA, which will result in production of an 
abnormal protein. Third, sequences required for efficient translation 
initiation in bacteria are not present. 


24b. First, I would make a cDNA copy of the gene. The cDNA is a 
DNA copy of the mRNA sequence, which lacks introns. Second, I would 
place the cDNA sequence downstream of a known bacterial promoter, 
which will ensure that the gene is transcribed. Third, I would modify 
the coding sequence upstream of the ATG start codon to contain a 
Shine-Dalgarno sequence, which is important for proper initiation of 
translation. Fourth, I would place an intrinsic or rho-dependent termi- 
nation sequence downstream of the cDNA to ensure efficient transcrip- 
tion termination. 


26a. Evaluate: This problem challenges you to interpret the results of 
a DNA footprinting experiment. Recall that DNA footprints correspond 
to sites where bound protein protects an end-labeled DNA fragment 
from digestion by the endonuclease, DNase. Deduce and Solve: The 
DNA-only and DNA + protein lanes differ because bands are missing 
from the DNA + protein lanes. This result indicates that the proteins 
are bound to the DNA. Since the proteins are transcription factors and 
RNA polymerase, which bind to promoters, it is reasonable to conclude 
that the DNA fragment contains a promoter sequence. 


26b. 200 base pairs 


26c. Evaluate: This problem challenges you to apply your under- 
standing of promoter structure and function to design experiments to 
test a DNA fragment for promoter function. Tip: The function of a 
promoter is to provide all the DNA sequences required for binding of 
transcription factors and RNA polymerase and for start of transcrip- 
tion. Deduce and Solve: One reasonable experiment would be to 
clone this DNA sequence upstream of the coding sequence for a protein 
whose expression is easy to assay and then introduce that chimeric con- 
struct into cells and assay for protein expression. If the result is negative, 
then the orientation of the fragment should be inverted to check that it 
was not inserted backward in the first attempt. Also, a known, control 
promoter should be used to confirm that the protein-coding sequence is 
correct and that the protein can be detected in the cells used. 


Chapter 9 


2a. Nirenberg and Matthaei developed an in vitro translation system 
that contained everything necessary for translation except for amino 
acids and mRNAs. Synthetic RNA composed of only U (poly-U) was 
added to 20 separate reactions, each containing a different radioactive 
amino acid as well as the other 19 nonradioactive amino acids. Only the 
reaction with radioactive phenylalanine produced radioactive protein. 
Since the RNA sequence poly-U contains only UUU codons, then UUU 
codes for phenylalanine. 


2b. Only the reaction with radioactive proline produces radioactive
protein. Since poly-C contains only CCC codons, CCC codes for proline.


2c. One type of polypeptide composed of alternating Arg and Glu 
amino acids was produced. 


2d. No detectable polypeptides would be produced. 
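
The homopolymer reasoning in answers 2a and 2b can be checked with a minimal sketch. It assumes the standard genetic code and reads each synthetic RNA in a single frame; the names `CODON_TABLE` and `translate` are illustrative and do not come from the text.

```python
from itertools import product

# Standard genetic code, built in U, C, A, G order for each codon position.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(rna):
    """Translate an RNA string codon by codon, ignoring a trailing partial codon."""
    return "".join(CODON_TABLE[rna[i:i + 3]] for i in range(0, len(rna) - 2, 3))


poly_u = translate("U" * 30)  # every codon is UUU
poly_c = translate("C" * 30)  # every codon is CCC
print(poly_u, poly_c)
```

Every codon in a homopolymer is identical, so the product contains a single amino acid (F for poly-U, P for poly-C) regardless of reading frame.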


4. I. Preinitiation complex formation: the small ribosomal subunit and
   IF3 bind to the mRNA, the AUG start codon is identified by 16S
   rRNA base-pairing with the Shine-Dalgarno sequence, and the
   AUG codon is in the ribosomal P site.

   II. Formation of the 30S preinitiation complex: fMet-tRNA^fMet bound
   to IF2-GTP binds to the start codon in the P site, and IF1 binds.

   III. Formation of the 70S initiation complex: 50S ribosomal subunit
   binds, IF2 cleaves GTP to GDP + phosphate, and IF1, IF2-GDP,
   and IF3 leave the complex.

6. tRNAs that are charged with different amino acids have unique
structural features that allow them to interact with their cognate
aminoacyl tRNA synthetases. Unique features include the anticodon sequence
as well as sequences and base modifications in the T-arm and D-arm.


8a. 5'-CUA-3' and 5'-CUG-3’ 


8b. 5'-UUU-3' 
8c. 5'-GAG-3' 
8d. 5'-CAU-3' 


8e. 5'-AUC-3' and 5'-AUU-3'
10. See table. 


                     Bacterial Ribosome      Eukaryotic Ribosome

Similarities
Composition          RNA and protein         RNA and protein
Number of subunits   two (small and large)   two (small and large)
tRNA binding sites   three (E, P, and A)     three (E, P, and A)

Differences
Number and size      three (16S, 23S,        four (18S, 28S,
of rRNAs             and 5S)                 5.8S, 5S)
Size of subunits     30S and 50S             40S and 60S
Numbers of           21 in the small         ~35 in the small
proteins             subunit and             subunit and
                     31 in the large         45 to 50 in the
                     subunit                 large subunit
12a. The errors in the diagram are (1) ribosome is moving in the wrong
direction along mRNA, (2) mRNA contains T’s, (3) amino terminal 
amino acid of the peptide is incorrect, (4) ribosomal subunit sizes are 
incorrect, (5) anticodon sequence of the tRNA in P site is incorrect, and 
(6) amino acids on tRNAs in P and A sites are incorrect. 


14. 31
16. See table. 


Non-template DNA (5' to 3')   AAC ATA TGT GAA GGC GAG AAT GAA CGA
Template DNA (3' to 5')       TTG TAT ACA CTT CCG CTC TTA CTT GCT
mRNA (5' to 3')               AAC AUA UGU GAA GGC GAG AAU GAA CGA
tRNA anticodon (3' to 5')     UUG UAU ACA CUU CCG CUC UUA CUU GCU
Amino acid, 3-letter          Asn Ile Cys Glu Gly Glu Asn Glu Arg
Amino acid, 1-letter          N   I   C   E   G   E   N   E   R
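
The rows of the table for problem 16 all follow from the non-template strand, which a short sketch can make explicit. It assumes the standard genetic code and writes the template strand and anticodons 3' to 5', aligned base for base under their partners; all variable names are illustrative.

```python
from itertools import product

# Standard genetic code, built in U, C, A, G order for each codon position.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")
RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")

non_template = "AAC ATA TGT GAA GGC GAG AAT GAA CGA".split()  # 5' to 3'

# The template strand pairs base for base with the non-template strand (3' to 5').
template_3to5 = [codon.translate(DNA_COMPLEMENT) for codon in non_template]
# The mRNA matches the non-template strand, with U in place of T.
mrna = [codon.replace("T", "U") for codon in non_template]
# Each anticodon pairs base for base with its codon (3' to 5').
anticodons_3to5 = [codon.translate(RNA_COMPLEMENT) for codon in mrna]
one_letter = "".join(CODON_TABLE[codon] for codon in mrna)

print(" ".join(template_3to5))
print(" ".join(mrna))
print(" ".join(anticodons_3to5))
print(one_letter)
```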


22. Soon after initiation of translation of an mRNA coding for a
secretory protein, the amino terminus of the secretory protein is
synthesized and is exposed on the surface of the ribosome. The
amino terminus contains the signal sequence that marks this
protein for cotranslational translocation into the endoplasmic
reticulum (ER). The signal recognition particle (SRP) binds to the
signal sequence and the ribosome and halts further translation.
The SRP/ribosome/mRNA complex binds to the ER membrane— 
SRP binds to its receptor, and the ribosome binds to a protein 
translocation channel. SRP is released, translation resumes, and 
the growing polypeptide is extruded through the channel into the 
lumen of the ER. There, the signal sequence is cleaved by signal 
peptidase and the protein is glycosylated, folded with the help of 
chaperones, and packaged into transport vesicles destined for the 
Golgi apparatus. The carbohydrate on the protein is modified as 
the protein passes through the compartments of the Golgi, and 
the protein is packaged into vesicles destined for transport to the 
plasma membrane. Fusion of the transport vesicle membrane 
with the plasma membrane releases the secretory protein into the 
extracellular fluid. 

24a. 5'-UGUGUGUGUGUGUGUG . . .-3' 

24b. Cys-Val-Cys-Val-Cys-Val-Cys-Val . . . 

24c. The experiment resulted in production of a polypeptide composed 
of alternating Val and Cys amino acids, which is what was predicted for 
a nonoverlapping triplet code. 

24d. Two polypeptides, each composed of a single type of amino acid, 
would be produced if the code had been a doublet, nonoverlapping 
code. 

24e. Translation of the RNA using an overlapping doublet or triplet 
code gives the same result—a single type of polypeptide with two alter- 
nating types of amino acids. This result does not differ from that pre- 
dicted based on a nonoverlapping three-letter code but does differ from 
that predicted for a nonoverlapping two-letter code. 
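
The predictions in answers 24b through 24e can be compared side by side with a small sketch. It assumes the standard genetic code for the triplet reading; the helper `words` and its parameters are illustrative names. It splits the repeating UG copolymer into code words of a given size, either nonoverlapping or overlapping.

```python
from itertools import product

# Standard genetic code, built in U, C, A, G order for each codon position.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def words(rna, size, step, frame=0):
    """Split rna into code words of length `size`, advancing `step` bases each time."""
    return [rna[i:i + size] for i in range(frame, len(rna) - size + 1, step)]


rna = "UG" * 12  # 5'-UGUGUG...-3'

triplet_nonoverlapping = words(rna, 3, 3)
doublet_nonoverlapping_frame0 = words(rna, 2, 2, frame=0)
doublet_nonoverlapping_frame1 = words(rna, 2, 2, frame=1)
doublet_overlapping = words(rna, 2, 1)
triplet_overlapping = words(rna, 3, 1)

peptide = "".join(CODON_TABLE[w] for w in triplet_nonoverlapping)
print(peptide)
print(set(doublet_nonoverlapping_frame0), set(doublet_nonoverlapping_frame1))
```

Only the nonoverlapping doublet reading yields a single repeated word per frame (two homopolymers); every other scheme alternates between two words, which is why the experiment cannot separate those schemes from one another.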

26. 438 

32a. The start codon is the first AUG and the stop codon is the
in-frame UAA. 5'-CAP-CCAAGCGUU
ACAUGUAUGGAGAGAAUGAAACUGAGGCUUGCCACGUUUGUUAAGCACCU 
AUGCUACCGAAAAAAAAAAAAAAAAAAAAAAAA-3' 

32b. Met-Tyr-Gly-Glu-Asn-Glu-Thr-Glu-Ala-Cys-His-Val-Cys 
(MYGENETEACHVC) 
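
The peptide in 32b can be reproduced with a minimal sketch that scans from the first AUG to the first in-frame stop codon. It assumes the standard genetic code; the cap is omitted and the poly(A) tail is shortened to an arbitrary 12 nucleotides, since neither affects the reading frame. The function name `first_orf` is illustrative.

```python
from itertools import product

# Standard genetic code, built in U, C, A, G order for each codon position.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Sequence from answer 32a; tail length is an assumption made for brevity.
mrna = (
    "CCAAGCGUU"
    "ACAUGUAUGGAGAGAAUGAAACUGAGGCUUGCCACGUUUGUUAAGCACCU"
    "AUGCUACCG" + "A" * 12
)


def first_orf(rna):
    """Translate from the first AUG until the first in-frame stop codon."""
    start = rna.find("AUG")
    if start == -1:
        return ""
    peptide = []
    for i in range(start, len(rna) - 2, 3):
        amino_acid = CODON_TABLE[rna[i:i + 3]]
        if amino_acid == "*":  # stop codon reached
            break
        peptide.append(amino_acid)
    return "".join(peptide)


orf_peptide = first_orf(mrna)
print(orf_peptide)
```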
34. The consensus sequence is CCCGCCGCCACCAUGG. Also see table.


Position    -12 -11 -10  -9  -8  -7  -6  -5  -4  -3  -2  -1  [start]  +4
Percent A    23  26  25  23  19  23  17  18  25  61  27  15  [AUG]    23
Percent C    35  35  35  26  39  37  19  39  53   2  49  55  [AUG]    16
Percent G    23  21  22  33  23  20  44  23  15  36  13  21  [AUG]    46
Percent U    19  18  18  18  19  20  20  20   7   ?   ?   ?  [AUG]     ?
Consensus     C   C   C   G   C   C   G   C   C   A   C   C  AUG       G
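
The consensus in answer 34 is the most frequent nucleotide at each position, which the following sketch computes. The A, C, and G percentages are taken from the table for problem 34; the U percentages are an assumption, computed here as the remainder needed for each column to total 100. Variable names are illustrative.

```python
positions = [-12, -11, -10, -9, -8, -7, -6, -5, -4, -3, -2, -1, 4]

percent = {
    "A": [23, 26, 25, 23, 19, 23, 17, 18, 25, 61, 27, 15, 23],
    "C": [35, 35, 35, 26, 39, 37, 19, 39, 53, 2, 49, 55, 16],
    "G": [23, 21, 22, 33, 23, 20, 44, 23, 15, 36, 13, 21, 46],
}
# Assumption: U makes up whatever A, C, and G leave of 100 percent.
percent["U"] = [
    100 - a - c - g
    for a, c, g in zip(percent["A"], percent["C"], percent["G"])
]

# Pick the most frequent nucleotide in each column.
consensus = [
    max("ACGU", key=lambda base: percent[base][i])
    for i in range(len(positions))
]

# Twelve upstream positions, then the start codon, then position +4.
consensus_sequence = "".join(consensus[:12]) + "AUG" + consensus[12]
print(consensus_sequence)
```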


36. GCCACCAUGG 
Chapter 10


2. Evaluate: Recall that genetic diseases are those determined by
genetic factors in the individual’s own genome. On the other hand, in- 
fectious diseases are caused by foreign agents that have invaded the in- 
dividual. Deduce: All genetic diseases are also molecular diseases since 
the genes involved encode molecules that are responsible for develop- 
ment of the disease. The only difference between the terms genetic and 
molecular is their point of emphasis: genetic emphasizes that a disease 
is not infectious, whereas molecular emphasizes that defects in one or 
more molecules are the ultimate cause of the disease. Sickle cell disease 
(SCD) is the first genetic disease for which the molecular mechanism 
was understood. SCD is seen in individuals homozygous for the β^S allele,
which encodes an altered β-globin polypeptide that leads to formation
of an altered hemoglobin protein, polymerization in red blood cells 
under low oxygen conditions, cells with a sickle shape, blockage of 
capillaries in peripheral tissue, and ultimately pain and tissue damage. 


4. Evaluate: Recall that electrophoretic mobility is the rate at which a 
molecule migrates during electrophoresis. The electrophoretic mobility 
of a protein is determined by its charge, size, and shape, which in turn 
are primarily a result of its amino acid sequence. Deduce: Two pro- 
teins with different electrophoretic mobility typically have some differ- 
ence in amino acid sequence. If different alleles of a protein-coding gene 
code for polypeptides that differ in sequence, even if they differ at only 
one amino acid position, the charge or shape of the proteins can differ 
and result in differences in electrophoretic mobility. 


6. See “Template Strand (DNA)” column of table. 




β-globin                    Amino   Codon       Coding Strand   Template Strand
Form              Position  Acid    (mRNA)      (DNA)           (DNA)

β^A (wild type)   7         Glu     5'-GAG-3'   5'-GAG-3'       5'-CTC-3'
Siriraj           7         Lys     5'-AAG-3'   5'-AAG-3'       5'-CTT-3'
San Jose          7         Gly     5'-GGG-3'   5'-GGG-3'       5'-CCC-3'
β^A (wild type)   58        Pro     5'-CCU-3'   5'-CCT-3'       5'-AGG-3'
Ziguinchor        58        Arg     5'-CGU-3'   5'-CGT-3'       5'-ACG-3'
β^A (wild type)   145       Tyr     5'-UAU-3'   5'-TAT-3'       5'-ATA-3'
Bethesda          145       His     5'-CAU-3'   5'-CAT-3'       5'-ATG-3'
Fort Gordon       145       Asp     5'-GAU-3'   5'-GAT-3'       5'-ATC-3'
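
The two DNA columns of the table for problem 6 follow mechanically from the mRNA codon, as this short sketch shows. The coding strand matches the mRNA with T in place of U, and the template strand is its reverse complement when both are written 5' to 3'. The function names are illustrative.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def coding_strand(mrna_codon):
    """Coding-strand DNA, 5' to 3': same sequence as the mRNA with T for U."""
    return mrna_codon.replace("U", "T")


def template_strand(mrna_codon):
    """Template-strand DNA, 5' to 3': reverse complement of the coding strand."""
    return coding_strand(mrna_codon).translate(COMPLEMENT)[::-1]


for codon in ["GAG", "AAG", "GGG", "CCU", "CGU", "UAU", "CAU", "GAU"]:
    print(codon, coding_strand(codon), template_strand(codon))
```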


8. Evaluate: Recall that mutations that increase the length of a
polypeptide either change the stop codon to a sense codon or shift the
reading frame before, but close to, the stop codon. By comparing the
sequences, we can determine that the wild-type and mutant sequences
differ immediately after codon 144. Deduce: The continued change
in sequence after codon 144 rules out base-substitution mutants,
which would affect specific bases only, and points to a shift in the
reading frame caused by the insertion of AG between codon 144 and
codon 145. The two-nucleotide insertion shifts the reading frame to
one that includes 13 sense codons (compared with two sense codons in
the wild type) before a stop codon appears, which happens to be after
codon 157.


10. The primary molecular parameter affecting the electrophoretic 
mobility of DNA and mRNA is size (typically expressed as length). For 
proteins, size (the number of amino acids), charge, and shape affect 
electrophoretic mobility. 


12. Electrophoretic mobility is a measure of the size (length) of mRNA 
and DNA molecules; however, comparing them to each other is, in 
most cases, like comparing apples to oranges. The length of an mRNA 
is an inherent, biological property of the gene that encodes the mRNA 
and is not dependent on the method used to prepare the mRNA for gel 
electrophoresis. The size of the DNA molecule containing the gene is 
entirely dependent on the experimental method used to prepare it for 
electrophoresis. For example, the size of a DNA restriction fragment 
containing a gene is determined by which restriction enzyme is used. 
DNA fragment sizes will likely differ for different restriction enzymes 
and are unlikely to be the same length as the mRNA. The exception to 
this rule is the comparison of a cDNA molecule to an mRNA molecule. 
A cDNA molecule is a double-stranded DNA copy of the mRNA. If the 
DNA copy is perfect, then cDNA length in base pairs should equal the 
mRNA length in nucleotides. 


14. Evaluate: The frequency of β^S will be determined by the evolutionary
forces acting on the phenotypes of β^Sβ^S, β^Sβ^A, and β^Aβ^A individuals.
The β^S allele encodes an abnormal β-globin polypeptide that forms
abnormal hemoglobin molecules that reduce the life span of red blood
cells. This is most severe in β^Sβ^S homozygotes and is detectable but less
severe in β^Sβ^A heterozygotes. The reduced life span of the red blood cells
interrupts the life cycle of the malaria protists and results in a certain
level of resistance to malaria. Solve: The severity of the anemia in β^Sβ^S
individuals decreases their reproductive fitness relative to β^Aβ^A, which
selects against the β^S allele. The less severe effect in β^Sβ^A heterozygotes
does not affect fitness, except in populations where malaria is endemic,
where it increases reproductive fitness relative to β^Aβ^A. Based on this, a
reasonable hypothesis would be that malaria is more prevalent in western
Africa than southern Africa.


16a. Yes, the fetus has a 1/4 chance of having SCD. 


16b. The fetus will develop SCD because it is homozygous for the β^S
allele.


18. Some restriction enzymes make a staggered double-stranded 
DNA cut at their recognition sequence, cutting the two DNA strands 
at different positions. This leaves single-stranded DNA ends, called 
“sticky ends” because they can form base pairs with the single- 
stranded ends of other restriction fragments and cause the fragments 
to stick together. Some restriction enzymes make a clean double- 
stranded DNA cut at their recognition sequence, cutting both strands 
at the same site. This leaves ends that are blunt in the sense that all of 
their nucleotides are base-paired. Blunt ends, therefore, are not sticky. 


20. Evaluate: Restriction enzymes break two phosphodiester
bonds between the same nucleotides in the same sequence on both
strands of DNA. Two sites of action suggest two active sites in the
enzyme. Solve: Restriction enzymes typically bind as dimers with each
monomer binding to the recognition sequence on opposite strands and
cutting that sequence in the same location. Thus, the BamHI recognition
sequence (5'-GGATCC-3') is bound by a BamHI dimer, where each monomer
binds to and positions its active site to break the phosphodiester bond
between the G's.
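
The symmetry argument in answer 20 can be illustrated with a minimal sketch. It relies on the known BamHI site GGATCC, cut between the two G's on each strand; the example sequence and the names `reverse_complement` and `digest` are illustrative only.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

SITE = "GGATCC"  # BamHI recognition sequence, read 5' to 3'
CUT_OFFSET = 1   # each strand is cut after the first G of the site


def reverse_complement(dna):
    """Return the opposite strand, written 5' to 3'."""
    return dna.translate(COMPLEMENT)[::-1]


def digest(dna):
    """Cut the top strand at every BamHI site and return the fragments."""
    fragments = []
    start = 0
    pos = dna.find(SITE)
    while pos != -1:
        fragments.append(dna[start:pos + CUT_OFFSET])
        start = pos + CUT_OFFSET
        pos = dna.find(SITE, pos + 1)
    fragments.append(dna[start:])
    return fragments


top_fragments = digest("AAAGGATCCTTT")
# Both strands read the same 5' to 3', so each monomer sees an identical site.
site_is_palindromic = reverse_complement(SITE) == SITE
# Staggered cuts leave a single-stranded 5' overhang on each fragment.
overhang = SITE[CUT_OFFSET:len(SITE) - CUT_OFFSET]
print(top_fragments, site_is_palindromic, overhang)
```

Because the site reads identically on both strands, the same cut position on each strand produces the staggered break and the four-nucleotide sticky end.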


24a. One type of mutation would be a deletion of 3.5 kb within the 
gene but not including the region corresponding to the probe. A second 
type of mutation would be a point mutation that creates a new restric- 
tion site 4.0 kb in from the right end of the map. 


24b. For the deletion mutant, I would expect either a smaller mRNA 
(3500 nucleotides shorter than the wild type) or less mRNA if the mu- 
tant mRNA is unstable. For the point mutation, I would not expect the 
mRNA to be different in length (unless the mutation alters pre-mRNA 
splicing), although the abundance of the mRNA may change if the mu- 
tation changes the stability of the mRNA. 


26. Evaluate: Linear DNA molecules can be considered to
move through an agarose gel matrix as extended (rod-like)
molecules that have the same charge-to-mass ratio regardless of
length. Deduce: Since their shape and charge-to-mass ratio are the 
same, the only factor affecting their relative electrophoretic mobility is 
their length. The gel will slow the longer molecule to a greater extent 
than the shorter molecule. The same rationale applies to mRNAs, as- 
suming electrophoresis under conditions where mRNAs are linear. 


28. Evaluate: Recall that restriction endonucleases are components 
of bacterial defense systems that protect against invasion by foreign 
DNA. Deduce: You can infer from the information provided that the 
restriction enzyme is able to cut foreign DNA, for example, DNA from 
an invading bacteriophage’s genome. DNA from a bacteriophage is 
digested by the restriction enzyme, which decreases the likelihood that 
the bacteriophage will successfully infect the bacterium. 


30. Yes, the father could be ♂1 but not ♂2 because puppy P3 has a
band that is not in the mother or in ♂2 but is in ♂1.


Chapter 11 


2. Recall that the term haploid refers to one copy of genetic infor- 
mation. The terms do not conflict because a bacterium is haploid 
regardless of whether its genes are contained in one or more than one 
chromosome. 


4. Approximately 2.9 × 10^9/(146 + 50) = 1.48 × 10^7 nucleosomes.
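
The arithmetic in answer 4 can be checked directly. The sketch assumes, as the answer does, a genome of 2.9 × 10^9 bp and one nucleosome per 146 bp of core DNA plus 50 bp of linker; the variable names are illustrative.

```python
genome_bp = 2.9e9  # genome size used in the answer
core_bp = 146      # DNA wrapped around one nucleosome core
linker_bp = 50     # linker DNA between neighboring nucleosomes

nucleosomes = genome_bp / (core_bp + linker_bp)
print(f"{nucleosomes:.3g}")
```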


6. Recall that the G-banding pattern of light and dark bands of chro- 
mosomes is characteristic for each chromosome. These distinctive band 
patterns allow a cytologist to unambiguously identify each chromosome 
in a human karyotype. This pattern allowed for the development of cyto- 
genetics, which is the genetic analysis of an individual that is performed 
by microscopy. Genetic abnormalities associated with alterations in 
chromosome number or structure can be detected by cytogenetic 
analysis, allowing for the rapid diagnosis of some genetic diseases. 


8. Interphase chromosomes will be less condensed than metaphase 
chromosomes. Chromosomes are difficult to resolve by microscopy in 
interphase, whereas chromosomes are easily resolved in metaphase due 
to their high degree of condensation. 


10. Evaluate: This problem asks you to consider DNA sequence ele- 
ments that are essential, evolutionarily conserved components of bac- 
terial or eukaryotic chromosomes. Deduce: Chromosomes must be 
replicated and passed on to progeny cells. Bacterial chromosomes must 
have all the genes essential for bacterial life, an origin of replication
for initiation of replication, and a site for attachment to the bacterial
membrane to ensure segregation of daughter chromosomes of each cell
at cell division. Eukaryotic chromosomes must contain a centromere,
a telomere at each end, and multiple origins of replication. Eukaryotic
chromosomes must also contain genes essential for eukaryotic life, al- 
though these genes can be dispersed among the chromosomes. Natural 
selection will select against chromosomes that lack sequences required 
for replication or segregation and the cells that contain them because 
they will be less fit than those that contain these sequences. The same 
argument applies to chromosomes that lack essential genes. 


12. Bacterial chromosomes are typically circular and are attached to
the bacterial cell membrane. Circular chromosomes do not require
telomeres; therefore, telomeres are not present on bacterial chromosomes,
and there would be no evolutionary advantage for a chromosome to have 
them. Bacteria do not have microtubules; therefore, bacterial chromo- 
somes do not need a centromere to facilitate assembly of a microtubule 
binding site. The membrane attachment site of a bacterial chromosome 
serves to promote segregation of daughter bacterial chromosomes to 
opposite sides of a dividing cell, which is analogous to the function of 
eukaryotic chromosome centromeres. 


14. Telomeres are composed of repetitive DNA in which the repeated 
sequence is a simple sequence (for example, TAAGGC repeated many 
times). Directly next to the telomere are telomere-associated sequences, 
which are also composed of repetitive DNA, but the repeated sequences 
are more complex and may include genes. Directly next to the telomere- 
associated sequences are “normal” chromosome sequences that contain 
genes and intergenic regions. Note that there is variation in this
sequence organization among chromosomes in the same organism and
between chromosomes in different organisms. 


18a. Evaluate: Recall that nucleosomes are spaced 200 bp apart and
that after S phase there are two copies of each chromosome. The
haploid genome size of Arabidopsis is 10^8 bp. Deduce: The number of
nucleosomes per genome in a diploid is given by (10^8/200) × 2 = 10^6.
There are twice as many nucleosomes after completion of S phase.
Solve: 2 × [(10^8/200) × 2] = 2 × 10^6.
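
The calculation in 18a can be laid out step by step. This sketch assumes the values stated in the answer (a haploid genome of 10^8 bp and one nucleosome per 200 bp) and simply evaluates the expressions; the variable names are illustrative.

```python
haploid_genome_bp = 10**8  # haploid genome size stated in the answer
spacing_bp = 200           # one nucleosome per 200 bp

per_haploid_genome = haploid_genome_bp // spacing_bp
diploid_before_s_phase = per_haploid_genome * 2     # two genome copies
diploid_after_s_phase = diploid_before_s_phase * 2  # DNA content doubled

print(per_haploid_genome, diploid_before_s_phase, diploid_after_s_phase)
```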


18b. The histone proteins that were part of nucleosomes before
S phase are recycled and used to form the new nucleosomes during
S phase. The additional histone proteins required to double the
nucleosome number are newly synthesized. Therefore, half of the histone
protein present on chromosomes after S phase is newly synthesized, and
the other half was already present. 


22a. Recall that the E. coli chromosome is 1000 times longer than an
E. coli cell, yet it fits into a small region of the cell called the nucleoid.
As with other bacterial chromosomes, the Methanococcus jannaschii
chromosome is compacted by supercoiling and the binding of proteins 
that fold the chromosome. The combination of supercoiling and folding 
(condensation) reduces the volume occupied by the chromosome, allow- 
ing it to fit into the region of the bacterial cell known as the nucleoid. 
22b. The chromosome is likely compacted in two steps. First, it is
bound by nucleoid-associated proteins and SMC proteins, which loop 
segments of the chromosome and condense them. Second, the DNA is 
supercoiled by the action of topoisomerases. 


22c. Recall that most bacterial chromosomes and plasmids are nega- 
tively supercoiled. Negative supercoiling folds chromosomal DNA and 
promotes unwinding of regions of the chromosome. This promotes 
access to ssDNA for enzymes such as DNA polymerase and RNA poly- 
merase. Supercoiling of the M. jannaschii chromosome folds the chro- 
mosome and promotes the function of DNA and RNA polymerases. 


24. Evaluate: Recall that histones interact with DNA in a
sequence-independent manner to form nucleosome core particles that are
conserved in structure and function from yeast to man. Also recall
that evolutionary change of a protein's sequence is under “functional
constraint,” which limits or prevents changes to sequences that perform
essential functions. Deduce and Solve: Histone H4 is one of the core
histones and is part of the nucleosome core particle. Most of the amino 
acid residues of H4 interact with either DNA or other histone proteins; 
therefore, most of the H4 amino acid sequence is under functional con- 
straint (any change to the H4 amino acid sequence will likely be delete- 
rious and therefore will be selected against by natural selection). Thus, 
little change occurs in the sequence of H4 over many millions of years 
that separate pea plants and cows from their common ancestor. 


26. Evaluate: Recall that all four histones are synthesized at the begin- 
ning of S phase and that, after completion of S phase, half the histone 
proteins present will be new and half will be left over from previous cell 
cycles. Tip: 35S-containing methionine can be used in a pulse-chase
experiment to radioactively mark all the histone proteins synthesized
during one S phase. Deduce and Solve: Start with cells that contain
unlabeled methionine and are in G1 phase. Place them in medium
containing 35S-methionine and allow them to complete S phase. Take a
sample of cells for analysis and then transfer the remaining cells to me- 
dium containing unlabeled methionine and allow them to divide and go 
through multiple rounds of S phase and cell division, collecting samples 
after each S phase. Analyze the nucleosomes of each sample by micros- 
copy and autoradiography. If nucleosomes are a mixture of old and new 
histones, then most or all nucleosomes will be radiolabeled after the 
first S phase, about half will be radiolabeled after the second round, and 
1/4 after the third round. If all nucleosomes contain either old or new 
histones, then about half of the nucleosomes will be radiolabeled after 
the first S phase, 1/4 after the second, and 1/8 after the third. 
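
The two sets of expected fractions in answer 26 follow one pattern: whatever fraction of nucleosomes is labeled after the first S phase is halved by each later round, because every round adds an equal number of unlabeled nucleosomes. The sketch below is a simplification that encodes only that halving; the function name and arguments are illustrative.

```python
from fractions import Fraction


def labeled_fractions(after_first_round, rounds):
    """Labeled fraction after each S phase, halving with every later round."""
    return [after_first_round / 2**n for n in range(rounds)]


# Old and new histones mix: essentially all nucleosomes are labeled at first.
mixed = labeled_fractions(Fraction(1), 3)
# Nucleosomes are entirely old or entirely new: half are labeled at first.
unmixed = labeled_fractions(Fraction(1, 2), 3)

print([str(f) for f in mixed], [str(f) for f in unmixed])
```

The two models differ by a factor of two at every sampled round, so the first sample after the labeled S phase is already enough to tell them apart.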
28a. Evaluate: Recall that DNase I degrades DNA that is not
protected by protein. “Large amounts” of DNase I would be expected to
digest DNA completely, leaving only those sequences bound by nu- 
cleosomes protected. Tip: Review the DNase footprinting technique 
introduced in Chapter 9. Deduce and Solve: The only DNA sequences 
protected from DNase I would be those in the nucleosome core particle. 
There would be one band, approximately 145 bp in size. 


28b. Evaluate, Deduce, and Solve: Each band would represent the 
DNA bound by histones in the nucleosome core particle. All non- 
nucleosomal DNA and all linker DNA between nucleosomes would be 
digested. 


28c. Evaluate: Recall that the 10-nm fiber model for chromatin
states that the 10-nm fiber is a linear array of nucleosomes and that
the chromatin is not folded into higher order structures. Deduce and
Solve: The only protection against digestion by DNase I was due to
histones bound to DNA in nucleosomes, indicating that no other proteins
were bound and that no higher order packing of the chromatin occurred
at this region. Higher order structures would have generated fragments
of DNA that are larger than the nucleosome core.


Chapter 12 


2. Evaluate: Recall that 5-BrdU is a base analog and that nitrous acid 
is a deaminating agent. Deduce and Solve: 5-BrdU is a thymidine 
analog that can base-pair like thymine or like cytosine. If 5-BrdU is 
incorporated in place of thymidine and then base-pairs like cytosine 
in the subsequent round of replication, it causes an A-T to G-C transi- 
tion. If 5-BrdU is incorporated into DNA in place of cytidine and then 
base-pairs like thymine in the following round of replication, it causes 
a G-C to A-T transition. Nitrous acid converts cytosine to uracil, 
which will base-pair with adenine in the next round of replication and 
cause a C-G to T-A transition. Nitrous acid can also convert adenine 
to hypoxanthine, which will base-pair with cytosine in the next round 
of replication and cause an A-T to G-C transition. 


4a. The mutation is a frameshift mutation (insertion or deletion). 
4b. TCT/G-TAC-ATA-TGC-GAG-ACA-AGN 


8. Evaluate: This problem concerns the relationship between the
coding sequence of a gene and the function of the encoded protein.
Recall that the effect of a single-nucleotide substitution depends on
whether the substitution changes the meaning of the codon and, if
so, whether that change has a significant impact on the structure of
the protein. Deduce: Nucleotide substitutions can result in silent,
missense, or nonsense mutations. Silent mutations do not change
the amino acid sequence of the protein and therefore have no effect
on protein function. Missense mutations change one amino acid in a
protein. Solve: Therefore, the effect on the function of the protein
depends on the importance of the amino acid that was replaced and the
functional similarity (or lack thereof) of the R-group on the substituted
amino acid to that of the wild-type amino acid.


10. Evaluate: This problem tests your understanding of the effect
of spontaneous mutations on gene function. Recall that spontaneous
mutations are typically nucleotide substitutions and, if detected, are not
silent. Deduce and Solve: (1) Recessive mutations are typically
loss-of-function mutations. Wild-type gene sequences have been selected
during evolution for optimum function; therefore, any change (mutation) to
that sequence is likely to replace a nucleotide maintained by natural se- 
lection with one that reduces the function of the gene. (2) Forward mu- 
tations include all mutations in a gene that convert it from wild type to 
mutant, whereas reverse mutations are only those that precisely reverse 
a specific mutation to wild type. Thus, the number of possible nucleo- 
tide changes corresponding to a forward mutation is much greater than 
those that reverse a given mutation, making forward mutations far more 
frequent than reversion. 


12a. Evaluate: Consider the close evolutionary relationship between 
mice and humans and the experimental utility of the mouse as a model 
research organism. Deduce and Solve: The mouse (Mus musculus)
is a widely used model organism for genetic analysis of mammalian
development and physiology, specifically in relation to human disease,
because many of these processes in mice and humans are evolutionarily 
conserved. The other advantage is that with mice, researchers can per- 
form experimental manipulations that are not possible when studying 
humans or even nonhuman primates. 


12b. Evaluate: Consider the differences between mice and
humans. Deduce: Although the mouse (Mus musculus) is a mammal,
there are many developmental, behavioral, and physiological differences 
between mice and humans. In addition, not every human gene has a 
homolog in the mouse genome. Solve: Therefore, in cases where the 
physiology or genetics of mice and humans differ, mutations in a mouse 
homolog to a human disease gene may not provide useful information on 
the human disease process. 


14. 1 mutation per 322,182 gametes 


16a. 2 births with retinoblastoma, 8 births with achondroplasia, and 
22 births with neurofibromatosis 


16b. Two reasons that could explain why the neurofibromatosis
(NF1) mutation rate is higher than the retinoblastoma (RB1) mutation
rate are (1) the NF1 gene is larger than RB1, and (2) a higher
percentage of mutations within NF1 affect NF1 function as
compared to RB1.


18. Evaluate: This problem tests your knowledge of the function of 
the E. coli RecA gene. Deduce and Solve: The RecA gene is required 
for recombination repair of DNA damage. E. coli with a null mutation 
in RecA would lack RecA function and would be deficient in recombi- 
nation repair. Recombination repair is used to fill in a single-stranded 
DNA gap created by the lack of replication of a region due to DNA 
damage (for example, a UV photoproduct). Several steps in
recombination repair are catalyzed by RecA, including two strand
invasion events and a single-stranded DNA cleavage event.


20. Evaluate: This problem tests your understanding of mutation and 
homologous recombination. Deduce and Solve: Mutation is defined 
as “a change in DNA sequence.” Gene conversion resulting from
recombination changes the DNA sequence of a chromosome; therefore,
it fits within the definition of mutation. Recombination also combines 
chromosome sequences in new ways, creating new stretches of DNA 
sequence; therefore, recombination also is arguably a form of mutation. 
However, mutation is typically reserved to describe DNA sequence 
changes that are due to processes other than homologous recombina- 
tion and gene conversion. 


22. Yes; heteroduplex DNA is always created during homologous 
recombination. 


24. Evaluate: Recall that gene conversion is rare and results in
the conversion of the genotype of one gamete out of four during
meiosis. Deduce and Solve: The four products of meiosis in multicellular
eukaryotes are not identifiable as such and instead are pooled with
the four products or with those of hundreds if not thousands of other 
meioses. Furthermore, these gametes are detected only by mating indi- 
viduals and observing the phenotype of the resulting progeny. The high 
numbers of gametes produced and the random sampling of gametes 
during zygote formation make statistically significant identification of 
aberrant 3:1 segregation impossible. 


34a. Evaluate: This problem tests your ability to analyze yeast mu- 
tant phenotypes. Deduce and Solve: Prototrophic yeast are able to 
grow on minimal medium, whereas auxotrophic yeast cannot. Yeast 
in colonies 4 and 5 grew on complete medium at 25°C but not on 
minimal medium at either temperature; therefore, colonies 4 and 5 
correspond to auxotrophic yeast mutants. The remainder can grow 
on minimal medium at 25°C and therefore are prototrophic yeast. 


34b. Evaluate: This problem tests your ability to analyze yeast mutant 
phenotypes. Deduce and Solve: The yeast in colonies 1 and 2 can 
grow on all media at 25°C but not on any of the media at 37°C. These 
yeast mutants are temperature sensitive for growth. The yeast in colony 
5 cannot grow on minimal medium at either temperature but can grow 
on minimal plus adenine at both temperatures. This yeast mutant is an 
adenine auxotroph. 


34c. Evaluate: This problem tests your ability to analyze yeast mu- 
tant phenotypes. Deduce and Solve: The yeast in colony 4 have two 
separate mutant phenotypes. This mutant cannot grow on complete 
medium at 37°C and therefore has a temperature-sensitive growth phe- 
notype. This mutant also cannot grow on minimal medium at 25°C but 
can grow on minimal plus adenine at 25°C; therefore, it is also an ade- 
nine auxotroph. The fact that the yeast mutant corresponding to colony 
4 has two different mutant phenotypes is an indication that this mutant 
carries two separate mutations, one affecting growth independently of 
adenine metabolism and a second affecting only adenine metabolism. 
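
The reasoning in 34a through 34c can be sketched as a small classifier. The growth table below is hypothetical: it is filled in only to be consistent with the answers above (the actual plate data are in the problem figure), and colony 3 is assumed to show wild-type growth.

```python
# Hypothetical growth data (True = growth) consistent with answers 34a-34c.
MEDIA = ("complete", "minimal", "minimal+adenine")

def table(at25, at37):
    """Build a growth table from two tuples ordered as MEDIA."""
    growth = {}
    for medium, g25, g37 in zip(MEDIA, at25, at37):
        growth[(medium, 25)] = g25
        growth[(medium, 37)] = g37
    return growth

def classify(growth):
    """Return the set of phenotype labels implied by a growth table."""
    labels = set()
    # Temperature sensitive: grows on complete medium at 25 C but not at 37 C.
    if growth[("complete", 25)] and not growth[("complete", 37)]:
        labels.add("temperature sensitive")
    # Prototrophs grow on minimal medium at the permissive temperature.
    if growth[("minimal", 25)]:
        labels.add("prototroph")
    elif growth[("minimal+adenine", 25)]:
        labels.add("adenine auxotroph")
    else:
        labels.add("auxotroph, requirement unknown")
    return labels

colonies = {
    1: table((True, True, True), (False, False, False)),
    3: table((True, True, True), (True, True, True)),   # assumed wild type
    4: table((True, False, True), (False, False, False)),
    5: table((True, False, True), (True, False, True)),
}

for number, growth in colonies.items():
    print(number, sorted(classify(growth)))
```

Colony 4 receives both labels, which mirrors the conclusion in 34c that it carries two separate mutations.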


40a. The ascus shows 6:2 segregation of brp⁺ : brp⁻.


40b. Evaluate: This problem tests your understanding of the process of 
gene conversion and its relationship to recombination during meiosis. 
Deduce and Solve: The aberrant ratio of 6 brp⁺ : 2 brp⁻ indicates that gene
conversion in the brp locus occurred during the meiosis producing this 
ascus. Gene conversion is associated with recombination, which indicates 
that a recombination event was initiated in the region between ala and cty 
during this meiosis. 

40c. Evaluate: This problem tests your understanding of the geno- 
type of asci that show evidence of gene conversion. Deduce and 
Solve: Recombination is accompanied by formation of heteroduplex 
DNA between the two Holliday junctions that form. A region of the 
heteroduplex can contain mismatched base pairs if that region in the 
homolog is not identical. In this case, the heteroduplex included the 
region containing the sequence difference that distinguishes brp⁻ from
brp⁺ and therefore contained mismatched bases. These mismatches
are repaired by mismatch repair, but the direction of the repair is not
controlled, such that a homolog that should have brp⁻ could be
repaired to contain brp⁺ information. The heteroduplex occupies only
a portion of the region undergoing recombination, which in this case 
did not include the ala or cty loci. Therefore, there was no gene 
conversion at ala or cty, and those alleles segregated normally—4:4. 


Chapter 13 


4. b. interstitial deletion; c. duplication; d. terminal deletion; e. trisomy; 
f. reciprocal balanced translocation; g. paracentric inversion; h. mono- 
somy; i. polyploidy 

6. Since P elements can cause mutations if they insert into genes, limit- 
ing their number reduces the likelihood of a new mutation. 


8. Mutation of a single IS1 sequence in E. coli will prevent insertion of
transposable DNA into the element. Copies of IS1 that are not mutated 
can undergo transposition. 


10. Yellow-bodied females can be produced from this cross if nondis- 
junction in the female parent produces an egg with two X chromosomes 
and the egg is fertilized by a sperm containing the Y chromosome.
The XXY zygote will develop as female and will be homozygous for the
recessive yellow-body allele. 


16c. Two PCR marker combinations are possible: 290, 310, 340; and 
290, 340, 380. 


16d. Four PCR marker combinations are possible: 290, 310; 290, 380; 
310, 340; and 340, 380. 


20. The human and orangutan chromosomes have identical banding
patterns along their entire lengths, and all four species have the
same chromosome 5 banding pattern from band 5q14.1 to the q arm 
telomere. In comparison to human chromosome 5, the chimpanzee 
chromosome has undergone a pericentric inversion with breakpoints 
at approximately 5p13.2 and 5q13.3. The gorilla chromosome dif- 
fers from the human chromosome from 5q13.3 to the telomere of 
5p. It may have undergone a balanced translocation with another 
chromosome. 


22. Mosaicism refers to the condition in which the body has cells with 
more than one karyotype. The range of phenotypic effects observed in 
these sex chromosome mosaics is dependent on the relative percentages 
of cells with each karyotype. Greater percentages of XO cells correspond 
to phenotypes that are more similar to those with Turner syndrome. 


26. In this analysis, both recombination frequencies are much lower
than expected: Recombination between peach and oblate is 2.1% instead 
of 17%, and recombination between dwarf and peach is 3.1% instead of 
12%. This result is consistent with inversion of the chromosome region 
containing peach and the presence of heterozygosity for the inversion
in the trihybrid line. Inversion heterozygosity is suppressing the
appearance of most of the crossover chromosomes in this cross.


28a. 3.5 kb 
28b. 7.0 kb 


28c. The insertions of intron sequences into the P element and into the 
copia element are likely to disrupt gene expression from each element. 


Chapter 14 


2a. A DNA sequence that binds a regulatory protein, such as the lac 
operator sequence. 


2b. A regulatory protein that binds DNA, such as the lac repressor 
protein. 


2c. A compound that induces or activates transcription, such as lactose. 


2d. A compound that interacts with another protein or compound to
form an active repressor, such as the trp corepressor. 


2e. A DNA sequence that binds RNA polymerase and regulates tran- 
scription, such as the lac promoter. 


2f. A process of transcription regulation through which the binding of 
regulatory proteins to DNA activates transcription, such as the CAP 
binding site of the lac promoter. 


2g. A process by which the stereochemistry of a protein is altered to 
change its interaction capabilities, such as the lac repressor protein. 


2h. A process of transcriptional regulation through which binding of 
regulatory proteins to DNA blocks transcription, such as lac repressor 
protein binding to lac O. 


2i. A mechanism of transcriptional regulation in which transcription 
level is modified (attenuated) to meet environmental requirements, 
such as trp operon attenuation. 


4a. Similarities: Both have promoter and operator regulatory se- 
quences. Differences: Inducible operons bind repressor protein to 
block transcription and may use positive control to help activate tran- 
scription. Inducible operons require an inducer substance to activate 
transcription. Repressible operons use a repressor protein together with
the pathway end product, which acts as corepressor, to repress
transcription. Repressible operons often utilize
attenuation. 


4b. Similarities: Both types of regulatory systems utilize allostery
in regulating transcription. Differences: The mechanism and
consequences of allostery differ. In lac operon regulation, the repressor
protein binds the operator, but allosteric change caused by allolactose
prevents binding. In trp operon regulation, the repressor protein cannot
bind the operator until its shape is changed allosterically by binding its
corepressor, tryptophan.


4c. Similarities: Both types of operons contain multiple genes that 
share a single promoter and a single operator sequence. Differences: 
Repressible operons often use attenuation and contain a transcribed 
leader sequence that participates in determining structural gene tran- 
scription. This mechanism is not found in inducible operons. 


6. Attenuation does not involve allosteric changes. Attenuation is the 
result of transcription of a leader sequence that undergoes translation. 
Coupling of transcription and translation dictates whether transcription 
continues past the leader sequence and into the structural genes. 


8. The CAP binding site is part of the lac promoter and is located at 
approximately −60. It binds the CAP-cAMP complex and opens DNA
slightly to allow efficient RNA polymerase binding at the lac promoter. 


10. A cap⁻ mutation would alter the CAP binding site sequence and
render it unrecognizable by CAP-cAMP. The required positive regulation
of transcription would not occur, and lac operon transcription
would be minimal. The strain would be lac⁻.


12. Transcription occurs under both conditions because allolactose,
the inducer, is present. Transcription is higher in the absence of glucose 
because CAP-cAMP levels, which stimulate transcription, are higher. 


14. Antisense RNAs are single-stranded RNAs that are complementary 
to a portion of specific mRNA transcripts. Bound to their mRNA tar- 
gets, antisense RNAs can either block translation or lead to the destruc- 
tion of mRNA. Blocking translation prevents the production of proteins 
that might initiate unnecessary or harmful actions. 


16a. Blocks all transcription 

16b. Produces constitutive transcription 

16c. Blocks all transcription (this is an Iˢ mutation)

16d. Produces constitutive transcription (this is an I⁻ mutation)
16e. Only minimal transcription will occur. 

18. See table. 


Genotype                   β-Galactosidase          Permease                 Phenotype
                           Lactose   No Lactose     Lactose   No Lactose
Example: I⁺ P⁺ O⁺ Z⁺ Y⁺     +         −              +         −              lac⁺
a. SP*OtZ+ty*/ m P*O* ZY+ F : lac 
areae eer re n n e dees a 
kepo yir ozy — + - Tee 
Wie or eos a e a lac" j 
e P POZ Yaw i RO Zy cts + — — lac” 
f Prot Y+ PP*O z+ y Te ae 7 
g. Bptotz-yt/itptoczty~ + + - - lac 


20a. I P⁺ O⁺ Z⁻ Y⁺ / I⁺ P⁺ O⁺ Z⁺ Y⁺ will have inducible transcription
of both genes. I⁺ P⁺ Oᶜ Z⁻ Y⁺ / I⁺ P⁺ O⁺ Z⁺ Y⁺ will have constitutive
transcription of lacY and inducible transcription of lacZ. cap⁺ I⁺ P⁻ O⁺
Z⁺ Y⁺ / I⁺ P⁺ O⁺ Z⁺ Y⁺ will have inducible transcription of both genes.
cap Iˢ P⁺ O⁺ Z⁺ Y⁺ / I⁺ P⁺ O⁺ Z⁺ Y⁺ will be noninducible.

20b. The first three partial diploids will be able to grow on a lactose 
medium, but the final partial diploid (Iˢ P⁺ O⁺ Z⁺ Y⁺ / I⁺ P⁺ O⁺ Z⁺ Y⁺)
will not. 
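
The cis/trans logic behind these partial-diploid answers can be sketched in code. This is a simplified illustration, not a full model of the operon: it ignores CAP-cAMP and glucose, treats the repressor as fully trans-acting, and treats the promoter, operator, and structural genes as acting only in cis. The `Chromosome` class and function names are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Chromosome:
    I: str = "+"   # "+", "-", or "s" (superrepressor)
    P: str = "+"   # "+" or "-"
    O: str = "+"   # "+" or "c" (operator constitutive)
    Z: str = "+"   # "+" or "-"
    Y: str = "+"   # "+" or "-"

def repressor_active(chromosomes, lactose):
    """The repressor is diffusible, so any copy acts on every operator."""
    alleles = [c.I for c in chromosomes]
    if "s" in alleles:
        return True           # superrepressor cannot respond to the inducer
    if "+" in alleles:
        return not lactose    # wild-type repressor is released by the inducer
    return False              # only I- copies: no functional repressor

def expressed(chromosomes, gene, lactose):
    """True if a functional product of `gene` ("Z" or "Y") is made."""
    blocked = repressor_active(chromosomes, lactose)
    for c in chromosomes:
        if c.P != "+" or getattr(c, gene) != "+":
            continue          # promoter and gene must be functional in cis
        if c.O == "c" or not blocked:
            return True       # an O^c operator escapes repression in cis
    return False

def pattern(chromosomes, gene):
    """Summarize expression with and without lactose."""
    with_lactose = expressed(chromosomes, gene, True)
    without_lactose = expressed(chromosomes, gene, False)
    if with_lactose and without_lactose:
        return "constitutive"
    if with_lactose:
        return "inducible"
    return "noninducible"

wild_type = Chromosome()
oc_diploid = (Chromosome(O="c", Z="-"), wild_type)
print(pattern(oc_diploid, "Z"), pattern(oc_diploid, "Y"))
```

For the O-constitutive diploid the sketch reports inducible lacZ and constitutive lacY, in line with the second genotype of 20a; a diploid carrying an I-superrepressor allele comes out noninducible for both genes.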

22a. No; permease is not produced. 


22b. Transcription of lacZ is inducible from the cap⁺ I P⁺ O⁺ Z⁺ Y⁻
chromosome. Only minimal transcription occurs from the other
chromosome, so permease is noninducible.


22c. The lacI gene has its own promoter and is not affected by lac
operon regulation or gene mutations. The cap mutation minimizes
transcription, but repressor protein produced from this chromosome
is trans-active and binds O⁺ on the other chromosome, making lacZ
expression inducible.


24. Gene Z is the enzyme, gene W is the repressor, and G is the operator. 


26a. This would prevent the lac repressor from binding to the operator, 
which would cause constitutive transcription of the lac operon.

26b. This would prevent the repressor from binding to O_R1, allow cro
binding to O_R3 and O_R2, prevent transcription from P_RM, and allow
transcription from P_R, promoting the lytic life cycle.

26c. This would prevent cro from binding to O_R3, allow repressor
binding to O_R2 and O_R1, prevent transcription from P_R, and allow
transcription from P_RM, promoting the lysogenic life cycle.

28a. The mutant is incapable of establishing lysogeny. Lytic gene 
transcription from the O_R sites cannot be repressed.

28b. The mutant is incapable of establishing lysogeny. 

28c. The mutant will be unable to carry out lysis. No transcription 
activation occurs from O_L or O_R.

28d. The mutant will be unable to establish lysogeny. The mutant can- 
not undertake site-specific recombination to integrate the lysogen. 

28e. The mutant will be unable to carry out lysis due to the cro mutation,
and it will be unable to establish lysogeny due to the cII mutation.


28f. The mutant will be unable to carry out lysogeny. The expression of 
the late gene that takes place through the antiterminator activity of N at 
t_L, t_R1, and t_R2 will not occur.


30a. The same band in both lanes 

30b. No band in lane 1; band in lane 2 

30c. No band in either lane 

30d. The same band in both lanes 

30e. No band in lane 1 and a band in lane 2
30f. No band in either lane 

32. See table. 


Genotype    LacZ mRNA Synthesis    Lac Phenotype
at mPa OZ VPRO Zanes inducible lact 
b. PtOczty*/I*PtOtzZ-¥* constitutive lac 
“c SPtotztyt sit ptotzty* uninducible f= 
df Ptotz-ytyiP-otzty* —_ uninducible i 
et Ptotzty-sitptotzty- uninducible lac 7 


Chapter 15 


2a. UAS elements are found in the yeast genome, where they operate 
as enhancer-like regulatory sequences. Gal4 protein binds yeast UAS 
elements to activate transcription of galactose utilization genes. 


2b. Insulator sequences shield genes from enhancer effects. The mech- 
anism of action may be through the formation of specific DNA loops 
that protect particular genes from enhancers. 


2c. Silencer sequences prevent transcription of particular genes. The 
mechanism of action may be through competitive protein binding at 
silencer sequences that overlap with enhancer sequences. The yeast 
Mig1 and Tup1 proteins bind a silencer sequence during glycolysis to 
prevent transcription of galactose utilization genes. 


2d. The protein complexes that assemble at enhancers to facilitate 
transcription are known as enhanceosomes. The enhanceosome com- 
plex known as Mediator assembles at yeast enhancers. It contacts 
promoter-bound proteins to activate transcription. 


2e. RNA interference describes the posttranscriptional regulation of 
mRNAs by regulatory RNA molecules. RNAi is a prominent feature of 
the regulation of gene expression in most eukaryotic genomes. 


4. Acetylation occurs when acetyl groups are added to amino acids of 
the histone protein by acetylase enzymes. These acetylation events are 
most often associated with transcription activation, though there are 
many exceptions. 


6. mRNAs are transcribed from DNA and carry the information to be 
translated into protein. rRNAs provide both scaffold and enzymatic 
activities to ribosomes. tRNAs bind an amino acid at their 3’ ends and
recognize codons in mRNA via their anticodons, thus translating nucleic 
acid sequence information into protein sequence information. miRNAs 
and siRNAs act to regulate gene expression via RISC, either to slice or 
inhibit translation of mRNA targets, or to facilitate recruitment of
chromatin-modifying enzymes to chromosomal loci. Some lncRNAs act as
scaffolds to bring chromatin regulatory proteins to chromosomal loci. 


8. Several factors can be cited, including the following: (1) the presence 
of a nucleus in eukaryotic cells, (2) the chromatin structure of
eukaryotic genomes, (3) multicellularity that is frequent in eukaryotes, and
(4) differential gene expression among different types of eukaryotic cells.


10. Heterochromatin regions will decondense for DNA replication 
during the S phase of the cell cycle to allow replisome access. 


12. Chromatin is classified into euchromatin and heterochromatin
based on the chemical modifications on the histone proteins.
Euchromatin is characterized by H3K9-acetylation and is transcriptionally
active. Heterochromatin is transcriptionally inactive and may be
either constitutive, in which case it is marked with H3K9-methylation,
or facultative, in which case it is marked with H3K27 methylation.
Facultative heterochromatin can be converted to euchromatin, and
vice versa, by chromatin modification.


14. One potential role of lncRNAs in gene regulation is to act as scaffolds
to recruit chromatin-modifying enzymes to the chromatin. An
example is Xist, which acts to recruit the polycomb complex to the X 
chromosome that is destined to be inactivated. 


16a. The enhancer is most likely in the region at the left-hand side that 
is present in mutant E and mutant F. 


16b. The promoter region is most likely in the region at the right-hand 
side that is present in mutant E and mutant F. 


16c. Mutant E likely contains all or most of the enhancer and promoter 
sequences, but DNA between the sequences is missing. This leads to diffi- 
culty forming the correct DNA loop and appears to interfere with efficient 
transcription initiation. In contrast, mutant F contains additional DNA 
sequence, particularly between the enhancer and the promoter. Its higher 
level of transcription indicates greater efficiency in transcription initiation. 


18a. Mutant A has an enhancer mutation. This deletion is located 
well upstream of the start of transcription and substantially reduces 
transcription. 


18b. Mutant B affects a silencer sequence. This deletion results in a 
substantial increase in the level of transcription. 


18c. Both mutants C and D are promoter mutations. Their location im- 
mediately upstream of the transcription start and the reduced levels of 
transcription from these mutants are consistent with promoter mutations. 


20a. Enhancer and silencer sequences are each detected in this analysis.
The enhancer sequence is located in the deleted region that is common
to mutant E and mutant F. The silencer sequence is located in the
deletion region unique to mutant E. 


20b. The deletion in mutant D deletes the ME1 promoter sequence. 


20c. It seems likely that regulation of ME1 is developmentally con- 
trolled by the combined activity of an enhancer and a promoter that ac- 
tivate transcription and a silencer sequence that represses transcription 
at particular times during development. 


Chapter 16 


2. The difference suggests posttranscriptional regulation. One possibility
is that the protein is stable in only one cell type and is rapidly degraded 
in other cell types. Another possibility is that the mRNA is translated in 
only one cell type. 


4. E. coli: 4.64 × 10⁶ bp / 100 minutes = 4.64 × 10⁴ bp / minute
Arabidopsis: 130 × 10⁶ bp / 600 cM = 2.17 × 10⁵ bp / cM
Saccharomyces: 12 × 10⁶ bp / 4500 cM = 2.67 × 10³ bp / cM
C. elegans: 100 × 10⁶ bp / 300 cM = 3.33 × 10⁵ bp / cM
Drosophila: 180 × 10⁶ bp / 275 cM = 6.55 × 10⁵ bp / cM
Danio rerio: 2000 × 10⁶ bp / 3000 cM = 6.67 × 10⁵ bp / cM
Mus: 3000 × 10⁶ bp / 1400 cM = 2.14 × 10⁶ bp / cM
Homo female: 3000 × 10⁶ bp / 4460 cM = 6.73 × 10⁵ bp / cM
Homo male: 3000 × 10⁶ bp / 2590 cM = 1.16 × 10⁶ bp / cM
Homo average: 3000 × 10⁶ bp / 3525 cM = 8.51 × 10⁵ bp / cM


There will always be a balance between increasing the size of the mapping
population and thereby having a more accurate map position and identifying
the physical DNA spanning flanking mapped markers. In organisms
with a large number of base pairs per cM, it is often worthwhile to increase
the number of individuals in a mapping population and thereby decrease
the number of base pairs of DNA potentially encoding the locus of interest.
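
The ratios in answer 4 are simple divisions of physical genome size by genetic map length, and a short script can check them. The genome sizes and map lengths below are the ones quoted in the answer; the 3525 cM used for the human average is an assumption, taken as the mean of the female (4460 cM) and male (2590 cM) map lengths because it reproduces the quoted 8.51 × 10⁵ bp / cM.

```python
# (genome size in bp, genetic map length in cM) as quoted in answer 4.
GENOMES = {
    "Arabidopsis": (130e6, 600),
    "Saccharomyces": (12e6, 4500),
    "C. elegans": (100e6, 300),
    "Drosophila": (180e6, 275),
    "Danio rerio": (2000e6, 3000),
    "Mus": (3000e6, 1400),
    "Homo female": (3000e6, 4460),
    "Homo male": (3000e6, 2590),
    # Assumed: mean of the female and male map lengths.
    "Homo average": (3000e6, (4460 + 2590) / 2),
}

def bp_per_cm(genome_bp, map_cm):
    """Average number of base pairs spanned by one centimorgan."""
    return genome_bp / map_cm

for species, (genome_bp, map_cm) in GENOMES.items():
    print(f"{species}: {bp_per_cm(genome_bp, map_cm):.2e} bp / cM")
```

Organisms with large values, such as the mouse, are the ones where enlarging the mapping population pays off most, since each centimorgan of uncertainty corresponds to more DNA.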


6. While PCR or northern blotting approaches can give some per- 
spective on expression patterns, observing them in situ provides more 
information. For this, either a transcriptional or translational fusion 
to a reporter gene (e.g., lacZ or GFP) would be best. The difficulty 
may be in initially identifying the sequences responsible for proper 
expression of the gene. These experiments are judged by the following 
standards: (1) How well does the observed expression pattern of the 
marker line match with all other data on expression patterns? and
(2) Can a translational fusion gene complement a loss-of-function 
mutant phenotype? 


8a. In yeast, I would create loss-of-function alleles by homologous 
recombination gene replacement. 


8b. I would create a translational fusion with a reporter gene (e.g., GFP), 
preferably with all endogenous regulatory sequences. 


10. If the transposon supplies additional regulatory elements, insertion 
of the transposon adjacent to a gene may result in ectopic or overex- 
pression of the adjacent gene, resulting in a dominant gain-of-function 
allele. Alternatively, if the transposon is inserted into the coding region 
of a gene, it will result in a loss-of-function allele. 


12. Since meiosis is not required for viability, a genetic screen 
searching for mutants that fail to undergo meiosis properly would 
work. However, the ability to cross the mutant for complementa- 
tion tests would be useful, and thus a screen for conditional mutants 
would be desirable. Since chemical mutagenesis induces the broad- 
est spectrum of alleles, it would be a better choice of mutagen than 
insertion or deletion alleles, which are often null. Finally, the simplest
genetic system in which meiosis occurs would be the best system to
examine this question. S. cerevisiae, where many genetic tools are
available, would be a good choice. 


14. Because you have no a priori information on the nature of the gene
product, homology-based techniques are not applicable. Positional 
cloning would work and requires only a mutant phenotype to go
from map position to gene. Since the genome of Drosophila has been
sequenced, one could take a sequence-based approach as outlined in
Figure 16.12. Transposon tagging would work by starting with an or- 
ganism heterozygous for a mutation in one of the genes and mobilizing 
the P element. Since identifying the mutants is the most time-consuming
step, the sequence-based approach is the better choice.


16a. The smaller 4.5-kb cDNA could be sequenced using a primer 
walking technique. For the 250-kb BAC clone, fragmentation and shot- 
gun sequencing would be a good approach. 


16b. I would focus on sequencing cDNA clones from the patients in 
order to identify both exonic and intron-exon boundary mutations. But
this approach would not identify mutations in regulatory elements that 
are in non-transcribed regions. 


18a. Since Arabidopsis is a flowering plant, the female gametophyte 
(egg) is retained on the female parent, on the placenta. Female gameto- 
genesis can be directly observed within the ovules. Mutations resulting 
in female gametophytic mutations (e.g., lethality) can be observed as a 
1:1 ratio. 


18b. Since the male gametophyte (pollen) is produced in excess and is 
not retained on the plant, to observe male gametophytic mutations, I 
would observe the developing pollen directly. 


20a. Screen for mutants in which the pupae either eclose at a time 
other than dawn or eclose at random times during the day/night. While 
this phenotype might be detrimental in nature, in the laboratory it is 
likely to be completely viable. 

20b. Screen for mutations in which the expression of genes encoding 
photosynthetic machinery is no longer synchronized with the circadian 
rhythm. Again, while this phenotype in nature would be detrimental, in 
the laboratory it is likely to be viable, though a change in the color of the 
plants (e.g., lighter green) might be observed. 

20c. Positional cloning or, since the genomes of these organisms have
been sequenced, a sequence-based method (i.e., Fig. 16.12).


Chapter 17 


2a. Sau3A, 1.17 × 10⁷; BamHI, 7.32 × 10⁵; EcoRI, 7.32 × 10⁵; NotI,
4.58 × 10⁴


2b. Sau3A, 1.08 × 10⁷; BamHI, 4.32 × 10⁵; EcoRI, 9.72 × 10⁵; NotI,
7.68 × 10³
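
The fragment counts in 2a and 2b follow from multiplying the genome size by the probability that a given stretch of sequence matches the recognition site. The sketch below assumes a 3 × 10⁹ bp genome, equal base frequencies for 2a, and frequencies of G = C = 0.2 and A = T = 0.3 for 2b. None of these values is stated in the answer itself; they are assumptions chosen because they reproduce the quoted numbers.

```python
# Recognition sequences of the four enzymes named in the answer.
SITES = {
    "Sau3A": "GATC",
    "BamHI": "GGATCC",
    "EcoRI": "GAATTC",
    "NotI": "GCGGCCGC",
}

GENOME_BP = 3e9  # assumed genome size

UNIFORM = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}   # assumed for 2a
SKEWED = {"A": 0.30, "C": 0.20, "G": 0.20, "T": 0.30}    # assumed for 2b

def expected_sites(site, genome_bp, freqs):
    """Expected number of occurrences of `site` in a random genome."""
    probability = 1.0
    for base in site:
        probability *= freqs[base]
    return genome_bp * probability

for label, freqs in (("2a", UNIFORM), ("2b", SKEWED)):
    for enzyme, site in SITES.items():
        count = expected_sites(site, GENOME_BP, freqs)
        print(f"{label} {enzyme}: {count:.2e}")
```

The GC-rich NotI site shows the largest change between the two cases, since its probability depends only on the G and C frequencies.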


4a. The genomic libraries 


4b. The two genomic libraries should completely overlap. The cDNA
libraries should be a subset of the genomic libraries. The two cDNA 
libraries should only partially overlap. 


6. A 16 base-pair (bp) sequence is predicted to occur randomly once in 
4.3 × 10⁹ bp; thus, oligonucleotides should be of at least this length to have
a reasonable probability of being unique in the genome. 
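
The figure in answer 6 comes from 4¹⁶, the number of distinct 16-base sequences. A minimal sketch of the calculation follows; it assumes equal base frequencies, and the 3 × 10⁹ bp genome size passed in at the end is an assumed, roughly human-sized value that the answer does not state.

```python
def expected_spacing(length):
    """Average spacing in bp between random occurrences of one sequence
    of the given length, assuming equal base frequencies."""
    return 4 ** length

def min_unique_length(genome_bp):
    """Smallest length n for which 4**n is at least the genome size."""
    n = 1
    while 4 ** n < genome_bp:
        n += 1
    return n

print(expected_spacing(16))
print(min_unique_length(3e9))   # assumed genome size
```

A 15-mer is expected about once per 1.07 × 10⁹ bp, which is why it falls short for a genome of this size.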


8. The principles are identical for both species, but the techniques 
differ because homologous recombination occurs frequently in yeast 
and rarely in mice. Thus, positive-negative selection techniques are
required in mice, while only positive selection is required in yeast. 


10. Gene therapy often targets blood diseases because blood circulates 
throughout the body. Thus, replacement of mutant bone marrow cells 
with corrected ones allows the defect to be corrected throughout the 
body. 


12. Both methods use “naturally” occurring biological entities. In 
plants, the Ti-plasmid is reengineered to have the gene of interest and 
then is reintroduced into Agrobacterium, which naturally transfers
the T-DNA into the genome at random locations of plant cells. In
Drosophila, the P element is reengineered to have the gene of interest 
and then injected into embryos, where it integrates into the genome at 
random locations. 


14. Most recombinant DNA manipulations involving combining of 
DNA fragments of less than 10 kb, including changing specific base 
pairs in a known sequence, can be accomplished by synthesis. However, 
for instances where the exact sequence of the DNA in question is not 
known, standard recombinant DNA techniques will continue to be 
needed. 


16. The sticky ends can be religated since the single strand overhangs 
can anneal, but neither enzyme can cut the resulting sequence following 
ligation. 


18. Based on the restriction enzyme digests, the following map can be 
drawn: 


[Restriction map not reproduced.]


20. Based on the restriction enzyme digests, the following map can be
drawn: 

An XhoI + HindIII double digest would enable the two small HindIII
fragments to be ordered with respect to the remainder of the restriction sites.


22a. Loss-of-function alleles in S. cerevisiae can be produced by 
homologous recombination (see also Figure 17.5). Gain-of-function al- 
leles, such as those that produce the gene product constitutively, can be 
constructed by making a gene fusion combining a promoter that drives 
transcription constitutively in S. cerevisiae with the coding region of the 
gene of interest (see also Figures 17.14 and 17.17). 


22b. Since homologous recombination is not routine in tomato, loss- 
of-function alleles can be created by using RNAi-mediated mechanisms. 
In this case, a promoter that drives transcription constitutively can be 
transcriptionally fused with a sequence containing an inverted repeat, 
such that the mRNA produced can form a stem-loop including double- 
stranded RNA (see also Figure 17.18). The chimeric gene can be intro- 
duced into tomato using Agrobacterium (see also Figures 17.6 and 17.7). 
Gain-of-function alleles would be produced in a similar manner as 
described for S. cerevisiae, except the regulatory sequences need
to be suited to tomato and the transgenic organisms produced by
Agrobacterium-mediated transformation.


24. There are two possible approaches: (1) Mutagenize the bacterial 
strain and screen for mutants that can no longer metabolize crude oil. 
Then clone the corresponding gene(s) using a complementation assay, 
as outlined in Chapter 16. (2) Alternatively, clone the gene(s) by trans- 
ferring large genomic clones from the strain of interest into a related 
Pseudomonas strain that cannot metabolize crude oil. This approach 
often works in bacteria with specialized traits since the gene conferring 
the trait is often found within a single operon. 


26a. There are two possible approaches: (1) perform in situ hybridization
using probes made from each of the two genes or (2) construct
translational or transcriptional fusion genes with a reporter gene (lacZ,
GFP) via homologous recombination methods. Translational fusions
may retain their functionality as long as the marker gene fusion does 
not disrupt function of the protein of interest. If made by gene replacement,
a transcriptional fusion would result in a loss-of-function allele, but if that
allele is recessive, a heterozygote carrying the fusion would still provide
information about gene expression in a phenotypically wild-type mouse.


26b. Use homologous recombination techniques to replace the coding 
region of the gene with a selectable marker. Alternatively, you can use 
an RNAi-based approach to create a loss-of-function phenotype. The 
former approach has the advantage of heritability. 


26c. Create loss-of-function alleles via homologous recombination for 
each of the genes and examine the mutant phenotypes. Cross the two 
single mutants to create an F₁ population, interbreed the F₁s to produce
an F₂, and identify a double-mutant strain and examine its phenotype.
If the double-mutant strain exhibits phenotypic defects beyond what is 
expected by the addition of the single-mutant phenotypes, then the two 
genes have redundant functions. Alternatively, you can use an RNAi- 
based approach to create a loss-of-function phenotype, but again, this 
approach is not heritable. 


28. Because the original sequence (highlighted) was from reverse 
translation of the protein sequence, in nucleotide positions where there 
is degeneracy in potential sequences, these may differ from the actual 
sequence encoded in the genome. 


30. The mutant sequence can be created using site-directed mutagene- 
sis. These clones could be used to create wild-type and mutant protein to 
be studied in vitro. To study in vivo consequences, it would be best to in- 
troduce the mutant version of the gene into its endogenous chromosomal 
location. Unlike the creation of loss-of-function alleles, where a selectable 
marker replaces the endogenous gene, the creation of gain-of-function 
mutations is slightly more complicated. One solution is to create the 
point mutation in the genome via homologous recombination and then 
remove the selectable marker using a Cre-lox-based system. However, 
care must be taken not to leave a “footprint” of non-endogenous 
sequences in any coding or regulatory sequences. 


Chapter 18 


2a. Repetitive DNA can often be assembled in many different ways, 
making unambiguous assembly difficult. On a finer scale, repetitive 
DNA can also lead to polymerase slippage causing sequence errors. 


2b. Dispersed, repetitive DNA that is longer than a single sequenc- 
ing read and is found at many locations in the genome is particularly 
problematic. 


2c. Paired-end sequencing is one approach to identify unique sequences 
flanking repetitive DNA. 


4. cDNA sequences provide information on which genomic sequences 
are transcribed and processed into mature mRNAs. Different forms of 
full-length cDNAs from the same region of genomic DNA can indicate 
alternative splicing. 


6. In eukaryotic genomes, one must account for the possible presence 
of introns; in prokaryotic genomes, open reading frames should be con- 
tiguous. Predictive algorithms must also take into account differences in 
promoter and enhancer elements/consensus sequences. 


8. Bioinformatic Method: Use an algorithm to search for potential 
open-reading frames within the sequence. This method is only predic- 
tive and not very accurate, so experimental data are needed to confirm 
accuracy. 


Comparative Method: BLAST the sequence against the database 
of known sequences. If sequences are conserved, they are likely to be 
functional. This method also needs experimental verification. 


Experimental Method: Use the sequence as a probe against a cDNA 
library or other technique (e.g., microarray, rtPCR) to determine which 
sequences are transcribed. This is the best method, but it is also much 
more time- and labor-intensive than the others. 


10. Human proteins are closer to fungal proteins than to plant
proteins; plant proteins are equidistant from either human or fungal 
proteins. 


12. Expression microarrays represent all the sequences of a genome 
that are transcribed, whereas tiling arrays represent all of the sequences 
in a genome. Expression or tiling microarrays can be used to obtain 
information on which sequences are transcribed in a specific tissue or 
cell type; tiling arrays can be used to identify binding sites of DNA bind- 
ing proteins and to assay recombination events between polymorphic 
strains. For most applications, high-throughput sequencing can provide 
the same type of information that arrays can. The former provides more 
quantitative information and information on alternative splicing than 
the latter. At present, the cost for high-throughput sequencing is higher 
than for arrays, but this may change in the future. 


14. This DNA sequence is the synthetic version (the nucleotide 
sequence inferred from reverse translation of the protein sequence) 
of the human insulin gene. 


16a. All three genes are orthologs. 


16b. AY1 and AY2 are paralogs; AY1 and BY are orthologs; BY and CY are 
orthologs. 


16c. AZ1 and AZ2 are paralogs; BZ1 and BZ2 are paralogs; BZ2 and 
BZ3 are paralogs; BZ1 and BZ3 are paralogs; CZ1 and CZ2 are paralogs; 
AZ1 and AZ2 are orthologous to BZ1, BZ2, and BZ3; AZ1 and AZ2 are 
orthologous to CZ1 and CZ2; CZ1 is orthologous to BZ1 and BZ2; CZ2 
is orthologous to BZ3. 


18. While large-scale chromosomal rearrangements appear to have 
been rare in the primate lineage during mammalian evolution, small- 
scale rearrangements appear to be frequent. 


20. Segmental duplications, resulting in large-scale gene duplication, can 
often lead to genetic redundancy, especially if the duplication is evolu- 
tionarily recent. Using reverse genetics (see Chapter 17), loss-of-function 
alleles can be created in the duplicate genes. Due to potential genetic 
redundancy, double mutants of the two paralogs might have to be con- 
structed to observe an aberrant mutant phenotype. 


22a. Any of these approaches are possible: (1) create a loss-of-function 
allele by gene replacement and examine for mutant phenotype, 
(2) create a gain-of-function allele by constitutive expression of the 
gene and examine for mutant phenotype, (3) create a reporter gene 
fusion allele by gene replacement to examine where and when in the 
cell the protein is expressed, (4) perform a synthetic enhancer screen 
in the loss-of-function background, (5) perform a two-hybrid screen 
to identify interacting proteins, (6) perform transcriptome analysis to 
examine the expression pattern of the gene. 


22b. In a human genome, the possibilities are much more limited: only 
(5) and (6) from the answer to question 22a would work. 


24. The first step is to organize the data to identify genes that behave 
similarly and those that behave differently. For example, a, c, d, e, f, g, i, 
j, k, n, q, and r all increase in expression with both high salt and high 
temperature; b, p, and s all decrease in expression with both high salt 
and high temperature; / and o decrease in response to salt but increase 
in response to high temperature; / and m increase in response to salt 
but decrease in response to high temperature. This analysis provides 
insight into possible roles of genes that may be involved in a general 
stress response versus genes that may have specific roles in response to 
salt or temperature stress. 
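
The organizing step in answer 24 amounts to grouping genes by their direction of change under each condition. A minimal sketch follows, listing one representative gene per class from the answer; the `+1`/`-1` encoding and the function name `group_by_pattern` are illustrative assumptions, not part of the original data set.

```python
from collections import defaultdict

# Direction of change under (high salt, high temperature): +1 = up, -1 = down.
# One representative gene from each class named in the answer.
responses = {
    "a": (+1, +1),   # up in both: candidate general stress response
    "b": (-1, -1),   # down in both
    "o": (-1, +1),   # down in salt, up in heat
    "m": (+1, -1),   # up in salt, down in heat
}

def group_by_pattern(responses):
    """Map each (salt, temperature) response pattern to the genes showing it."""
    groups = defaultdict(list)
    for gene, pattern in sorted(responses.items()):
        groups[pattern].append(gene)
    return dict(groups)

groups = group_by_pattern(responses)
```

Genes sharing a key respond alike; genes whose two signs differ are candidates for condition-specific roles.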


28. The PEG10 gene is likely derived from the insertion of a retrotrans- 
poson, and its protein-coding sequences have been co-opted to perform 
a role in placenta formation. Retrotransposons contain a gene encoding 
reverse transcriptase, a nucleic-acid-binding protein that could be 
co-opted to have a role in binding and regulating endogenous nucleic 
acid sequences. The presence of the gene in placental mammals only 
suggests that the insertion of the retrotransposon occurred in the 
common ancestor of therian mammals, after the divergence of the 
monotremes from the rest of the mammals. 


Chapter 19 


2. Their membrane system, chromosomal organization, replication, 
transcription, and translation (ribosome structure) are all similar to 
those in bacteria. 


4. Sequencing of eukaryotic genomes has revealed evidence of transfers 
that are recent and transfers that are ancient. Transferred sequences 
that are highly similar must have been transferred recently. 


6. See Table 19.1 for variations in the genetic code in mitochondria. A 
consequence of tRNA gene number reductions and the change in the 
code minimizes errors such that the two closely related codons, UGA 
and UGG, are both Trp. 


8. There are three steps in this process: (1) Transfer of organelle DNA 
from organelle to nucleus and integration into the nuclear genome. 
(2) Acquisition of nuclear transcriptional regulatory elements such that 
the organellar gene is transcribed in the nucleus. Translational regula- 
tory elements are similar (except where there are changes in the genetic 
code or RNA editing, both of which might inhibit production of a func- 
tional protein). (3) For the protein to be targeted back to the organelle, 
protein sequences facilitating efficient subcellular targeting need to be 
acquired from adjacent genomic sequences. 


10. Transcription and/or translational and/or posttranslational 
regulation of the multiple components of the complexes need to be 
coordinated such that appropriate stoichiometries of the subunits are 
produced. 


12. The most appropriate advice would be the following: for III-1, none 
of your progeny will be afflicted; for III-2, all of your progeny will be 
afflicted, and the extent may vary between individuals depending on 
levels of homo- and heteroplasmy; for III-3, all of your progeny will be 
afflicted, though the extent may vary between individuals depending on 
levels of homo- and heteroplasmy. 


14. Maternal inheritance 


16. pet1 is a segregational (nuclear) mutation; pet2 is a neutral muta- 
tion. The expected progeny is 2 wild type : 2 petite. 


18. Since inheritance of this syndrome is maternal and not paternal, 
there is no need to worry, but if the mother exhibits symptoms, then 
there is a probability that their children would be affected. The extent to 
which their children will be affected depends on whether the mother is 
homoplasmic (all offspring would be affected) or heteroplasmic 
(possibility that some might not be affected). 


20. Sibling II-2’s children will be afflicted, but II-5’s children will not be 
afflicted because MERRF is maternally, not paternally, inherited. 


22. Maternal inheritance is the most likely, but it is penetrant only in 
males. 


24. mtDNA is found in many copies per cell, and the probability of 
being preserved is higher than it is for nuclear DNA. It is also highly 
useful for elucidating evolutionary relationships. 


26. If there was no interbreeding, all coyote sequences would be more 
closely related to each other than to wolf sequences; this is not the case, 
so there must have been interbreeding. 


28. Examine the genome of the sea slug and search for genes that are 
closely related to nuclear genes of the algae. If they are present in the 
nuclear genome of the sea slug, horizontal gene transfer can be as- 
sumed to have occurred. Use sequencing to confirm that these genes 
are not present in relatives of the sea slug. 


Chapter 20 


2. The neural crest cells differentiate autonomously with the identity 
of the species from which they are derived. However, they can recruit 
host cells to contribute to the beak, suggesting that the neural crest cells 
non-autonomously influence the developmental fate of neighboring 
host cells. 


4a. In a syncytium, proteins are free to diffuse, so mechanisms whereby 
factors are restricted by membranes are not functional. 


4b. Gradients of morphogens such as bicoid and nanos either must not 
exist or the gradients must be established in another manner, such as 
cell-cell communication. 


6. Segments correspond to the clear morphological and anatomical 
divisions in the larva or adult organism. Parasegments are offset from 
the segments, spanning the posterior part of one segment and the ante- 
rior part of its neighbor, and correspond to domains of gene expression. 
Developmental biologists consider parasegments as the subdivisions 
that are produced during fly development because they correspond 
to the domains of gene expression that control pattern formation and 
identity in the organism. 


8. Similarities: Both segmentation in Drosophila and floral organ 
whorls in Arabidopsis are serially repeated structures/segments 
with identities controlled by related sets of transcription factors that 
act combinatorially and exhibit cross-regulation. Differences: In 
Drosophila the genes are Hox genes and are encoded in complexes in 
the genome, whereas in Arabidopsis the genes are MADS-box genes 
that are dispersed throughout the genome. 


10a. In a loss-of-function mutant, the phenotype is vulva-less. 
10b. In a gain-of-function mutant, the phenotype is multi-vulval. 


12a. This phenomenon can occur only when cells are totipotent. Once 
cells are only pluripotent, the identities of possible differentiation path- 
ways are limited and may not be able to form a complete organism. 


12b. The resulting individual will be a genetic mosaic, consisting of 
two distinct genotypes. This probably happens more than is acknowl- 
edged, but it is detected only when multiple parts of an individual are 
genotyped. 


14. Extra copies of Bicoid would increase the amount of bicoid mRNA 
that the mother puts into her eggs, thus increasing the amount of 
bicoid protein. This would result in a posterior shift in threshold levels 
of bicoid required to activate downstream targets; hunchback expres- 
sion would be increased; other gap genes and pair-rule genes are also 
likely to be affected, with a general shift of anterior gene expression 
patterns (and subsequent fates) to more posterior positions—shifting 
the other gap gene expression patterns to more posterior positions. 


16a. Pair-rule genes might be expected to influence the expression of 
the segment polarity genes, which act at a later time in development. 


16b. The fushi tarazu single mutant likely has a loss of the even- 
numbered parasegments (fushi tarazu is Japanese, meaning “too few 
segments”), and the engrailed single mutant likely has defects in the 
anterior part of each parasegment. Thus, one might predict that the 
double mutant would be a combination of these two single-mutant 
phenotypes. 


18. This pattern could be established by lateral inhibition. 


20a. Gain-of-function alleles in let-23 and let-60 would result in a vulva 
being produced in the lin-3 loss-of-function background. 


20b. Loss-of-function alleles of let-60 would suppress the gain-of- 
function let-23 multi-vulva phenotype and result in a vulva-less worm. 


22a. The consequence of ectopically expressing Hoxd10 throughout 
the developing mouse limb bud would be that the “thumb” would ac- 
quire an “index finger” identity; for Hoxd11, the “index finger” would 
acquire a “middle finger” identity, and the “thumb” would have an 
altered identity promoted by Hoxd9 + Hoxd11 (but it is not clear what 
it might look like, since that combination is not found in any wild-type 
digit). For Hoxd10 and Hoxd11, both the “thumb” and “index finger” 
would acquire a “middle finger” identity. 


22b. To construct this model, you need to create a conditional allele. One 
approach would be to introduce lox sites flanking the Hoxd9-13 cluster 
of genes via homologous recombination (see Chapter 17). The interven- 
ing DNA including the genes could then be excised by induction of the 
Cre recombinase protein, which could be controlled either by regulatory 
elements driving expression in the limb bud or perhaps by a heat shock. 
24. tra-1 and tra-2 mutant alleles are epistatic to the her-1 mutant al- 
lele, while the putative gain-of-function tra-1 allele (her-2) is epistatic to 
recessive loss-of-function alleles of tra-1 and tra-2. Since the wild-type 
allele of tra-1 acts as a repressor of male development and the wild-type 
allele of her-1 acts as a repressor of hermaphrodite development, the ac- 
tivities of the genes in a wild-type animal can be summarized as follows: 


genotype   her-1   tra-2   tra-1   phenotype 
XX         off     on      on      hermaphrodite 
XO         on      off     off     male 


Based on the observation that tra-1 is epistatic to her-1, tra-1 should 
be downstream of her-1. In addition, assuming that her-2 is a gain-of- 
function allele of tra-1 and is epistatic to tra-2, this places tra-1 down- 
stream of tra-2. Thus, a model for the control of sex determination can 
be constructed. 


In one model, the X:A ratio influences the activity of a repressor, her-1, 
which represses tra-2, which in turn activates tra-1, which then pro- 
motes hermaphrodite (female) fate and represses male development: 


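X:A ratio --| her-1 --| tra-2 --> tra-1 --> hermaphrodite development 
(tra-1 --| male development; "--|" = represses, "-->" = activates) 

X:A ratio       her-1   tra-2   tra-1   outcome 
high (XX/AA)    low     high    high    hermaphrodite development 
low (XO/AA)     high    low     low     male development 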


26a. In an otherwise wild-type background, the phenotypic effects of 
agamous mutations are confined to the third and fourth whorls, but in 
an apetala2 mutant background, phenotypic effects of agamous muta- 
tions are also seen in the first and second whorls. This implies that in 
an apetala2 mutant background, AGAMOUS is ectopically expressed in 
the first and second whorls, and the converse is true for the phenotypic 
effects of apetala2 mutations. 


26b. Yes; cross-regulatory interactions occur among Hox genes in ani- 
mals. Posteriorly expressed Hox genes often repress the expression of 
Hox genes normally expressed in respective anterior positions. This is a 
common, though not universal, feature in the regulation of Hox genes. 


28. Based on the phylogeny of eukaryotes (see Figure 18.11), the last 
common ancestor of Basidiomycota and animals (or plants) was likely 
a single-celled organism. Thus, as in the comparison of multicellular 
development of plants and animals, Basidiomycota are likely to utilize a 
unique set of genes to direct their development. While the genes might 
be expected to encode transcription factors and signaling molecules, 
they are not likely to be homologous to those directing development in 
plants and animals. Thus, a forward genetic screen to identify pattern 
formation mutants in mushrooms would likely be a more successful 
approach than any reverse genetic screens. 


Chapter 21 


2. Traits 1a through 1d are likely to be multifactorial. Dietary nutrition 
and temperature are two environmental conditions likely to influence 
each trait. 


4. $V_E = 2.25$, $V_G = 5.40 - 2.25 = 3.15$. 

6. The mean is 165.75, $s^2 = 1137.22/11 = 103.38$, and $s = 10.17$. 

8a. $A_1A_1B_1B_1C_1C_1 = 36$ cm and $A_2A_2B_2B_2C_2C_2 = 18$ cm. 

8b. 27 cm 

8c. 24 cm 


8d. Any genotype with five “1” alleles and one “2” allele yields 
[(5)(6 cm)] + (3 cm) = 33 cm. 


8e. There are $3^3 = 27$ possible genotypes. 


8f. Seven different phenotypes are possible. 
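
The counts in answers 8a through 8f can be checked by enumeration. The sketch below assumes the additive model implied by answer 8d (each “1” allele contributes 6 cm and each “2” allele contributes 3 cm, across three genes); the names `CONTRIB` and `height` are illustrative.

```python
from itertools import product

CONTRIB = {1: 6, 2: 3}   # cm contributed per allele, as implied by answer 8d

def height(genotype):
    """genotype is a tuple of three (allele, allele) pairs, one pair per gene."""
    return sum(CONTRIB[allele] for pair in genotype for allele in pair)

# Unordered genotypes at a single gene: 11, 12, and 22.
per_gene = [(1, 1), (1, 2), (2, 2)]
genotypes = list(product(per_gene, repeat=3))        # all three-gene genotypes
phenotypes = sorted({height(g) for g in genotypes})  # distinct heights in cm

tallest = height(((1, 1), (1, 1), (1, 1)))
triple_het = height(((1, 2), (1, 2), (1, 2)))
```

Under these assumptions the enumeration gives 27 genotypes and seven height classes spaced 3 cm apart, from 18 cm to 36 cm.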
10a. $V_E = 3.5$ and $V_G = 7.4 - 3.5 = 3.9$. 

10b. $H^2 = 3.9/7.4 = 0.527$. 

12. $H^2 = 34.48/38.10 = 0.905$. 
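
The variance partitioning used in answers 4, 10, and 12 can be written as a short function. This is a sketch assuming the simple model $V_P = V_G + V_E$ with no genotype-by-environment interaction; the function name is illustrative.

```python
def broad_sense_heritability(v_p, v_e):
    """Return (V_G, H^2) given phenotypic and environmental variance.

    Assumes V_P = V_G + V_E, so V_G = V_P - V_E and H^2 = V_G / V_P.
    """
    v_g = v_p - v_e
    return v_g, v_g / v_p

# Values from answer 10: V_P = 7.4, V_E = 3.5
v_g, h2 = broad_sense_heritability(7.4, 3.5)
```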


14a. For the cross involving 12-gram tomatoes, S = −4 g; for the cross 
involving 24-gram tomatoes, S = 8 g. 


14b. For the cross involving 12-gram tomatoes, R = (−4)(0.8) = −3.2 g; 
for the cross involving 24-gram tomatoes, R = (8)(0.8) = 6.4 g. 


16. Blood type is known (see Chapter 4) to be the result of three alleles 
of a single gene, and the MZ results confirm the exclusive genetic 
determination of blood type. Chicken pox is an infectious disease, and 
there is no reason to suspect gene-dependent differences in infection as 
the equal concordance values indicate. For the five other conditions, MZ 
concordance is considerably higher than DZ concordance, suggesting 
that genes have a pronounced effect on the appearance of the conditions. 


18. QTL analysis screens a large number of SNP and other DNA 
sequence variants and plots the results on the phenotype of interest, 
running speed in this case. Any DNA markers associated with faster 
running speed could potentially indicate the nearby location of a gene 
(quantitative trait locus) influencing running speed. 


22a. For protein content, S = 22.7 − 20.2 = 2.5%; for butterfat content, 
S = 7.4 − 6.5 = 0.9%. 

22b. Response to selection will be greater for protein content 
[(2.5)(0.60) = 1.5%] than for butterfat content [(0.9)(0.80) = 0.72%]. 
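
The selection calculations in answers 14 and 22 both apply the relationship R = h²S, where S is the selection differential. A minimal sketch using the values given in those answers follows; the function names are illustrative.

```python
def selection_differential(selected_mean, population_mean):
    """S: mean of the selected parents minus the population mean."""
    return selected_mean - population_mean

def response_to_selection(s, h2):
    """R = h^2 * S, the expected shift in the offspring mean."""
    return h2 * s

# Answer 22: protein content (h^2 = 0.60) and butterfat content (h^2 = 0.80)
r_protein = response_to_selection(selection_differential(22.7, 20.2), 0.60)
r_butterfat = response_to_selection(selection_differential(7.4, 6.5), 0.80)

# Answer 14: S = -4 g and S = 8 g with h^2 = 0.8
r_small = response_to_selection(-4, 0.8)
r_large = response_to_selection(8, 0.8)
```

The trait with the larger product h²S responds more, even though butterfat content has the higher heritability.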


24a. 100%. Blood type is controlled by genotype alone, and MZ twins 
are genetically identical. 


24b. 50%. Blood types A ($I^Ai$) and B ($I^Bi$) are both expected, with a 
probability of 1/2. The chance that DZ twins will have the same blood 
group is 1/4 for blood type A plus 1/4 for blood type B, or 1/2. It is also 
possible that since DZ twins are the result of independent fertilization 
events, one twin could be blood type A and the other blood type B. 


Chapter 22 


2. Inbreeding is a genome-wide phenomenon that increases homozy- 
gosity and reduces heterozygosity. It does not change allele frequencies; 
rather, it nonrandomly distributes alleles into genotypes. Inbreeding can 
increase the probability that inbred organisms might be homozygous 
for a rare recessive allele. 


4. Natural selection conferring the highest relative fitness on particular 
heterozygous organisms will eliminate the two alleles when they occur 
in homozygous genotypes. Equilibrium allele frequencies are estab- 
lished as a ratio of the selection coefficients operating against the alleles. 


6. Among a very small number of population samples, each outcome 
contributes a large percentage to the total; in much larger samples, each 
individual outcome contributes a much smaller percentage to the total. 


8. A genetic bottleneck substantially reduces the number of organisms 
in a population. Elimination and survival are random. Some alleles may 
be lost, and others may survive at much different frequencies than were 
present before the bottleneck. The overall result is less genetic diversity 
after the bottleneck. As populations increase in size after a bottleneck, 
they display a reduced level of genetic variability in comparison to di- 
versity before the bottleneck. 


10a. $\sqrt{0.28} = 0.53$. 

10b. $1 - 0.53 = 0.47$. 


10c. Using p = 0.47 for the dominant allele T and q = 0.53 for the 
recessive allele t, $TT = (0.47)^2 = 0.22$, $Tt = 2(0.47)(0.53) = 0.50$, and 
$tt = (0.53)^2 = 0.28$. 
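
The steps in answer 10 follow the usual Hardy-Weinberg route from a recessive phenotype frequency to genotype frequencies. A sketch, assuming the population is in Hardy-Weinberg equilibrium for this gene; the function name and dictionary keys are illustrative.

```python
import math

def hardy_weinberg_from_recessive(freq_recessive_phenotype):
    """Estimate allele and genotype frequencies from the frequency of tt.

    Assumes Hardy-Weinberg equilibrium: f(tt) = q^2.
    """
    q = math.sqrt(freq_recessive_phenotype)   # recessive allele t
    p = 1 - q                                 # dominant allele T
    return {"p": p, "q": q, "TT": p * p, "Tt": 2 * p * q, "tt": q * q}

result = hardy_weinberg_from_recessive(0.28)
```

Rounded to two decimal places, the result matches the values in answer 10, and the three genotype frequencies sum to 1.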

12. Mutation can generate new antibiotic-resistant alleles that confer 
improved survival on bacteria carrying the mutant alleles that are ex- 
posed to an antibiotic. 


14. Evolutionary processes, including directional selection, require 
inherited variability for their operation. In the absence of inherited 
variation, there is just a single allele of a gene, and selection has no 
alternative alleles on which to exert selective pressure. 


16. Inbreeding may be reduced by carefully managing matings to en- 
sure that the level of relationship is minimized to the extent possible 
when matings take place. 


18. The expected frequencies of rabbits are black ($C_1C_1$) = $(0.70)^2 = 0.49$, 
tan ($C_1C_2$) = 2(0.70)(0.30) = 0.42, and white ($C_2C_2$) = $(0.30)^2 = 0.09$. 




20. In this problem, $s = 1 - 0.82 = 0.18$ and $t = 1 - 0.32 = 0.68$. The 
estimated equilibrium frequencies are $\hat{p} = t/(s + t) = 0.68/0.86 = 0.791$ 
and $\hat{q} = s/(s + t) = 0.18/0.86 = 0.209$. 
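
The equilibrium in answer 20 can be computed directly from the relative fitnesses of the two homozygotes. The sketch assumes heterozygote advantage with heterozygote fitness set to 1; the function and argument names are illustrative.

```python
def overdominant_equilibrium(w11, w22):
    """Equilibrium allele frequencies under heterozygote advantage.

    w11 and w22 are homozygote fitnesses relative to a heterozygote
    fitness of 1. With s = 1 - w11 and t = 1 - w22, the allele carried
    by the w11 homozygote settles at t / (s + t).
    """
    s = 1 - w11
    t = 1 - w22
    return t / (s + t), s / (s + t)

p_hat, q_hat = overdominant_equilibrium(0.82, 0.32)
```

The allele whose homozygote is selected against less strongly ends up at the higher equilibrium frequency.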


22a. A genetic bottleneck is a substantial population reduction that 
eliminates population members at random. 


22b. Since the loss of population members is random in a genetic bot- 
tleneck, the overall level of genetic diversity is reduced; certain alleles 
are eliminated and other alleles retained at frequencies that are higher 
or lower than before the bottleneck. In Ashkenazi populations, repeated 
bottlenecks followed by population growth generated a founder effect 
that brought alleles such as the recessive for Tay-Sachs disease to high 
frequency. 

22c. In this population, the recessive allele frequency, f(t), is 
$\sqrt{0.00133} = 0.0365$. 

22d. The dominant allele frequency is f(T) = 1 − 0.0365 = 0.9635, 
and the carrier frequency is 2(0.9635)(0.0365) = 0.070, or about 7 per 
100 people. 

26a. Following one generation of natural selection, the relative genotype 
frequencies are $C_1C_1 = 0.421$, $C_1C_2 = 0.526$, and $C_2C_2 = 0.053$. The ap- 
proximate allele frequencies are $C_1 = 0.421 + (0.5)(0.526) = 0.684$ and 
$C_2 = 0.053 + (0.5)(0.526) = 0.316$. 

26b. Following reproduction of the survivors of predation, the geno- 
type frequencies are $C_1C_1 = (0.684)^2 = 0.468$, $C_1C_2 = 2(0.684)(0.316) = 
0.432$, and $C_2C_2 = (0.316)^2 = 0.100$. 

26c. The equilibrium allele frequencies are predicted to be 
$C_1 = 0.2/(0.6 + 0.2) = 0.25$ and $C_2 = 0.6/(0.6 + 0.2) = 0.75$. 


28. Assuming $f(I^A) = p$, $f(I^B) = q$, and $f(i) = r$, the frequency $r$ is 
$\sqrt{336/1000} = 0.579$, the frequency $p$ is $\sqrt{0.421 + 0.336} - 0.579 
= 0.290$, and the frequency $q$ is $1 - (0.579 + 0.290) = 0.131$. 
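
The three-allele estimate in answer 28 can be reproduced as follows. The sketch assumes Hardy-Weinberg proportions, so that $f(O) = r^2$ and $f(A) + f(O) = (p + r)^2$; the function name is illustrative. Because the code carries full precision, the third decimal place can differ slightly from the printed values.

```python
import math

def abo_allele_frequencies(freq_type_a, freq_type_o):
    """Estimate p = f(I^A), q = f(I^B), r = f(i) from phenotype frequencies.

    Assumes Hardy-Weinberg proportions:
      f(O) = r^2  and  f(A) + f(O) = (p + r)^2.
    """
    r = math.sqrt(freq_type_o)
    p = math.sqrt(freq_type_a + freq_type_o) - r
    q = 1 - p - r
    return p, q, r

p, q, r = abo_allele_frequencies(0.421, 0.336)
```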


30a. For dimpling, the genotype frequencies are estimated to be 
$DD = (0.62)^2 = 0.3844$, $Dd = 2(0.62)(0.38) = 0.4712$, and $dd = (0.38)^2 = 
0.1444$. For PTC tasting ability, the genotype frequencies are estimated 
to be $TT = (0.76)^2 = 0.5776$, $Tt = 2(0.76)(0.24) = 0.3648$, and $tt = (0.24)^2 
= 0.0576$. 


30b. The expected phenotypes are dimpled taster (D_T_) = (0.8556) 
(0.9424) = 0.8063; dimpled nontaster (D_tt) = (0.8556)(0.0576) = 0.0493; 
undimpled taster (ddT_) = (0.1444) (0.9424) = 0.1361; undimpled non- 
taster (ddtt) = (0.1444)(0.0576) = 0.0083. 
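
The phenotype frequencies in answer 30b combine two independent genes with the product rule. A sketch using the allele frequencies from answer 30a follows, assuming Hardy-Weinberg proportions at each gene and independent assortment; the function name and dictionary keys are illustrative.

```python
def dominant_and_recessive(p_dominant):
    """Return (dominant phenotype, recessive phenotype) frequencies."""
    q = 1 - p_dominant
    recessive = q * q
    return 1 - recessive, recessive

dimpled, undimpled = dominant_and_recessive(0.62)   # D_ and dd
taster, nontaster = dominant_and_recessive(0.76)    # T_ and tt

# Product rule: the two genes are assumed to be independent.
phenotypes = {
    "dimpled taster": dimpled * taster,
    "dimpled nontaster": dimpled * nontaster,
    "undimpled taster": undimpled * taster,
    "undimpled nontaster": undimpled * nontaster,
}
```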


32a. For D3S1358, the frequency of 16/18 heterozygotes is estimated to 
be 2(0.229)(0.162) = 0.0742. For VWA, the estimated frequency of 14/18 
heterozygotes is 2(0.131)(0.189) = 0.0495. For FGA, the estimated fre- 
quency of 23/26 heterozygotes is 2(0.131)(0.018) = 0.0047. 


32b. Each heterozygous genotype of a CODIS gene is estimated by an 
expression similar to that used to determine the heterozygous class for 
a gene with two alleles (i.e., $2pq$). Homozygous genotypes are estimated 
as the square of the allele frequency (i.e., $p^2$). The joint probability for 
multiple genotypes of independently assorting CODIS genes is estimated 
using the product rule. 
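
The rules in answer 32b can be sketched with the allele frequencies from answer 32a. The joint value computed here only illustrates the product rule and is not a figure from the answer key; the function and variable names are assumptions. The frequency 0.162 for D3S1358 is the value consistent with the printed product 0.0742.

```python
from math import prod

def genotype_frequency(p, q=None):
    """Homozygote frequency p^2, or heterozygote frequency 2pq if q is given."""
    return p * p if q is None else 2 * p * q

# Allele-frequency pairs for the heterozygous genotypes in answer 32a.
loci = {
    "D3S1358": (0.229, 0.162),   # 16/18 heterozygote
    "VWA": (0.131, 0.189),       # 14/18 heterozygote
    "FGA": (0.131, 0.018),       # 23/26 heterozygote
}

per_locus = {name: genotype_frequency(p, q) for name, (p, q) in loci.items()}

# Product rule: valid only if the loci assort independently.
joint = prod(per_locus.values())
```

Each added locus multiplies the joint frequency by a number well below 1, which is why multilocus profiles become rare so quickly.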


34a. Individuals V-1, V-2, and V-3 are inbred. 
34b. Common ancestors are I-1 and I-2. 
34c. $F = 4(1/2)^8 = 0.015625$. 
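
The arithmetic in answer 34c can be checked with a one-line function. This sketch assumes the allele-tracing form of the calculation: each of the four alleles carried by the two common ancestors has a (1/2)^n chance of becoming homozygous by descent, where n is the number of transmission events in the loop. The exponent 8 is inferred from the printed result 0.015625, and the pedigree itself is not reproduced here.

```python
def inbreeding_coefficient(n_ancestral_alleles, n_transmissions):
    """F = (number of ancestral alleles) * (1/2)^(transmission events).

    Assumes the common ancestors are themselves non-inbred and unrelated.
    """
    return n_ancestral_alleles * 0.5 ** n_transmissions

f_answer = inbreeding_coefficient(4, 8)          # value in answer 34c
f_first_cousins = inbreeding_coefficient(4, 6)   # familiar reference case, 1/16
```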


38. The recessive allele producing achromatopsia was present in the 
original (pre-typhoon) Pingelapese population, though the original al- 
lele frequency is unknown. The typhoon produced a genetic bottleneck 
that produced a frequency of approximately 1 copy in 40 alleles, or 
q = 0.025 for the recessive allele. Subsequent repopulation of the island 
was affected by genetic drift and inbreeding that may have acted to 
increase the frequency of the allele (genetic drift) and to increase the 
likelihood of individuals who are homozygous IBD (inbreeding). 


Glossary 


2-micron plasmid A naturally occurring Sac- 
charomyces cerevisiae plasmid (circumference 
≈ 2 μm) that has been engineered to work as 
a vector in yeast. 

3’ polyadenylation (3’ poly-A tailing) Dur- 
ing eukaryotic pre-mRNA processing, an 
enzyme-driven modification that removes the 
3’ end of the pre-mRNA and adds numerous 
adenines. 


3’ splice site In eukaryotic pre-mRNA pro- 
cessing, the location of cleavage at the 3’ end 
of an intron. Contains an AG dinucleotide in a 
consensus sequence. 


3’ to 5’ exonuclease activity DNA- and 
RNA-digesting activity that progresses in the 
3’ to 5' direction to remove nucleotides. See 
also DNA proofreading. 


3’ untranslated region (3’ UTR) The un- 
translated segment of mRNA between the stop 
codon and the 3’ end of the transcript. 


5’ capping In eukaryotic pre-mRNA process- 
ing, the addition of 7-methylguanosine to the 
nucleotide at the 5’ end of pre-mRNA by a 
triphosphate bridge. Methylation of adjacent 
nucleotides may also occur. 


5’ splice site In mRNA processing, the loca- 
tion of cleavage at the 5’ end of an intron. 
Contains a GU dinucleotide in a consensus 
sequence. 


5’ to 3’ exonuclease activity DNA- or RNA- 
digesting activity that progresses in the 5’ to 3’ 
direction to remove nucleotides. 


5’ to 3’ polymerase activity DNA synthesizing 
activity of DNA polymerases that progresses 
in the 5’ to 3’ direction to add new nucleotides 
to a growing DNA strand. Requires a template 
strand. 


5’ untranslated region (5’ UTR) The untrans- 
lated segment of mRNA between the 5’ end of 
the transcript and the start codon. 


6-4 photoproduct A DNA lesion and poten- 
tial mutagenic event caused by exposure to 
ultraviolet (UV) irradiation. 


−10 consensus sequence See Pribnow box. 


10-nm fiber The “beads-on-a-string” form of 
chromatin, in which DNA is wrapped around 
nucleosomes. 


30-nm fiber A structure of chromatin in 
which histone 1 (H1) partially condenses chro- 
matin fibers into a coiled form. Also known as 
solenoid or solenoid structure. 


30S initiation complex In bacterial transla- 
tion, the complex formed by a small riboso- 
mal subunit, mRNA, and the tRNA carrying 
fMet. 


−35 consensus sequence A specific con- 
sensus sequence of the bacterial promoter at 
which RNA polymerase is bound. 


70S initiation complex The fully assembled 
bacterial ribosome that is prepared to initiate 
translation. 


300-nm fiber A structural state of chromatin 
in which chromatin fibers are looped and 
condensed. 


α-globin gene and protein A gene belonging 
to a family of closely related genes that 
encode a globin polypeptide that is part of 
hemoglobin. 


α-helix (alpha helix) A form of secondary 
protein structure in which segments of proteins 
form helical structures that are held together 
by hydrogen bonds. 


α-proteobacteria Lineage of bacteria that are 
the closest extant relatives of the lineage that 
gave rise to mitochondria. 


$\beta^A$ allele The common (wild-type) allele of 
the human β-globin gene. 


β-globin gene and protein A gene belonging 
to a family of closely related genes that 
encode a globin polypeptide that is part of 
hemoglobin. 


β-pleated sheet (beta-pleated sheet) A 
form of secondary protein structure in which 
segments of proteins form parallel arrays 
that are held together by hydrogen bonds. 


$\beta^S$ allele A specific mutant allele of the 
human β-globin gene that produces sickle cell 
disease in homozygous individuals. 


θ (theta) structure In bacterial DNA replica- 
tion, the name given to an intermediate struc- 
ture of DNA replication of a circular molecule 
with a single origin of bidirectional replication. 


θ (theta) value A variable indicating a re- 
combination distance between genes. Used in 
lod score analysis. 


aberrant ratio In fungi, a ratio of haploid 
spore genotypes within a single ascus indicat- 
ing gene conversion. 


acentric fragment (acentric chromosome) A 
chromosome fragment without a centromere. 


acrocentric chromosome A eukaryotic chro- 
mosome in which the centromere is very near 
one end. Forms a chromosome with long and 
short arms of distinctly different lengths. 


activator (Ac) element In transposition, a 
transposable genetic element containing a 
transposase gene. 


activator binding site DNA sequence to 
which an activator protein binds to regulate 


gene expression. Term refers to regulatory sites 
in bacteria; in eukaryotes, the equivalent se- 
quence would be called an enhancer element. 


activator protein A transcription factor 
that binds to regulatory sequences associ- 
ated with a gene and upregulates that gene’s 
expression. 


addition rule See sum rule. 


additive genes Genes contributing to a poly- 
genic trait and producing their effect by their 
cumulative contributions that are approximately 
equal for each gene. 


additive variance (V4) For quantitative traits, 
the component of genetic variance contributed 
by genes having an additive effect on pheno- 
typic variance. 


adenine (A) One of four nitrogenous nucleo- 
tide bases in DNA and RNA; one of the two 
types of purine nucleotides in DNA and RNA. 


adjacent-1 segregation A pattern of chro- 
mosome segregation that can occur following 
reciprocal balanced translocation. Leads to 
gametes carrying gene duplications and 
deletions. 


admixed population A population whose 
members are a blend of formerly distinct 
populations. 


agarose An inert material derived from agar 
that is mixed with buffer and used to form gels 
for gel electrophoresis. 


allele An alternative form of a gene. 


allele-counting method A method for deter- 
mining allele frequency in a sample by tabulat- 
ing the number of alleles of each type. 


allelic phase Describing the cis and trans 
arrangements of alleles of linked genes on ho- 
mologous copies of a chromosome pair. 


allelic series A group of alleles of a gene that 
display a hierarchy of dominance relationship 
among them. 


allolactose A modified form of lactose that 
binds to the lac repressor protein, inducing an 
allosteric change that reduces the DNA binding 
ability of the complex. 


allopatric speciation The development of 
new species in geographic isolation. 


allopolyploidy A polyploid organism arising 
through the union of chromosome sets from 
different species. 


allosteric domain Domain of a protein that 
allows the protein to change shape when it 
binds to a specific molecule; the protein in the 
new shape is altered in its ability to bind to a 
second molecule (e.g., DNA). Also known as 
allostery. 


allosteric effector compound Molecule that 
binds to the allosteric protein domain and 
subsequently induces a change in the bound 
protein. 


allostery Reversible interactions of a small 
molecule with a protein that lead to changes 
in the shape of the protein and to a change 
in the interaction of the protein with a third 
molecule. 


alternate segregation A pattern of chromo- 
some segregation that can occur following 
reciprocal balanced translocation that leads to 
the production of viable gametes. 


alternative pre-mRNA processing 
(alternative intron splicing, promoter, 
polyadenylation) In eukaryotic pre-mRNA 
processing, alternative processes by which 
different mRNAs can be produced from the 
same gene using different promoters or poly- 
adenylation sites or by removal of different 
exon elements. 


alternative sigma (σ) subunit Different 
forms of the sigma subunit of bacterial RNA 
polymerase that induce distinct conforma- 
tional changes to the RNA polymerase core 
and to the recognition of distinct promoters. 


Ames test A laboratory method commonly 
used to determine whether a compound or 
one of its breakdown products is mutagenic. 


amino acid An aminocarboxylic acid that is a 
component of a polypeptide or protein. 


aminoacyl site (A site) The site on a ribo- 
some at which incoming charged tRNAs match 
their anticodon sequence with mRNA codons. 


aminoacyl-tRNA synthetase (tRNA 
synthetase) A group of enzymes whose spe- 
cific functions are to identify particular tRNAs 
and catalyze the attachment of the appropriate 
amino acid at the 3’ terminus. 


amorphic mutation See null mutation. 


anagenesis Phylogenetic evolution of a new 
species from an ancestral species without 
branching. 


anaphase The phase of mitosis during which 
sister chromatids separate (anaphase A) and 
move to opposite poles (anaphase B). 


aneuploid An uneven number of chromo- 
somes. Usually the result of the gain or loss of a 
chromosome; that is, 2n + 1 (trisomy) or 2n − 1 
(monosomy). 


annotation (gene annotation, genome 
annotation) The process of attaching biologi- 
cal functions to DNA sequences. Genome anno- 
tation is the process of identifying the location 
of genes and other functional sequences within 
the genome sequence; gene annotation defines 
the biochemical, cellular, and biological function 
of each gene product the genome encodes. 


Antennapedia complex One of two homeotic 
gene clusters in Drosophila consisting of five 
genes (labial, Deformed, Sex combs reduced, 
proboscipedia, and Antennapedia) that act in 
combination to specify the cephalic and 
thoracic parasegments. 


antibiotic resistance An inherited trait of a 
microbe that permits it to grow in the pres- 
ence of a compound that kills or prevents the 
growth of antibiotic-sensitive microbes. 


anticodon The nucleotide triplet sequence of 
transfer RNA that pairs with an mRNA codon 
sequence in translation. 


antiparallel Opposite 5’ and 3’ orientations 
of two complementary nucleic acid strands. 


antisense RNA An RNA molecule that is com- 
plementary to a portion of a specific mRNA. 


antitermination stem loop (antiterminator) 
A stem loop that allows RNA polymerase to 
continue transcription through the leader 
region of bacterial attenuator controlled 
operons and into the structural genes of an 
operon (e.g., the 2-3 stem loop in trp operon 
regulation). 


apurinic (AP) site The location of a nucleo- 
tide that has lost its purine base. 


arabinose (ara) operon An inducible operon 
consisting of genes encoding enzymes allowing 
the use of arabinose as a carbon source. The 
operon is controlled by a single regulatory pro- 
tein, which carries out both positive and nega- 
tive transcriptional regulation. 


Archaea One of the three domains of life; 
separate from Bacteria and Eukarya. 


archaeal initiation factor (aIF) The transcription 
initiation proteins found in archaeal cells. 


Argonaute Protein subunit of RISC (RNA- 
induced silencing complex) that binds small 
RNA molecules and provides either the 
catalytic “slicer” activity or the translational 
repressor activity. 


artificial cross-fertilization A controlled 
cross between plants made by an investigator 
who transfers pollen from one plant to fertilize 
the other plant. 


ascus The spore sac formed by fungi con- 
taining four (tetrad) or eight (octad) haploid 
spores. Also called spores. 


asexual polyploidization Chromosome du- 
plication that is the product of nondisjunction 
in mitotic cell division. 


aster The structure forming during cell divi- 
sion that contains microtubules emanating 
from centrosomes. 


attachment site (att site) Identical or nearly 
identical sequences on the bacterial and bacte- 
riophage chromosomes that are cut and used 
to integrate or to excise the bacteriophage 
chromosome from the bacterial chromosome. 


attenuation A gene regulatory mechanism 
that fine-tunes transcription to match the 
momentary requirements of the cell, achiev- 
ing a more or less steady state of compound 
availability. 

attenuator region A regulatory region 
downstream of the promoter of repressible 
amino acid operons that exerts transcriptional 
control (in the form of transcription termination) 
based on the translation of a leader 
peptide, the efficiency of which is determined 
by the availability of specific amino acids. 


autopolyploidy A pattern of polyploidy 
produced by the duplication of chromosomes 
from a single genome. 

autoradiograph A photographic image 
obtained by exposure of X-ray film to the 
radioactive decay of isotopes attached to 
molecular probes. Used in the analysis of gel 
electrophoresis. 


autosomal dominant inheritance A pat- 
tern of hereditary transmission in which 
the dominant allele of an autosomal gene 
results in the appearance of the dominant 
phenotype. 


autosomal inheritance Hereditary transmis- 
sion of genes carried on autosomes. 


autosomal recessive inheritance A pattern 
of hereditary transmission in which the reces- 
sive allele of an autosomal gene results in the 
appearance of the recessive phenotype. 


auxotroph A microbe with one or more mu- 
tations that prevents its growth on a minimal 
medium. 


Bacteria One of the three domains of life; 
separate from Archaea and Eukarya. 


bacterial artificial chromosome (BAC) 
Cloning vector used in bacteria that utilizes 
the F plasmid origin of replication; can accept 
DNA inserts up to 500 kb. 


bacterial chromosome The main, usually 
singular, chromosome encoding the genome of 
a bacterium. 


bacteriophage (phage) A virus whose host 
is a bacterium. 


balanced polymorphism A genetic polymor- 
phism maintained in a population because or- 
ganisms with the heterozygous genotype have 
higher relative fitness than do organisms with 
either of the homozygous genotypes. 


balancer chromosome A chromosome with 
inversions used to maintain specific allele 
combinations (e.g., recessive lethal alleles) in 
genetic stocks. 


balancing selection The form of natural 
selection that operates in favor of a heterozygous 
genotype and leads to stable equilibrium fre- 
quencies of the alleles in a population. 


band (in electrophoresis gel) A region in an 
electrophoresis gel or in an autoradiograph 
where a protein or nucleic acid congregates. 
Usually visualized using a stain or molecular 
probe. 


barcode Short DNA sequences that identify 
specific strains in knockout libraries. 


Barr body The darkly staining inactive X 
chromosome visible in mammalian female nu- 
clei. The result of random X inactivation. 


basal transcription The very low level of 
transcription characteristic of a bacterial 
promoter that requires an inducer to initiate 
transcription. 


base excision repair DNA repair that excises 
a damaged nucleotide base and then replaces 
the entire nucleotide. 


base-pair substitution mutation A DNA 
sequence change resulting in the substitution 
of one base pair for another. 


base stacking A phenomenon of DNA base- 
pair interaction that rotates the base pairs 
around a central axis of symmetry and imparts 
twisting to the double helix. 


basic local alignment search tool (BLAST) A 
computer program designed to search for ho- 
mologous sequences in databases. 


bidirectional DNA replication The standard 
method of DNA replication that synthesizes 
new DNA in both directions from a replica- 
tion origin. 

binomial probability A probability function 
using two coefficients, a and b, whose sum 
equals 1 and whose products predict the 
probability of events. 
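
As a rough illustration of this definition, the sketch below computes one term of the expansion of (a + b)^n, assuming a and b are the probabilities of two mutually exclusive outcomes of independent events. The function name and the example values are illustrative and do not come from the text.

```python
from math import comb

def binomial_probability(n, k, a):
    """Probability that exactly k of n independent events have the outcome
    whose single-event probability is a (the other outcome has b = 1 - a)."""
    b = 1 - a
    return comb(n, k) * a**k * b**(n - k)

# Hypothetical example: chance that exactly 3 of 4 offspring show the
# dominant phenotype when each does so with probability 3/4.
p_three_of_four = binomial_probability(4, 3, 0.75)

# The terms for k = 0..n together cover every possibility, so they sum to 1.
total = sum(binomial_probability(4, k, 0.75) for k in range(5))
```

Because a + b = 1, the terms across all values of k always sum to 1, which is a quick sanity check on any such calculation.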


bioinformatics The use of computational 
approaches to decipher DNA-sequence 
information. 


biparental inheritance Condition in organel- 
lar inheritance where both parental gametes 
make contributions of cytoplasmic organelles 
to the zygote; contributions are often unequal 
because one gamete contributes more of the 
cytoplasm and the other gamete makes a 
smaller contribution. 


bithorax complex One of two homeotic 
gene clusters in Drosophila consisting of three 
genes (Ultrabithorax, abdominal-A, and 
Abdominal-B) that act in combination to spec- 
ify the thoracic and abdominal parasegments. 


blending theory of heredity An obsolete 
theory of heredity proposing that the traits of 
offspring are the average of parental traits. 


blotting (in gel electrophoresis) The pro- 
cess of transferring proteins or nucleic acids 
from an electrophoresis gel to a permanent 
membrane or filter. 


blunt end 5’ or 3’ ends of double-stranded 
DNA lacking any single-stranded overhangs. 


branch point adenine In intron splicing, an 
adenine nucleotide near the 3’ splice site of an 
intron that joins with a guanine located at the 
5’ splice site by a 2'-to-5’ phosphodiester bond 
to form a lariat intron. 

BRCA1-associated genome surveillance 
complex (BASC) A multiprotein complex that 
surveys the genome for mutations at the 
G1-to-S-phase cell cycle checkpoint. 


broad sense heritability (H²) The proportion 
of total phenotypic variance that is contrib- 
uted by total genetic variance. 


bulky adduct Large chemical groups added 
to nucleotides by alkylating agents. 


bypass polymerase A group of DNA poly- 
merases that are unstable and synthesize short 
regions of DNA under conditions in which the 
main DNA polymerase is unable to function, 
such as when faced with DNA lesions that 
block replication. Also called translesion DNA 
polymerase. 


CAAT box A common consensus sequence 
component of eukaryotic promoters. 


CAP (catabolite activator protein) In bac- 
terial transcription regulation, binds cAMP 
(cyclic AMP) at low glucose concentrations 
to positively regulate the transcription of op- 
erons that allow the use of alternative carbon 
sources. 


CAP binding site A bacterial DNA regulatory 
sequence to which the CAP-cAMP complex 
binds to positively regulate gene expression. 
See also CAP-cAMP complex. 


CAP-cAMP complex Formed by joining cat- 
abolite activator protein to cAMP, the complex 
binds to the CAP binding site of the bacterial 
lac promoter to regulate gene expression. 


capsid The protein coat of a viral particle. 


catabolite repression Situation where the 
presence of the preferred catabolite (e.g., glu- 
cose) represses the transcription of genes for 
an alternative catabolite (e.g., lactose). 


Cdk (cyclin-dependent kinases) A group 
of multimeric proteins whose levels fluctuate 
during the cell cycle. Composed of cyclin pro- 
teins and protein kinases, Cdks control entry 
and progression through mitosis. 


cell cycle Consisting of interphase (G1 phase, 
S phase, and G2 phase) and M phase (mitosis 
or meiosis) in cells. The transition from one 
phase to the next is controlled by protein- 
based interactions. 


cellular blastoderm Stage of Drosophila 
embryogenesis in which the nuclei are located 
at the periphery of the embryo and are en- 
closed by cell membranes. 


central dogma of biology The description 
of the functional relationship between DNA, 
RNA and proteins (DNA to RNA to protein). 


centromere A specialized DNA sequence on 
eukaryotic chromosomes that is the site of ki- 
netochore protein and microtubule binding. 


centrosome A cytoplasmic region, contain- 
ing a pair of centrioles in many eukaryotic 
species, from which the growth of microtu- 
bules forms the spindle apparatus during cell 
division. 

character displacement The pattern of natu- 
ral selection in which one phenotypic char- 
acter in a population is displaced by another 
form of the character that confers greater relative 
fitness. 


chaperone A category of eukaryotic proteins 
that assist with the folding or movement of 
other polypeptides. A protein that acts as 
a chaperone is often referred to as a 
chaperonin. 


Chargaff’s rule The observation that the 
percentage of adenine equals that of thymine 
and that guanine percentage equals cytosine 
percentage in DNA. 
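
The rule can be checked on a toy duplex. The sketch below is a minimal illustration, assuming a made-up top-strand sequence: it derives the complementary strand and tallies base percentages over both strands. The function and variable names are hypothetical.

```python
from collections import Counter

# Watson-Crick pairing partners in DNA
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def duplex_base_percentages(top_strand):
    """Return the percentage of each base counted over both strands of a duplex."""
    bottom_strand = "".join(COMPLEMENT[base] for base in top_strand)
    counts = Counter(top_strand + bottom_strand)
    total = sum(counts.values())
    return {base: 100 * counts[base] / total for base in "ATGC"}

percentages = duplex_base_percentages("ATGCGGATTA")  # made-up sequence
```

Every A on one strand pairs with a T on the other, and every G with a C, so the equalities hold for any duplex regardless of the sequence chosen.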


charged tRNA A tRNA to which the correct 
amino acid has been attached. 


chiasma (plural: chiasmata) Points of con- 
tact between homologous chromosomes that 
are coincident with crossover locations be- 
tween the homologs. 


chimeric gene A gene sequence composed of 
sequences from two or more sources. 


chi-square test (χ² test) A statistical test to 
compare the observed results of an experi- 
ment with the results predicted by chance. 
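
A minimal sketch of the calculation follows, assuming the usual form of the statistic (the sum over outcome classes of the squared difference between observed and expected counts, divided by the expected count). The progeny counts are invented for illustration.

```python
def chi_square(observed, expected):
    """Sum of (O - E)^2 / E over all outcome classes."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected need one value per outcome class")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical monohybrid cross: 100 progeny, 3:1 ratio expected by chance
observed = [80, 20]
expected = [75, 25]

statistic = chi_square(observed, expected)
degrees_of_freedom = len(observed) - 1  # number of outcome classes minus 1
```

With 1 degree of freedom the conventional 0.05 critical value is 3.84, so a statistic of about 1.33 would not lead to rejecting the 3:1 expectation for these invented counts.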


chloroplast An organelle, bounded by a 
double membrane, where photosynthetic reactions 
convert light energy and CO2 into fixed 
organic carbon. 


chromatin The complex of nucleic acids and 
proteins that compose eukaryotic chromo- 
somes. 


chromatin modifier Proteins that chemically 
modify histone proteins in the nucleosomes by 
adding or removing specific chemical groups, 
thereby modifying chromatin structure and 
regulating gene expression. 


chromatin remodeler Proteins that reposi- 
tion nucleosomes within chromatin in such a 
way as to open or close promoters and other 
regulatory sequences or that change the composition 
of nucleosomes, altering their biological 
activity (e.g., SWI/SNF, ISWI, SWR1). 


chromatin remodeling Processes that modify 
the structure or composition of chromatin. Usu- 
ally associated with alterations of nucleosome 
binding to DNA and affecting the regulation of 
gene transcription. 


chromatography A technique for separating 
the components of a molecular mixture by 
their similarities and differences. 


chromosome A structure composed of DNA 
and associated proteins that in total contain 
the genome of an organism. 


chromosome aberration An abnormality of 
chromosome number or structure. 


chromosome arm [long arm (q arm), short 
arm (p arm)] The segments of eukaryotic 
chromosomes between the centromere and 
the telomeres. 


chromosome banding A group of laboratory 
methods that stain eukaryotic chromosomes 
to reveal distinctive patterns of light and dark 
bands. Chromosome banding by Giemsa 
staining produces standardized patterns for the 
different chromosomes of selected species. 
Also known as Giemsa (G) banding. 


chromosome break point The location of a 
chromosome break. 


chromosome fusion See Robertsonian trans- 
location. 


chromosome inversion (paracentric, 
pericentric) A structural alteration of a 
chromosome in which a segment breaks away 
from the chromosome and subsequently reat- 
taches after 180° rotation. See also inversion 
heterozygote. 


chromosome scaffold Composed of numer- 
ous nonhistone proteins, the superstructure of 
eukaryotic chromosomes. 


chromosome territory The region within a 
nucleus occupied by a particular chromosome 
during interphase. 


chromosome theory of heredity The theory 
developed in the early 20th century that genes 
are carried on chromosomes and that the mei- 
otic behavior of chromosomes is the physical 
basis of Mendel’s laws. 


chromosome translocation The relocation 
of a chromosome or chromosome segment to 
a non-homologous chromosome. 


chromosome walking See positional cloning. 


cis-acting Acting on the same chromosome 
(e.g., DNA sequences that control expression 
of genes encoded on the same piece of DNA). 


cis-acting regulatory sequence Sequences to 
which proteins bind to regulate transcription of 
genes located on the same chromosome as the 
sequences. 


cis-dominant The principle that the operator 
can influence only the transcription of adja- 
cent downstream genes. 


clade In phylogenetics, a group of organisms 
defined by characteristics that are unique to the 
group and distinguish the group from others. 


cladistics The classification of organisms by 
characteristics that are unique to the group 
and distinguish it from other groups. Involves 
branching of new species from ancestral spe- 
cies. See also clade. 


cladogenesis Phylogenetic evolution by 
branching of descendant species from ances- 
tral species. 


clamp loader A multiprotein complex that 
pairs with DNA polymerase and the sliding 
clamp during replication. 


clone-by-clone sequencing An approach to 
genome sequencing where each chromosome 
is first broken into overlapping clones that 
are then arranged in linear order to produce a 
physical map of the genome. Each clone in the 
map is then sequenced separately. Contrast with 
whole-genome shotgun (WGS) sequencing. 


cloning vector A piece of DNA derived from 
a plasmid, virus, or other biological source 
that can be stably maintained in an organism 
and into which heterologous pieces of DNA 
can be inserted. 


closed chromatin Chromatin in which regula- 
tory DNA is covered by nucleosomes, thus re- 
stricting the access of regulatory proteins to the 
sequences and rendering genes in closed chromatin 
transcriptionally silent. 


closed promoter complex The initial stage 
of transcription that forms when RNA poly- 
merase loosely binds the promoter. 


coding region The region of a gene that 
encodes the gene product. 


coding strand (nontemplate strand) The 
nontemplate strand of DNA that has the same 
5'-to-3' polarity as its transcript and the same 
sequence, except for T in DNA and U in RNA. 
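
Because the coding strand and the transcript share polarity and sequence, the transcript can be written directly from the coding strand by swapping thymine for uracil. The sketch below assumes a made-up sequence written 5'-to-3'; the function name is hypothetical.

```python
def transcript_from_coding_strand(coding_strand):
    """Return the RNA sequence (5'-to-3') that matches a DNA coding strand
    written 5'-to-3': identical except that T in DNA becomes U in RNA."""
    return coding_strand.upper().replace("T", "U")

mrna = transcript_from_coding_strand("ATGGCATTC")  # made-up sequence
```

No reversal or complementing is needed here; those steps would apply only when starting from the template strand.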


codominance The equal and detectable 
expression of both alleles in a heterozygous 
organism. 


codon The nucleotide triplet of mRNA that 
encodes a single amino acid. 


codon bias The preferential use of specific 
codons where there is redundancy in encoding 
a specific amino acid. 
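
One simple way to look for codon bias is to count how often each synonymous codon appears in a coding sequence. The sketch below is illustrative only: it uses the four alanine codons from the redundancy table of the genetic code and a made-up mRNA, and the function name is hypothetical.

```python
from collections import Counter

# The four synonymous codons for alanine
ALANINE_CODONS = {"GCA", "GCC", "GCG", "GCU"}

def codon_usage(mrna, codons):
    """Count occurrences of the given codons in an mRNA read in frame from position 0."""
    usable_length = len(mrna) - len(mrna) % 3  # ignore any trailing partial codon
    triplets = [mrna[i:i + 3] for i in range(0, usable_length, 3)]
    return Counter(t for t in triplets if t in codons)

usage = codon_usage("AUGGCCGCCGCAGCCUAA", ALANINE_CODONS)  # made-up sequence
```

In this invented sequence GCC is used three times and GCA once, while GCG and GCU do not appear; a real analysis would pool counts over many genes before drawing any conclusion.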


coefficient of coincidence (c) The ratio of the 
observed number of double recombinants to 
the number of double recombinants expected 
to occur by chance. 
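
The ratio can be sketched as follows, assuming the expected number of double recombinants is the product of the recombination frequencies of the two adjacent intervals multiplied by the total number of progeny. All numbers are invented and the function name is hypothetical.

```python
def coefficient_of_coincidence(observed_doubles, rf_interval_1, rf_interval_2, total_progeny):
    """Observed double recombinants divided by the number expected by chance."""
    expected_doubles = rf_interval_1 * rf_interval_2 * total_progeny
    return observed_doubles / expected_doubles

# Hypothetical three-point cross: 1000 progeny, 10% and 20% recombination
# in the two intervals, 6 double recombinants observed (20 expected).
c = coefficient_of_coincidence(
    observed_doubles=6, rf_interval_1=0.10, rf_interval_2=0.20, total_progeny=1000
)
```

A value below 1, as in this example, means fewer double recombinants were observed than independent crossovers would predict.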


coefficient of inbreeding (F) The probability 
that two alleles carried in an individual are ho- 
mozygous identical by descent (IBD). 


cohesive compatible end Short, single-stranded 
overhangs at 5' and 3' ends produced 
after digestion with certain restriction enzymes. 
The cohesive ends are termed compatible 
if they can base-pair with complementary 
single-stranded ends of another DNA molecule. 
Compare with cohesive end sequence 
(cos) sites. 


cohesive end sequence (cos) site The single- 
stranded ends of phage lambda that facilitate 
circularization or concatamerization of lambda 
phage genomes and that interact with coat 
proteins during packaging of phage particles. 
Compare with cohesive compatible ends. 


cointegrate In replicative transposition, the 
fusion of two circular transposable elements 
into a single, larger circular element. 


comparative genomics See evolutionary 
genomics. 


compensatory mutation A second mutation 
occurring at another site that fully or partially 
restores wild-type function lost when an initial 
mutation occurs. 


complementary base pairs The specific pat- 
tern of purine-pyrimidine pairing of nucleic acid 
strands. In DNA, G with C and A with T; RNA 
uses U instead of T. 
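
The pairing rules translate directly into a lookup table. The sketch below is a minimal illustration with made-up sequences; it returns the partner strand written 5'-to-3', which means reading the input in reverse because paired strands are antiparallel. The names are hypothetical.

```python
DNA_PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}
RNA_PAIRS = {"A": "U", "U": "A", "G": "C", "C": "G"}

def complementary_strand(strand, pairs=DNA_PAIRS):
    """Return the complementary strand written 5'-to-3' (the reverse complement)."""
    return "".join(pairs[base] for base in reversed(strand))

dna_partner = complementary_strand("GATTACA")             # made-up DNA sequence
rna_partner = complementary_strand("GAUUACA", RNA_PAIRS)  # made-up RNA sequence
```

Applying the function twice returns the original sequence, which is a convenient check that the pairing table is symmetric.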


complementary DNA (cDNA) library 
Collection of DNA clones, originally derived 
via reverse transcription of mRNA molecules 
into DNA (cDNA) and cloned into a vector. 


complementary gene interaction (9:7 
ratio) A characteristic ratio of phenotypes 
produced by the interaction of two comple- 
mentary genes that control a trait. 
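
The 9:7 ratio can be reproduced by enumerating a dihybrid cross. The sketch below assumes two unlinked genes with alleles A/a and B/b, a cross between two AaBb parents, and a "wild type" phenotype that requires at least one dominant allele of both genes. The allele symbols and phenotype labels are illustrative.

```python
from collections import Counter
from itertools import product

def dihybrid_phenotype_counts():
    """Count phenotypes among the 16 gamete combinations of an AaBb x AaBb cross."""
    gametes = ["".join(g) for g in product("Aa", "Bb")]  # AB, Ab, aB, ab
    counts = Counter()
    for egg, pollen in product(gametes, repeat=2):
        has_dominant_a = "A" in (egg[0], pollen[0])
        has_dominant_b = "B" in (egg[1], pollen[1])
        # Complementary genes: both must supply a dominant allele
        phenotype = "wild type" if has_dominant_a and has_dominant_b else "mutant"
        counts[phenotype] += 1
    return counts

counts = dihybrid_phenotype_counts()
```

The three classes that lack a dominant allele of either gene (3 + 3 + 1 of the usual 9:3:3:1) share the mutant phenotype, which is how the 7 arises.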


complementation group A group of muta- 
tions that affect the same gene. 


complete genetic linkage The absence of 
crossing over between linked genes. 


complete initiation complex The multisubu- 
nit complex that forms at the promoter imme- 
diately before the onset of transcription. 


complete penetrance The observation that 
the phenotype for a trait is always produced 
when the corresponding genotype(s) are present 
(in contrast, see incomplete penetrance). 


composite transposon In bacteria, a trans- 
posable element containing multiple genes 
located between terminal insertion sequences. 


concordance In twin studies, the observation 
that both twins exhibit the trait. 


conditional probability A probability pre- 
diction that is dependent on another previous 
event having taken place. 
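
A classic case is the chance that an offspring of an Aa x Aa cross is heterozygous, given that it shows the dominant phenotype. The sketch below assumes the usual 1/4 : 1/2 : 1/4 genotype probabilities and a hypothetical helper that divides the joint probability by the probability of the condition.

```python
from fractions import Fraction

# Offspring genotype probabilities from an Aa x Aa cross
genotype_probabilities = {
    "AA": Fraction(1, 4),
    "Aa": Fraction(1, 2),
    "aa": Fraction(1, 4),
}

def conditional_probability(event, condition, probabilities):
    """P(event | condition) = P(event and condition) / P(condition)."""
    p_condition = sum(p for g, p in probabilities.items() if g in condition)
    p_both = sum(p for g, p in probabilities.items() if g in event and g in condition)
    return p_both / p_condition

# Probability of being a carrier (Aa) given the dominant phenotype (AA or Aa)
p_carrier_given_dominant = conditional_probability(
    {"Aa"}, {"AA", "Aa"}, genotype_probabilities
)
```

Knowing that the offspring is not aa removes one quarter of the possibilities, so the answer is 2/3 instead of the unconditional 1/2.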


conjugation The short-term union of two 
bacterial cells for the unidirectional transfer of 
DNA from the “donor” to the “recipient.” The 
transferred material may be plasmid DNA or 
donor bacterial chromosome DNA. 


conjugation pilus The hollow filament 
extending from the donor bacterium to the 
recipient bacterium through which DNA is 
transferred. Also known as conjugation tube. 


conjugation tube See conjugation pilus. 


consanguineous mating See inbreeding. 


consensus sequence A nucleotide sequence 
in a DNA segment derived by comparing se- 
quences of similar segments from other genes 
or organisms. The most commonly occurring 
nucleotides at each position comprise the 
sequence. 


conservative DNA replication A disproven 
model of DNA replication positing that one 
duplex produced by replication contained the 
two original strands and the other two daugh- 
ter strands. 


conserved noncoding sequence (CNS) 
Sequences that do not code for amino acids 
and are conserved across significant phyloge- 
netic distances. 


constitutive heterochromatin Chromosome 
regions containing chromatin that is always 
densely compacted. Usually containing highly 
repetitive DNA sequences. 


constitutive mutants Mutants in which a 
gene is always expressed rather than being 
under regulatory control. 


constitutive transcription State in which a 
gene is continuously transcribed. 


contiguous sequence (contig) Overlapping 
DNA clones that together cover an uninter- 
rupted continuous stretch of DNA sequence. 


continuous variation In polygenic and mul- 
tifactorial traits, the observation of phenotypic 
distribution over a continuous range. 


controlled genetic cross Genetic crosses 
controlled by an investigator who usually 
knows the genotypes and/or phenotypes of the 
organisms being crossed. 


convergent evolution Processes of independent 
evolution of similar structures in unrelated 
species. Also known as homoplasy. 


co-option A common theme in the evolu- 
tionary history of genes by which genes and 
genetic modules are reused in a new manner 
to direct the patterning or growth of novel 
organs. 


coordinate gene Genes, often with maternal 
effects, that establish the major axes of the 
embryo, especially the anterior-posterior and 
dorsal-ventral axes; examples include bicoid 
and nanos. 


copy number variant (CNV) A specific type 
of structural variant due to insertions or dele- 
tions (indels) greater than 1 kb in length. 


core DNA The approximately 146 base 
pairs of eukaryotic DNA that wrap each 
nucleosome. 


core element Consensus sequences in the ac- 
tive regions of promoters recognized by RNA 
polymerase I. 


corepressor An accessory molecule required 
for a repressor protein to exert its function. 


cosmid vector Cloning vector used in bac- 
teria that utilizes phage lambda cos sites for 
packaging of phage and a bacterial origin of 
replication for subsequent maintenance in 
bacteria; can accept DNA inserts of up to 
40 kb. 


cosuppression The silencing, via a small 
RNA mediated mechanism, of an endogenous 
gene due to the presence of a homologous 
transgene or virus. Cosuppression can occur 
at the transcriptional or post-transcriptional 
level. 


cotransduction The simultaneous trans- 
duction of two or more genes contained on 
a donor DNA fragment into a recipient cell, 
where it undergoes homologous recombina- 
tion to be spliced into the transductant 
chromosome. 


cotransduction frequency The frequency 
with which two genes are transduced together. 


cotransduction mapping A method of map- 
ping donor bacterial genes based on their 
frequency of cotransduction. 


cotransformation Simultaneous transforma- 
tion of two or more genes carried on a donor 
DNA fragment into a recipient. 


covered promoter Promoter in which nucle- 
osomes are found adjacent to the transcription 
start site, preventing efficient transcription 
initiation. This feature is common at highly 
regulated genes. 


CpG dinucleotide See CpG island. 


CpG island Region in which the frequency of 
CpG dinucleotides is higher than the average 
for the genome; commonly found near the 
transcription start sites of animal genes. The 
cytosines are often methylated when the gene 
is inactive and demethylated when the gene is 
transcriptionally active. 


crossing over The breakage and reunion of 
homologous chromosomes that results in re- 
ciprocal recombination. 


crossover suppression The significant 
reduction, or complete absence, of progeny 
with recombinant chromosomes due to du- 
plications and deletions of genetic material 
following crossing over within the inversion 
loop in organisms that are heterozygous for an 
inversion. 


cryptic splice site A 5’ or 3’ splice site that 
is not normally used except when a mutation 
either inactivates an authentic splice site or 
creates a new splice site at the cryptic site lo- 
cation. See also splicing mutation. 


cyanobacteria Lineage of photosynthetic 
bacteria that are the closest extant relatives of 
the lineage that gave rise to the plastids. 


cyclin protein A family of proteins whose levels 
fluctuate during the cell cycle. Cyclins pair 
with protein kinases to form cyclin-dependent 
kinases (Cdks) that help regulate the cell cycle. 


cytokinesis Part of telophase, the process of 
cytoplasmic division between daughter cells. 


cytological markers Structural differences 
between homologous chromosomes that serve 
to differentiate the chromosomes when they 
are visualized using microscopy. 


cytosine (C) One of four nitrogenous nucleo- 
tide bases in DNA and RNA; one of the two 
types of pyrimidine nucleotides in DNA and 
RNA. 


daughter cell The genetically identical cells 
produced by mitotic cell division. 


daughter strand A newly synthesized strand 
of DNA that is complementary to a template 
strand. 


deamination A DNA lesion resulting in the 
loss of an amino group (NH2) from a nucleotide 
base. 


degrees of freedom (df) The number of independent 
variables in an experiment. In a chi-square 
test, most often the number of outcome 
classes minus 1 (n - 1). 


delayed age of onset The appearance of an 
abnormal phenotype that is not present at 
birth but appears later in life and is caused by 
an inherited mutation. 


delayed early genes A group of genes in λ 
(lambda) bacteriophage that are expressed 
following expression of the early genes that 
initiate the lytic cycle. 


deletion The loss of genetic material. See also 
interstitial, microdeletion, partial deletion, 
partial deletion heterozygote, and terminal 
deletion. 


deletion mapping A method for mapping 
genes utilizing partial chromosome deletions 
with known locations to expose recessive 
mutants by pseudodominance. 


denaturation In DNA, the separation of 
complementary strands of nucleic acids by 
hydrogen bond breakage. In polypeptides and 
proteins, the unfolding of tertiary or quater- 
nary structures. 


densitometry A technique for passing light 
through an electrophoresis gel to detect the 
presence of a stained band of protein or 
nucleic acid. 


deoxynucleotide 5'-monophosphate 
(dNMP) Monophosphate forms of deoxynu- 
cleotides. 


deoxynucleotide 5'-triphosphate (dNTP) 
Triphosphate forms of deoxynucleotides. 


deoxyribonucleic acid (DNA) The hereditary 
molecule of organisms. Composed of two 
complementary strands of nucleotides with 
purine bases adenine (A) and guanine (G) 
and pyrimidine bases thymine (T) and 
cytosine (C). 

depurination A DNA lesion occurring when 
a deoxyribose molecule loses its purine nu- 
cleotide base. See apurinic (AP) site. 


dicentric bridge In a dicentric chromosome, 
the portion between the two centromeres that 
are drawn to opposite poles of the cell during 
division. 

dicentric chromosome A chromosome with 
two centromeres. 


dicer Ribonuclease that acts on double- 
stranded RNA responsible for the generation 
of small regulatory RNA molecules, such as 
microRNAs and small interfering RNAs; 
typically 21-30 nucleotides in length. 


dideoxy DNA sequencing A method of DNA 
sequencing devised by Fred Sanger that uses 
a mixture of deoxynucleotide and dideoxynucleotide 
triphosphates to selectively block DNA 
replication, producing a ladder of partially syn- 
thesized DNA strands of different lengths. Also 
known as the Sanger method. 


dideoxynucleotide triphosphate 
(ddNTP) Rare DNA nucleotides lacking oxygen 
atoms at the 2' and the 3' carbons that 
are most commonly used in dideoxy 
DNA sequencing. 


differential reproduction See relative 
fitness (w). 

differentiation Process by which cells become 
restricted in their developmental potential and 
take on specialized morphologies and physi- 
ological activities. 


dihybrid cross A cross between organisms 
that are heterozygous for two loci. 


diploid number The characteristic number 
of chromosomes (2n) in somatic cell nuclei 
during the diploid phase of the eukaryotic life 
cycle. Equal to twice the haploid (n) number of 
chromosomes found in the nuclei of gametes 
of sexually reproducing diploid species. 


directed assembly A process for the assembly 
of viral particles that is directed by non- 
capsid proteins. 


directional cloning Technique whereby a 
DNA insert is cloned with a specific directional- 
ity with respect to sequences of the cloning vec- 
tor; usually accomplished by using two different 
restriction enzymes. 


directional natural selection See directional 
selection. 


directional selection Natural or artificial 
selection that continuously changes the fre- 
quency of an allele in a direction toward fixa- 
tion (frequency = 1.0) or toward elimination 
(frequency = 0.0). 


discontinuous variation A phenotype 
distribution containing discrete or separable 
categories. 


discordance In twin studies, the observation 
that the traits exhibited by the twins are dif- 
ferent. 


disjunction The normal process of separation of 
homologous chromosomes or of sister chro- 
matids during cell division. 


dispersive DNA replication A disproven 
model of DNA replication positing that each 
strand of daughter duplexes is composed of 
segments of original DNA and segments of 
newly synthesized DNA. 


displacement loop (D loop) During DNA 
damage repair and homologous recombina- 
tion, the displacement of a single strand of 
DNA by strand invasion. 


disruptive selection Natural or artificial 
selection of phenotypic extremes in a popula- 
tion, leading eventually to two strains with 
distinctive phenotypes. 


dissociation (Ds) element In transposition, a 
non-autonomous genetic element that is inca- 
pable of transposing on its own. 


DNA-binding protein A general term for a 
protein that binds to DNA; the interaction can 
be either DNA-sequence specific (most regu- 
latory proteins) or DNA-sequence nonspecific 
(e.g., structural proteins such as histones). 


DNA clone A fragment of DNA that is 
inserted into a vector, such as a plasmid, 
cosmid, or artificial chromosome. 


DNA double helix (DNA duplex) The two 
complementary strands of DNA arranged in 
antiparallel orientation. 


DNA gyrase (topoisomerase II) A member of 
the class of DNA replication enzymes known 
as topoisomerases that assist with the unwind- 
ing of supercoiled DNA. 


DNA intercalating agents Mutagenic compounds 
of a size and shape that allow them 
access to the space between nucleotide base 
pairs, thereby distorting the DNA duplex 
and potentially causing insertion or deletion 
mutations. 


DNA library Collection of DNA clones in 
which the DNA is usually derived from a 
single source. 


DNA ligase An enzyme active in DNA repli- 
cation that joins together segments of a DNA 
strand by catalyzing formation of a phospho- 
diester bond. 


DNA loop In gene regulation, a condition 
where the DNA sequences between regulatory 
elements form an extended loop that allows 
distant regulatory sequences with associated 
DNA-binding proteins to interact. 


DNA microarray Collections of synthesized 
DNA fragments attached to a solid support and 
representing sequences present in a genome; 
can be used to assess transcription patterns, 
transcription factor binding sites, and recombination 
patterns, among other uses. 


DNA nucleotides DNA building blocks 
composed of deoxyribose sugar, a nitrogenous 
base, and one or more phosphate groups. See 
also adenine (A), thymine (T), cytosine (C), and 
guanine (G). 

DNA polymerase (pol I, pol II, pol III) The 
large multisubunit complex responsible for the 
synthesis of new strands of DNA during DNA 
replication or DNA repair. 


DNA proofreading The capacity of many 
types of DNA polymerase to utilize a 3’ to 5' 
exonuclease activity to remove and replace mis- 
matched or damaged nucleotides during repli- 
cation. See also 3' to 5’ exonuclease activity. 


DNA replication The synthesis of new DNA 
strands by complementary base pairing of 
nucleotides in a daughter strand to those in a 
template strand. 


DNA supercoiling (negative supercoiling, 
positive supercoiling) Superhelical twisting 
of DNA causing overwinding (positively supercoiled) 
or underwinding (negatively supercoiled) 
of the molecule. Supercoiling is most promi- 
nent in circular chromosomes where it plays a 
role in normal chromosome packaging in cells. 


DNA transposon One type of transposable 
genetic element encoding a transposase and 
capable of transposition. 


DNA triplet Three DNA nucleotides corre- 
sponding to a codon of mRNA. 


DNase I hypersensitive site Regions of chromatin 
sensitive to cleavage by DNase I; these 
often represent open chromatin that is tran- 
scriptionally active. 


dominance variance In polygenic and mul- 
tifactorial inheritance, the portion of genetic 
variance attributed to the dominance effects of 
contributing genes. 


dominant epistasis (12:3:1 ratio) A char- 
acteristic ratio of phenotypes produced by 
the interaction of two genes that control a 
trait in which a dominant allele of one gene 
masks or reduces the expression of alleles of 
a second gene. 
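
The 12:3:1 ratio can be checked by enumeration. The sketch below assumes a dihybrid cross (AaBb x AaBb) with independent assortment, so that the 16 gamete pairings are equally likely; the gene symbols, function names, and phenotype labels are hypothetical and serve only as an illustration.

```python
from collections import Counter
from itertools import product

def f2_phenotype_counts(classify):
    """Count phenotypes among the 16 equally likely gamete pairings of AaBb x AaBb."""
    gametes = ["AB", "Ab", "aB", "ab"]
    counts = Counter()
    for egg, sperm in product(gametes, repeat=2):
        has_A = "A" in (egg[0], sperm[0])  # at least one dominant allele of gene A
        has_B = "B" in (egg[1], sperm[1])  # at least one dominant allele of gene B
        counts[classify(has_A, has_B)] += 1
    return counts

def dominant_epistasis(has_A, has_B):
    # Assumed rule: a dominant A allele masks whatever gene B would otherwise produce.
    if has_A:
        return "masked"
    return "B phenotype" if has_B else "recessive"

counts = f2_phenotype_counts(dominant_epistasis)
print(counts["masked"], counts["B phenotype"], counts["recessive"])  # 12 3 1
```

Passing a different classification rule to `f2_phenotype_counts` yields the other modified dihybrid ratios in the same way.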


dominant interaction (9:6:1 ratio) A char- 
acteristic ratio of phenotypes produced by the 
interaction of two genes that control a trait in 
which the presence of dominant alleles of both 
genes produces one phenotype, one domi- 
nant allele of either gene produces a second 
phenotype, and organisms with only recessive 
alleles for the interacting genes have a third 
phenotype. 


dominant negative mutation A dominant 
mutation that behaves as a loss-of-function, 
often due to blocking the formation or normal 
function of a multimeric protein complex. 


dominant phenotype The phenotype 
observed in a heterozygous organism that 
is identical to the phenotype observed in a 
homozygote. The phenotype produced when 
an organism is homozygous for the dominant 
allele or carries a single copy of the dominant 
allele in the heterozygous genotype. Compare 
with recessive phenotype. 


dominant suppression (13:3 ratio) A char- 
acteristic ratio of phenotypes produced by the 
interaction of two genes that control a trait in 
which the dominant allele of one gene sup- 
presses the expression of the dominant allele 
of the second gene. 


donor cell (bacterial donor) The bacterial 
cell that is the source of DNA transferred to a 
recipient cell by either conjugation, transduc- 
tion, or transformation. 


donor DNA DNA to be used in cloning or 
other recombinant DNA technologies. 


dosage compensation A mechanism for 
equalizing the expression of X-linked genes in 
males and females of a species. 


double Holliday junction (DHJ) An inter- 
mediate structure temporarily connecting 
chromatids of homologous chromosomes that 
forms during homologous recombination. 


double recombinant (double crossover) 
The occurrence of two crossovers between 
homologous chromosomes in a particular 
region. May involve two, three, or all four 
chromatids. 


double-strand break repair Following phos- 
phodiester bond breakage on both strands of 
a DNA duplex, a mechanism of DNA damage 
repair. Related to the mechanism for homolo- 
gous recombination. 


downstream Referring to a gene or sequence 
location that is toward the 3’ direction on the 
coding strand. 


drosha Ribonuclease that processes pri- 
microRNA molecules into pre-microRNA 
molecules in animals. 


duplicate gene action (15:1 ratio) A char- 
acteristic ratio of phenotypes produced by the 
interaction of two genes that duplicate each 
other’s action due to genetic redundancy. 


duplication The gain of genetic material by 
the inclusion of one or more additional copies 
of a chromosome segment. See also micro- 
duplication, partial duplication, and partial 
duplication heterozygote. 


early operators The operator sequence in 
the genome of bacteriophage λ (lambda) that 
controls transcription of early genes. See also 
delayed early genes and late genes. 


early promoters Regulatory sequences re- 
sponsible for the activation of early genes or 
operons in bacteriophage. See also delayed 
early genes and late genes. 


east-west (EW) resolution One of the pos- 
sible patterns for resolving a Holliday junction 
to separate homologous chromosomes before 
meiotic anaphase. 


electrophoretic mobility A measurement of 
(1) the distance of migration or (2) the speed 
of migration of a nucleic acid or protein in gel 
electrophoresis. 


elongation factor (EF) A group of proteins 
associated with ribosomes that contribute to 
the elongation of the polypeptide product. 


embryonic stem cell In vertebrates, totipo- 
tent cells of early embryos that can give rise to 
any and all cell types of the organism. 


endosymbiont An organism that lives within 
the body or cell of another organism. 


endosymbiosis An (often) mutually benefi- 
cial relationship between organisms in which 
one organism, the endosymbiont, inhabits the 
body of the other. 


endosymbiosis theory Hypothesis that the 
mitochondrion and chloroplast are evolution- 
arily derived from bacterial endosymbionts 
related to extant α-proteobacteria and cyanobacteria, 
respectively. 


enhanceosome Protein complex that binds 
enhancer elements and directs DNA bending 
into loops that bring the protein complex into 
contact with RNA polymerase and transcrip- 
tion factors bound at the core promoter or with 
protein complexes bound to proximal promoter 
elements. 


enhancer A eukaryotic cis-acting DNA regu- 
latory sequence to which trans-acting factors 
bind and stimulate transcription. See also 
enhancer sequence. 


enhancer screen A genetic screen designed 
to identify mutations in genes that worsen the 
phenotypic effects of mutations in another 
gene. 


enhancer sequence Sets of regulatory se- 
quences that bind specific transcriptional pro- 
teins that can elevate transcription of targeted 
eukaryotic genes. 


enhancer trapping A transgenic construct 
inserted randomly into the genome that allows 
identification of enhancer elements controlling 
specific patterns of gene expression. 


enveloped virus A viral particle coated with 
cytoplasmic material derived from the infected 
host organism. 


environmental variance (VE) For quantitative 
traits, the proportion of the total phenotypic 
variance contributed by differences in 
the environment experienced by population 
members. 


epigenetic Heritable patterns or changes in 
gene expression that are not associated with 
any change in DNA sequence. 


epigenetic marks A collection of chemical 
marks and modifications, such as acetylation 
and methylation of histone proteins, that are 
functional in chromatin remodeling. Also 
known as epigenetic modification. 


epigenetic modification Chemical modifi- 
cations of DNA or associated histones, such 
as acetylation and methylation, that alter 
chromatin structure and influence gene 
transcription. 


episome A bacterial plasmid that is able to 
both replicate autonomously and integrate 
into the host genome. 


epistasis See epistatic interaction. 


epistatic interaction A group of specific 
patterns of gene interaction in which an allele 
of one gene modifies or prevents the expres- 
sion of alleles of another gene. Also known as 
epistasis. 

equilibrium frequency The stable frequency 
of an allele in a population attained and main- 
tained through the action of evolutionary 
processes. 


eraser A chromatin-modifying enzyme that 
removes chemical groups from chromatin 
(e.g., methyl or acetyl groups from the lysines 
of histone 3). 


ethidium bromide (EtBr) A compound used 
to stain DNA and RNA in electrophoresis gels. 


euchromatic region See euchromatin. 


euchromatin Chromosome regions contain- 
ing chromatin that is not densely compacted. 
Most expressed genes are located within 
euchromatic regions of chromosomes. Also 
known as euchromatic region. 


Eukarya One of the three domains of life; 
separate from Archaea and Bacteria. See also 
eukaryote. 


eukaryote Referring to organisms belonging 
to the domain Eukarya. 


eukaryotic expression vector A vector 
that contains all the necessary cis-regulatory 
sequences to enable gene expression in a 
eukaryotic cell. 


eukaryotic initiation factor (eIF) A group of 
eukaryotic proteins that associate with riboso- 
mal subunits and help initiate translation. 


euploid A number of chromosomes that is an 
exact multiple of the haploid number. 


E(var) mutation Mutations that enhance 
position effect variegation in Drosophila. 
Mutated genes produce proteins that are active 
in chromatin remodeling. 


evo-devo The study of the evolution of 
development. 


evolution (1) Any change in the genetic char- 
acteristics of a population, strain, or species 
over time. (2) The theory that all organisms 
are related by common ancestry and have di- 
versified from common ancestors over time. 


evolutionary genetics The study of evolu- 
tion and evolutionary processes using genetic 
techniques and tools. 


evolutionary genomics The comparison of 
genomes, both within and between species. It 
illuminates the genetic basis of similarities and 
differences between individuals or species. 


evolutionary processes Four processes— 
natural selection, migration, mutation, and 
random genetic drift—that can cause changes 
in the genetic characteristics of a population 
or lineage. 

exconjugant cell The cell that is the product 
of conjugation between a donor cell and a 
recipient cell. 


exit site (E site) On the ribosome, the site 
through which an uncharged tRNA exits. 


exon A nonintron segment of the coding 
sequence of a gene. Joined together follow- 
ing intron splicing, exons correspond to the 
mRNA sequence that is translated into a 
polypeptide. 

exonic and intronic splicing enhancers (ESE 
and ISE) Exon and intron sequences that play 
a role in stimulating intron splicing. 


exonic and intronic splicing silencers (ESS 
and ISS) Exon and intron sequences that play 
a role in suppressing intron splicing. 


expression array DNA microarray that carries 
unique sequences from every annotated gene of 
the genome and is used to monitor gene expres- 
sion patterns. 


expression vector Cloning vector possessing 
DNA sequences required for DNA fragments 
inserted into the vector to be transcribed and 
translated. Vectors with sequences facilitating 
expression in eukaryotes are called eukaryotic 
expression vectors. 


F (fertility) factor The plasmid containing 
genes that confer the ability to act as a donor 
cell on a bacterium. May be either an extra- 
chromosomal plasmid or may be incorporated 
into the donor bacterial chromosome. Also 
known as F plasmid. 

F plasmid See F (fertility) factor. 


F+ cell (F+ donor) A donor bacterium containing 
an extrachromosomal fertility plasmid. 


F− (F− cells) A bacterial recipient cell; does 
not contain an F (fertility) plasmid. 


F’ cell (F’ donor) A bacterial donor cell harboring 
an F (fertility) plasmid. 


F' factor An extrachromosomal fertility plas- 
mid into which a portion of the donor bacte- 
rial chromosome has been incorporated. 


F1 generation (first filial generation) The 
first generation of offspring. In genetic ex- 
periments, usually the offspring produced by 
crossing pure-breeding parents. 


F2 generation (second filial generation) The 
second generation, produced by crossing F1 
organisms. 


F3 generation (third filial generation) The 
third generation, produced by crossing F2 
organisms. 


facultative heterochromatin Heterochro- 
matic chromosome regions whose level of 
compaction can vary. Often contains repetitive 
DNA, but may also contain some expressed 
genes. 


first-division segregation In Neurospora and 
other organisms forming an ascus, the separa- 
tion of alleles at the first meiotic division due 
to no crossing over having occurred. 

flanking direct sequence repeat Identical 
repetitive sequences flanking the sites of inser- 
tion of transposable genetic elements. 


fluorescent in situ hybridization (FISH) A 
laboratory method for identifying genes or 
DNA sequences using molecular probes la- 
beled with a compound that can emit fluores- 
cent light upon excitation. 


forked-line diagram A method for dia- 
gramming the probabilities of outcomes in a 
branching format. 
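
A forked-line diagram multiplies probabilities along each branch path. The sketch below illustrates that arithmetic for a hypothetical dihybrid cross (AaBb x AaBb) with complete dominance at both genes and independent assortment; the function name `forked_line` and the phenotype labels are assumptions made for the example.

```python
from fractions import Fraction
from itertools import product

def forked_line(branches):
    """Multiply probabilities along every path through a list of per-gene branch tables."""
    outcomes = {}
    for path in product(*(table.items() for table in branches)):
        label = " ".join(name for name, _ in path)
        prob = Fraction(1)
        for _, branch_prob in path:
            prob *= branch_prob  # product rule for independent events
        outcomes[label] = prob
    return outcomes

# Assumed phenotype classes and probabilities for each gene in AaBb x AaBb.
gene_a = {"A_": Fraction(3, 4), "aa": Fraction(1, 4)}
gene_b = {"B_": Fraction(3, 4), "bb": Fraction(1, 4)}

for label, prob in forked_line([gene_a, gene_b]).items():
    print(label, prob)
```

Adding a third branch table to the list extends the same calculation to a trihybrid cross without further changes.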


forward genetic analysis The classical approach 
to genetic analysis whereby genes are 
first identified by mutant phenotypes caused 
by mutant alleles and the gene sequence 
is subsequently identified by recombinant 
DNA technologies. Also known as forward 
genetics. 


forward genetics See forward genetic analysis. 


forward mutation A mutation that alters a 
wild type and generates a mutant. Also known 
as mutation. 


forward mutation rate (μ) The frequency 
of mutation from wild-type alleles to mutant 
alleles. 


founder effect The random occurrence of 
allele and genotype frequency differences be- 
tween a new population established by a small 
number of founders and the larger parental 
population. 


four-strand double crossover Two crossover 
events between a pair of homologous chromo- 
somes that involve all four chromatids. 


frameshift mutation The insertion or dele- 
tion of DNA base pairs resulting in translation 
of mRNA in an incorrect reading frame. 
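
To illustrate how a single inserted base shifts the reading frame, the sketch below splits a short transcript into codons before and after an insertion. The sequence, the inserted base, and the helper name `codons` are hypothetical and chosen only for demonstration.

```python
def codons(mrna: str, start: int = 0):
    """Split an mRNA string into complete triplets, beginning at index `start`."""
    usable = len(mrna) - (len(mrna) - start) % 3  # drop any trailing partial codon
    return [mrna[i:i + 3] for i in range(start, usable, 3)]

wild_type = "AUGGCCAAAUGA"                    # hypothetical transcript: Met-Ala-Lys-stop
mutant = wild_type[:4] + "U" + wild_type[4:]  # one base inserted after the fourth base

print(codons(wild_type))  # ['AUG', 'GCC', 'AAA', 'UGA']
print(codons(mutant))     # ['AUG', 'GUC', 'CAA', 'AUG']
```

Every codon downstream of the insertion changes, and in this example the original stop codon is no longer read in frame.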


frequency distribution A visual display or 
histogram of quantitative data. 


functional domain A protein region with a 
specific function or interaction. 


functional genomics Using genomic se- 
quences and genome-wide patterns of tran- 
scripts and protein expression to understand 
gene function in an organism. 


functional RNA Various types of transcripts 
that are not translated and are functional as 
nucleic acids. See also tRNA, rRNA, snRNA, 
miRNA, siRNA, and ribozyme. 


fusion gene A recombinant gene composed 
of DNA sequences from more than one source 
(e.g., the codon sequences derived from one 
gene and the sequences responsible for expres- 
sion derived from a second gene). 


fusion protein A recombinant protein en- 
coded by DNA sequences from more than one 
source; made by combining the open reading 
frames of two unrelated genes. 


G0 phase The “G zero” phase of the cell cycle, 
an alternative to G1 of the cell cycle entered by 
mature cells that generally do not divide again 
until they die. Compare with G1 phase and 
G2 phase. 


G1 phase The “Gap 1” phase of the cell cycle 
during which genes are actively transcribed 
and translated and cells carry out their normal 
functions. Compare with G0 phase and 
G2 phase. 


G2 phase The “Gap 2” phase of the cell cycle 
during which the cell prepares to divide. Compare 
with G0 phase and G1 phase. 


gain-of-function mutation A mutation caus- 
ing a gene to be overexpressed, to be expressed 
at the wrong time, or to encode a constitu- 
tively acting protein. Usually inherited as a 
dominant mutation. 


gamete The reproductive cells produced 
by male and female reproductive structures; 
sperm or pollen in male animals and plants 
and eggs in females. 


gap gene In Drosophila, genes that control 
development in large contiguous regions along 
the anterior-posterior axis; examples include 
hunchback, giant, Krüppel, and knirps. 


Gaussian distribution See normal 
distribution. 


gel electrophoresis A laboratory method for 
separating proteins or nucleic acid molecules 
or fragments using electrical current in a gel 
matrix. 


gene The physical unit of heredity, composed 
of a DNA sequence that is transcribed and 
encodes a polypeptide or another functional 
molecule. 


gene conversion Repair of mismatched 
(non-complementary) DNA nucleotides in 
heteroduplex DNA that forms during meiotic 
recombination. One allele is switched for 
another allele already in the genotype. 


gene dosage The number of copies of a gene. 


gene-environment interaction Interactions 
taking place between particular genes and spe- 
cific environmental factors. 


gene family A group of genes that is evolu- 
tionarily related via successive gene duplica- 
tion events that are followed by diversification. 


gene flow The movement of genes into, out 
of, or between populations as a consequence 
of the movement of organisms. See also 
migration. 


gene interaction Referring to genes that interact 
with one another due to their participation 
in the production of a particular product 
or trait. 


gene knockout Loss-of-function allele of a 
gene usually obtained via a reverse genetic 
approach. 


gene pool The total of all alleles present in 
breeding members of a population at a given 
moment. 


gene therapy The use of genes as therapeutic 
agents to cure or alleviate symptoms of a 
genetic disease. 


general transcription factors (GTFs) 
Eukaryotic transcription-activating proteins 
that bind the promoter region to form part of 
the apparatus that initiates basal transcription. 


generalized transducing phage In trans- 
duction, a bacteriophage that carries a random 
segment of the chromosome of a donor cell to 
the recipient cell. 


generalized transduction The transduction 
of a random segment of a donor chromosome 
into a recipient cell by a transducing phage. 
See also generalized transducing phage. 


genetic bottleneck A period or event char- 
acterized by a substantial random reduction in 
population size. Loss of genetic diversity and 
allele frequency changes usually occur. 


genetic chimera A tissue or organism 
composed of cells of two or more distinct 
genotypes. 


genetic code The universal set of corre- 
spondences of mRNA codons to amino acids. 
Used in translation to synthesize polypeptides. 


genetic complementation (1) The observa- 
tion of a wild-type phenotype in an organism or 
cell containing two different mutations. (2) The 
cross of two pure-breeding mutants that yields 
progeny that are exclusively wild type. 


genetic dissection The use of mutations and 
recombinants in genetic analyses to identify 
and assemble the genetic components of a bio- 
logical property or process. 


genetic drift A process of evolution referring 
to random changes in allele frequencies that 
result from sampling errors. Occurs in all pop- 
ulations but is strongest in small populations. 


genetic fine structure The method of high- 
resolution analysis of intragenic recombina- 
tion to map genes at the nucleotide level. 


genetic heterogeneity The observation of 
the same phenotype produced by mutation of 
any one of two or more different genes. 


genetic hitchhiking The phenomenon in 
which specific alleles of genes that are closely 
linked to a gene undergoing positive natural 
selection have their frequencies increased by 
virtue of their presence on the same chromo- 
somes as the favored allele. 


genetic liability See threshold of genetic 
liability. 

genetic linkage The result of genes being 
located so near one another on a chromosome 
that their alleles do not assort independently. 
Identified by detecting certain pairs of alleles 
(parentals) that are transmitted together sig- 
nificantly more often than expected by chance 
and of other pairs of alleles (nonparentals or 
recombinants) that are transmitted together 
significantly less often than expected. 


genetic linkage mapping Process for creat- 
ing maps of genes based on their linkage 
relationships to other genes. 


genetic markers Alleles of either expressed 
genes or noncoding chromosomal regions 
identifying a specific region of a chromo- 
some. Can be used to trace or identify another 


gene, the chromosome, or a cell, organ, or 
individual. 


genetic network Set of interacting genes 
identified from double mutants or other analy- 
ses indicating gene interaction. 


genetic redundancy The situation where the 
functions of one gene are compensated for by 
the actions of another gene. 


genetic screen A procedure whereby a popu- 
lation of organisms is mutagenized and their 
progeny are propagated and examined for mu- 
tant phenotypes. Also known as mutagenesis. 


genetic variance (VG) In polygenic and multifactorial 
inheritance, the proportion of total 
phenotypic variance contributed by genetic 
variation. 


genome The entire complement of DNA se- 
quences in a chromosome set of an organism. 


genome-wide association studies 
(GWAS) Association analysis performed using 
genetic marker genes distributed throughout 
the genome. Designed to locate genes that may 
influence the variation of quantitative traits. 


genomics The study of the structure, function, 
composition, and evolution of genomes. 


genomic imprinting Epigenetic phenomena 
that create differential expression of alleles 
depending on whether they were maternally or 
paternally inherited. 


genomic island Genome segments that dif- 
fer in sequence makeup from the surrounding 
genome sequence. Often these are a conse- 
quence of lateral gene transfer. 


genomic library A set of clones consisting 
of the DNA representing the genome of an 
organism. 


genotype (1) The genetic composition of 
an organism or a cell (i.e., all the alleles of all 
the genes). (2) The alleles of a single gene or a 
specified set of genes in a cell or organism. 


genotype proportion method A method for 
estimating allele frequencies in a population 
by manipulation of genotype frequencies. 
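
As a hedged illustration, the sketch below assumes the method amounts to adding the frequency of one homozygote to half the frequency of the heterozygote, for a single gene with two alleles. The function name `allele_frequencies` and the sample counts are hypothetical.

```python
from fractions import Fraction

def allele_frequencies(n_AA: int, n_Aa: int, n_aa: int):
    """Estimate frequencies of alleles A and a from observed genotype counts."""
    total = n_AA + n_Aa + n_aa
    # Each homozygote class contributes its full genotype frequency to one allele;
    # the heterozygote class contributes half of its frequency to each allele.
    p = Fraction(n_AA, total) + Fraction(n_Aa, 2 * total)
    q = Fraction(n_aa, total) + Fraction(n_Aa, 2 * total)
    return p, q

p, q = allele_frequencies(36, 48, 16)  # hypothetical sample of 100 individuals
print(float(p), float(q))  # 0.6 0.4
```

Exact fractions are used so that the two estimates always sum to 1 without rounding error.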


genotypic ratio (1:2:1 ratio) (1) A ratio or 
set of relative proportions between organisms 
with different genotypes. (2) The ratio of 
1/4 : 1/2 : 1/4 observed among the homozygous 
and heterozygous F2 progeny of a 
monohybrid cross. 
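
The 1:2:1 ratio follows from pairing one allele from each heterozygous parent in every possible way. The sketch below is a minimal Punnett-square tally; the function name `monohybrid_f2` and the allele symbols A and a are assumptions for illustration.

```python
from collections import Counter
from itertools import product

def monohybrid_f2(parent1: str = "Aa", parent2: str = "Aa") -> Counter:
    """Tally offspring genotypes from every pairing of one allele per parent."""
    counts = Counter()
    for allele1, allele2 in product(parent1, parent2):
        # Sort so that 'aA' and 'Aa' are counted as the same heterozygote.
        counts["".join(sorted((allele1, allele2)))] += 1
    return counts

tally = monohybrid_f2()
print(tally["AA"], tally["Aa"], tally["aa"])  # 1 2 1
```

Dividing each count by the total of four pairings gives the 1/4 : 1/2 : 1/4 proportions stated in the definition.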


germinal gene therapy Gene therapy aimed 
at correcting the genetic defect in the germ 
cells, such that progeny would not inherit the 
genetic defect. 


germ-line cell See gamete. 
Giemsa (G) banding See chromosome banding. 
Goldberg-Hogness box See TATA box. 


green fluorescent protein (GFP) A gene, derived 
from the jellyfish Aequorea victoria, that 
is the source of the natural bioluminescence 
of this species, fluorescing green (a 509-nm 
wavelength) when illuminated with UV light 
(a 395-nm wavelength). When used as a reporter 
gene, GFP allows a noninvasive means 
of visualizing gene and protein expression 
patterns in living organisms. 


guanine (G) One of four nitrogenous nucleo- 
tide bases in DNA and RNA; one of the two 
types of purine nucleotides in DNA and RNA. 


guide RNA (gRNA) In RNA editing, the nu- 
cleic acid that directs the addition or removal 
of nucleotides from mRNA. Also known as 
guide strand. 


guide strand See guide RNA (gRNA). 


gynandromorphy A condition in which the 
body of an organism is mosaic, appearing to 
contain both male and female features. 


hairpin structure See stem loop. 


haploid Possessing a single set of chromo- 
somes (n); a cell or organism that possesses 
one-half the number of chromosomes found in 
diploid cells of the organism. 


haploid number The number of chromo- 
somes (n) typically found in nuclei during 
the haploid phase of the eukaryotic life cycle. 
One-half the diploid (2n) number. 


haploinsufficient A wild-type allele that is 
unable to support wild-type function in a het- 
erozygous genotype. Classified as a recessive 
wild-type allele. Compare with haplosufficient. 


haplosufficient A wild-type allele that sup- 
ports wild-type function in heterozygous 
organisms. Classified as a dominant wild-type 
allele. Compare with haploinsufficient. 


haplotype The specific array of alleles en- 
coded by linked genes in a segment of a single 
chromosome. 

Hardy-Weinberg equilibrium The popula- 
tion genetic principle that in a population 
practicing random mating and in the absence 
of natural selection, mutation, migration, or 
random genetic drift, allele frequencies are 
stable at frequencies p + q = 1.0 for two alleles 
and are distributed into genotypes at frequencies 
p², 2pq, and q². 
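
The expected genotype frequencies can be computed directly from one allele frequency. The sketch below assumes a single gene with exactly two alleles; the function name `hardy_weinberg` and the example value p = 0.6 are hypothetical.

```python
def hardy_weinberg(p: float):
    """Return expected genotype frequencies (AA, Aa, aa) for allele frequency p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("allele frequency must lie between 0 and 1")
    q = 1.0 - p  # two alleles only, so p + q = 1
    return p * p, 2 * p * q, q * q

AA, Aa, aa = hardy_weinberg(0.6)
print(round(AA, 2), round(Aa, 2), round(aa, 2))  # 0.36 0.48 0.16
```

The three values always sum to 1 because p² + 2pq + q² = (p + q)².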


helicase In DNA replication, the enzyme 
responsible for breaking hydrogen bonds be- 
tween complementary nucleotides of a DNA 
duplex. Unwinding of the strands occurs ahead 
of the advancing replication fork. 


helix-turn-helix (HTH) motif A DNA-binding 
protein domain consisting of two alpha helices: 
one helix binds to a specific DNA sequence, 
and the second helix stabilizes the interaction. 


hemizygous Referring to the genotype of 
males that carry a single copy of each X-linked 
gene. 


hemoglobin (Hb) A globin protein composed 
of four polypeptides (two α-globin and 
two β-globin) found in blood that transports 
oxygen. 


heritability See broad sense heritability and 
narrow sense heritability. 


heterochromatin A chromosome region 
containing densely compacted chromatin and 
few, if any, expressed genes. See constitutive 
heterochromatin and facultative heterochromatin. 
Also known as heterochromatic region. 


heteroduplex DNA A DNA duplex created 
during homologous recombination by com- 
bining complementary strands of DNA from 
nonsister chromatids. Also known as heter- 
oduplex region. 


heteroplasmic cell or organism A cell or or- 
ganism that harbors a mixture of alleles of an 
organellar gene. Also known as heteroplasmy. 


heteroplasmy See heteroplasmic cell or 
organism. 


heterozygous advantage In evolution, the 
greater relative fitness of heterozygous organ- 
isms compared with homozygous organisms 
in a population. May result in a balanced 
polymorphism. 

heterozygous genotype A diploid genotype 
characterized by the presence of two different 
alleles of a gene. 


Hfr cell An abbreviation for “high frequency 
recombination,” pertaining to Hfr chromosomes 
or to Hfr donors in bacterial conjugation. 


Hfr chromosome See Hfr donor. 


Hfr donor A donor bacterial strain containing 
an F factor integrated into its chromosome. 
Also known as Hfr chromosome. 


histone acetyltransferase (HAT) Chromatin- 
modifying enzyme that adds acetyl groups to 
specific positively charged amino acids (e.g., 
lysine) in the N-terminal tails of histones. 

histone deacetylase (HDAC) Chromatin- 
modifying enzyme that removes acetyl groups 
from specific positively charged amino acids (e.g., 
lysine) in the N-terminal tails of histones. 


histone demethylase (HDMT) Chromatin- 
modifying enzyme that removes methyl 
groups from specific positively charged amino 
acids (e.g., lysine) in the N-terminal tails of 
histones. 


histone methyltransferase (HMT) 
Chromatin-modifying enzyme that adds 
methyl groups to specific positively charged 
amino acids (e.g., lysine) in the N-terminal 
tails of histones. 

histone protein (H1, H2A, H2B, H3, H4) 
Five proteins encoded by a gene family that 
form octameric nucleosomes (H2A, H2B, 
H3, and H4) and adhere to DNA to condense 
chromatin (H1). 


Holliday junction A DNA structure that 
forms during meiotic recombination in which 
single strands are crossed over between nonsis- 
ter chromatids of homologous chromosomes. 


Holliday model Proposed originally by Robin 
Holliday; a model intended to explain meiotic 
recombination at a molecular level. 


holoenzyme A fully functional multisubunit 
protein complex in bacteria, for example, the 
RNA polymerase holoenzyme. 


homeobox A conserved sequence of DNA 
of 180 nucleotides encoding a homeodomain 
composed of three α-helices in a family of 
transcription factors found throughout eukaryotes; 
in metazoans some homeobox genes 
are homeotic genes. 


homeodomain A 60-amino acid DNA- 
binding domain. 

homeotic gene Gene that controls the de- 
velopmental fate of a region of the body of an 
organism; examples include the Hox genes in 
metazoans and the MADS-box genes in flow- 
ering plants. 


homeotic mutation Mutation in which an 
apparently normal organ or body part devel- 
ops in an inappropriate location. 


homologous chromosomes Chromosomes 
that synapse (pair) during meiosis. Chromo- 
somes with the same genes in the same order. 
Also known as homologous pair, homologs. 


homologous genes Genes descended from 
a common ancestral gene. Also known as 
homologs. 


homologous nucleotides Nucleotides de- 
scended from a common ancestral nucleotide. 


homologous recombination Exchange of 
genetic information between homologous 
DNA molecules. 


homologs Homologous chromosomes that 
have the same genes and structure and pair 
with one another during meiosis. 


homology Evolutionarily related, having 
descended from a common ancestor. 


homoplasmic cell or organism A cell or 
organism in which all copies (alleles) of a 
cytoplasmic organelle gene are the same. Also 
known as homoplasmy. 


homoplasmy The presence of one allelic version 
of DNA in the organellar genomes of a cell. 


homozygous genotype A diploid genotype 
characterized by the presence of two identical 
alleles of a gene. 


host cell The cell containing a parasitic 
organism or infective particle. 


hotspot of mutation A location within a 
gene or genome at which mutations occur 
much more often than average. 


housekeeping gene Genes that have essen- 
tial cellular or physiological functions. 


Hox gene Members of the homeobox gene 
clusters found throughout metazoans; the 
genes often pattern the anterior-posterior axis 
and are homeotic genes. 


hybrid dysgenesis In Drosophila, the failure 
of F1 progeny of P-cytotype males crossed 
with M-cytotype females to develop due to the 
presence of P elements. 


hybrid vigor The greater growth, survival, 
and fertility of hybrids produced by crossing 
highly inbred lines. 


hybridization (of molecular probe) In an 
electrophoresis gel or in gel blotting, the bind- 
ing of a single-stranded nucleic acid probe to a 
single-stranded target nucleic acid by comple- 
mentary base pairing. 


hydrogen bond Weak electrostatic attraction 
formed by the sharing of a positively charged 
hydrogen atom by negatively charged oxygen 
and nitrogen atoms. Hydrogen bonds form 
between complementary nucleotides to hold 
nucleic acid strands together. 


hypermorphic mutation A mutant whose 
phenotype is similar to, but greater than, the 
wild-type phenotype. 


hypomorphic mutation See leaky mutation. 


identical by descent (IBD) A homozygous 
genotype in an organism in which both copies 
of the allele in an individual can be traced back 
to a common ancestor. 


illegitimate recombination Exchange of 
genetic information between non-homologous 
DNA molecules. 


immediate early genes Genes expressing 
immediately upon infection of a host bacterial 
cell by bacteriophage λ (lambda). See also 
delayed early genes and late genes. 


imprinting control region (ICR) Master 
regulatory cis-acting DNA sequences to which 
trans-acting factors bind to regulate genomic 
imprinting. 

inbreeding Mating between relatives. Also 
known as consanguineous mating. 


inbreeding depression A reduction in vigor, 
survival, or reproductive fitness of offspring 
due to inbreeding. 


incomplete dominance The observation 
that the phenotype occurring in heterozygous 
organisms is intermediate between the phe- 
notypes of homozygous organisms, but more 
similar to one homozygous phenotype than to 
the other. Also known as partial dominance. 


incomplete genetic linkage The occurrence 
of crossing over between linked genes. 


incomplete penetrance The occurrence of 
individual organisms that have a particular 
genotype or allele but not the corresponding 
phenotype. 


indels A shorthand term for small insertions 
and small deletions that are generally too small 
to cause obvious phenotypic abnormalities. 


induced mutation Mutations generated by 
exposure to physical, chemical, or biological 
mutagens. 


inducer An accessory molecule that binds 
to a protein that leads to activation of gene 
expression. The inducer can bind to a repres- 
sor protein and prevent its function or bind 
to an activator protein and stimulate 
its function. 


inducer-repressor complex A molecular 
complex consisting of a repressor protein and 
a bound inducer molecule. 


inducible operon Operon that is not ex- 
pressed under one set of environmental con- 
ditions, but whose transcription is activated 
under an alternative environmental condition 
(e.g., the lac operon).


induction Process by which one cell or tissue 
promotes a particular developmental fate in 
neighboring cells or tissues. 


inductive signal A molecule that acts
non-cell autonomously to influence cell fate;
in C. elegans vulval development, the lin-3
protein secreted from the anchor cells acts as
an inductive signal to influence the fate of
vulval precursor cells.


informational gene Class of genes that 
encode protein products that perform infor- 
mational processes in the cell such as DNA 
replication, packaging of chromosomes, 
transcription, and translation. 


ingroup A species within a clade used to 
compare to other members of the clade. 


inhibition Process by which one cell or tissue 
prevents a particular developmental fate in 
neighboring cells or tissues. 


inhibitor An accessory molecule that con- 
verts activator proteins to an inactive con- 
formation by binding to an allosteric binding 
domain of the activator protein. 


initial committed complex In eukaryotic 
transcription, a partially completed multipro- 
tein complex that is preparing to bind RNA 
polymerase II. 


initiation complex In eukaryotic translation, 
the complex formed by the small ribosomal 
subunit, mRNA, and a charged tRNA carrying
methionine.


initiation factor (IF) A group of proteins, 
associated with ribosomes, that contribute to 
ribosome assembly and translation initiation. 


initiator tRNA The first charged tRNA asso- 
ciated with the ribosome. 


inosine (I) A modified nucleotide found occa- 
sionally in anticodons that can base-pair with 
uracil, cytosine, or adenine. 


insertion sequence (IS) A bacterial DNA 
sequence that is the target of insertion of a 
transposable genetic element or is the site of 
integration of a plasmid such as an F plasmid. 


insertional inactivation A process of muta- 
tion in which the insertion of DNA into a gene 
renders it nonfunctional. 


in situ hybridization A laboratory method 
for hybridizing a molecular probe to a 
DNA sequence or a gene on an intact 
chromosome. 


insulator sequence Cis-acting sequences 
that act to prevent cross-talk between regula- 
tory elements of an adjacent gene and are 
located between enhancers and promoters of 
genes that are to be insulated from the effects 
of the enhancer. 


interactive variance (V_I) In polygenic and
multifactorial inheritance, the proportion of 
total phenotypic variance that is due to the 
interactions of genetic and environmental 
factors. 


interactome The sum of all of the
protein-protein interactions in an organism.


interchromosomal domain Open spaces be- 
tween chromosome domains in the interphase 
nucleus. 


interference (I) Measured on a zero to 1.0
scale, the measurement of the independence 
of crossovers. Expressed as 1.0 minus the coef- 
ficient of coincidence (c). 
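
As a rough illustration of the interference entry above, the calculation can be sketched in Python. The sketch assumes the usual definition of the coefficient of coincidence (observed double crossovers divided by expected double crossovers), which is not spelled out in this entry; the function names and the counts are hypothetical.

```python
def coefficient_of_coincidence(observed_dco, expected_dco):
    """c = observed double crossovers / expected double crossovers (assumed definition)."""
    if expected_dco <= 0:
        raise ValueError("expected_dco must be positive")
    return observed_dco / expected_dco


def interference(observed_dco, expected_dco):
    """I = 1.0 - c, as stated in the glossary entry."""
    return 1.0 - coefficient_of_coincidence(observed_dco, expected_dco)


# Hypothetical counts: 6 double crossovers observed where 10 were expected.
print(round(interference(observed_dco=6, expected_dco=10), 3))
```

A value of c greater than 1.0 makes I negative, which corresponds to the negative interference entry later in this glossary.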


intergenic region DNA sequence between 
coding genes. 


internal control region (ICR) Promoter con- 
sensus sequences of certain rRNA and tRNA 
genes that are downstream of the start of tran- 
scription (i.e., sequences that are internal to 
the transcriptional region of the gene). 


internal promoter element Promoter con- 
sensus sequences of snRNA and tRNA genes 
that are downstream of the start of transcrip- 
tion (i.e., sequences that are internal to the 
transcriptional region of the gene). 


interphase The multiphase period of the cell 
cycle between cell divisions. See also G1 phase,
S phase, and G2 phase.


interrupted mating A technique used to 
map bacterial genes that stops conjugation 
at timed intervals to determine which genes 
have transferred from the donor cell to the 
recipient cell. 


interspecific comparison Any comparison 
between different species. Compare with in- 
traspecific comparison. 


interstitial deletion The loss of a portion of a 
chromosome from within one arm. 


intragenic recombination Crossing over 
within a gene. 


intragenic reversion A reversion produced 
by a second site mutation within a single gene. 


intraspecific comparison Any comparison 
between individuals of the same species. 
Compare with interspecific comparison. 


intrinsic termination In bacterial transcrip- 
tion, the DNA sequence-dependent mecha- 
nism for transcription termination. Inverted 
repeat DNA sequences induce formation of 3’ 
mRNA stem loop (hairpin) structures that are 
followed by multiple uracils (transcribed from 
adenines). 


introgression line Lines of experimental 
organisms in which genome segments from 
two or more other lines are present due to 
repeated back crosses between hybrids and 
organisms of one parental line. 


intron Intervening sequences between the 
exons of many eukaryotic genes. Present in 
DNA and pre-mRNA, but spliced out during 
pre-mRNA processing. 


intron self-splicing The capacity of certain 
RNA transcripts to undergo self-generated 
splicing that does not require splicing enzymes 
of the spliceosome complex.


intron splicing The spliceosome complex- 
driven process that removes introns from eu- 
karyotic pre-mRNA and ligates exons to form 
mature mRNA. 


inversion heterozygote Organisms whose 
homologous chromosomes have different 
structural organization. Most commonly, one 
has normal structure whereas the homolog 
carries an inversion. 


inversion loop At homologous chromo- 
some synapsis in an inversion heterozygote, 
the structure that forms by the looping of one 
chromosome to align homologous regions. 


inverted repeat (IR) sequence Identical or 
nearly identical DNA sequences located on the 
same molecule but with opposite orientations. 


IS (insertion sequence) element Mobile 
DNA elements in bacteria that cause muta- 
tions by inactivating the expression of genes 
into which they insert. 


island model In evolutionary genetics, a 
model of species evolution in which new spe- 
cies are reproductively isolated from an ances- 
tral population. 


isoaccepting tRNA The group of tRNAs that 
carry the same amino acid but recognize syn- 
onymous codons. 


ISWI complex Imitation switch complex that 
functions primarily to control the placement 
of nucleosomes into an arrangement that 
causes a region to be transcriptionally silent. 


joint probability The likelihood of an out- 
come requiring the occurrence of two or more 
simultaneous or sequential events. 


karyokinesis Part of telophase, the process of 
nuclear division between daughter cells. 


karyotype A digital or analog photograph of 
chromosomes arranged by conventional chro- 
mosome numbering. 


kilobase (kb) A length of nucleic acid con- 
taining 1000 nucleotides. 


kinetochore The site of attachment of mul- 
tiple proteins that connects a spindle fiber 
microtubule to the centromeric region of a 
chromosome. Forms during M phase of cell 
division. 

knockout library Collections of mutants in 
which most or all genes of a particular organ- 
ism have been mutated by inactivating (or 
“knocking out”) their expression. 


Kozak sequence A specific consensus se- 
quence of eukaryotic mRNA that contains the 
authentic start codon (AUG) sequence.


lac− phenotype Bacteria that are not able to
grow on a medium containing lactose as the
only sugar. Compare with lac+ phenotype.


lac+ phenotype Bacteria that are able to grow
on a medium containing lactose as the only
sugar. Compare with lac− phenotype.


lacA gene A gene of the bacterial lac operon;
encodes lac transacetylase. Compare with lacY 
gene and lacZ gene. 


lactose (lac) operon An inducible operon 
consisting of genes (lacA, lacY, lacZ) encoding
enzymes allowing the use of lactose as a carbon
source. The operon is repressed by the lac
repressor regulatory protein that binds to the 
lac operator sequence and is activated by the 
CAP-cAMP complex that binds to sequences 
of the CAP binding site. 


lacY gene A gene of the bacterial lac operon;
encodes lac permease, which facilitates import 
of lactose into the cell. Compare with lacZ 
gene and lacA gene. 


lacZ gene A gene of the bacterial lac operon;
encodes β-galactosidase, which breaks down
lactose into glucose and galactose. Compare 
with lacY gene and lacA gene. 


lagging strand In DNA replication, the 
discontinuously synthesized strand whose 
Okazaki fragments are ligated to complete 
new strand synthesis. Compare with leading 
strand. 


large ribosomal subunit The larger of two 
subunits of the ribosome. 


lariat intron structure During intron splic- 
ing, the structure formed by covalent bonding 
of the 5’ guanine of an intron to the branch 
point adenine of the intron. 


late genes Bacteriophage genes expressed 
late in the lytic cycle. Encode protein products 
required for packaging of phage particles and 
lysis of the host cell. Late promoters and late 
operators are the regulatory sequences re- 
sponsible for late gene activation. 


lateral gene transfer (LGT) Transfer of ge- 
netic material between organisms belonging to 
the same or to different taxonomic groups. 

lateral inhibition Process by which one cell
or tissue prevents neighboring cells or tissues
from acquiring a developmental fate similar
to its own.


law of independent assortment (Mendel’s 
second law) The random distribution of al- 
leles of unlinked genes into gametes. 


law of segregation (Mendel’s first law) The 
separation of alleles of a gene during gamete 
formation. 


leader region (trpL) Transcribed region up- 
stream of the major enzyme encoding genes of 
repressible amino acid biosynthesis operons 
(e.g., trpL). Region encodes a small peptide 
whose rate of translation reflects the concen- 
tration of the amino acid (e.g., tryptophan) in 
the cells and consequently regulates transcrip- 
tion of the operon. 


leader sequence See signal sequence. 


leading strand In DNA replication, the con- 
tinuously synthesized strand. Compare with 
lagging strand. 


leaky mutation A mutant whose phenotype 
is similar to, but less than, the wild-type phe- 
notype. Also known as hypomorphic mutation. 


lethal allele See lethal mutation. 


lethal mutation An allele that results in the 
premature death of the organisms that carry 
it. Lethality most often affects homozygous 
organisms. Also known as lethal allele. 

linkage disequilibrium The nonrandom
distribution into gametes of alleles of linked 
genes. 


linkage equilibrium The random distribu- 
tion into gametes of alleles of linked genes 
achieved by crossing over between the genes. 

linkage group A group of genes displaying
genetic linkage. 

linker A short, chemically synthesized 
oligonucleotide that can be ligated to DNA 
molecules. 


linker DNA DNA between nucleosomes in 
the 10-nm fiber structure of chromatin. 


locus control region (LCR) Specialized en- 
hancer element that regulates the transcrip- 
tion of multiple genes, often complexes of 
closely related genes. 


lod score (log of the odds ratio) Based on 
analysis of transmission in pedigrees, the sta- 
tistic used to calculate the likelihood of genetic 
linkage between genes. 
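
To make the lod score entry above more concrete, here is a minimal sketch. It assumes the simplest textbook case, phase-known meioses scored as recombinant or nonrecombinant, and compares the likelihood at a recombination fraction theta with the likelihood under free recombination (theta = 0.5). The function name and the counts are hypothetical, and real pedigree analysis is considerably more involved.

```python
import math


def lod_score(recombinants, nonrecombinants, theta):
    """log10 of the odds of linkage at theta versus no linkage (theta = 0.5).

    Assumes phase-known meioses, so the likelihood is
    theta**recombinants * (1 - theta)**nonrecombinants.
    """
    if not 0.0 < theta <= 0.5:
        raise ValueError("theta must be in the interval (0, 0.5]")
    total = recombinants + nonrecombinants
    log_linked = (recombinants * math.log10(theta)
                  + nonrecombinants * math.log10(1.0 - theta))
    log_unlinked = total * math.log10(0.5)
    return log_linked - log_unlinked


# Hypothetical pedigree data: 1 recombinant among 10 informative meioses.
for theta in (0.05, 0.1, 0.2, 0.3, 0.5):
    print(theta, round(lod_score(1, 9, theta), 3))
```

Scanning several values of theta, as in the loop, shows which recombination fraction is best supported by the data; at theta = 0.5 the score is zero by construction.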


long noncoding RNA (lncRNA) A transcribed
RNA molecule that does not possess an
extended open reading frame and that does not
represent a tRNA, rRNA, or miRNA.


long terminal repeats (LTRs) Arrays of 
scores to hundreds of nucleotides that bracket 
the ends of retroviruses integrated into host 
chromosomes. 


loss-of-function mutation A mutant that 
prevents the production of the wild-type pro- 
tein or renders it inactive. Most commonly a 
recessive mutation. 


lysis See lytic cycle. 


lysogenic cycle The life cycle of a bacterium 
infected by a temperate bacteriophage that 
integrates into the host chromosome and rep- 
licates along with it. 


lysogeny See lysogenic cycle. 

lytic cycle The life cycle of a bacterium
infected by a bacteriophage that replicates
within the host cell and lyses the host to
release progeny bacteriophage.


M phase The cell division phase of the cell 
cycle. Follows interphase. 


macroevolution Evolutionary processes op- 
erating at the species level and higher. 


MADS-box A conserved sequence of DNA of 
168-180 nucleotides encoding a 56-60 amino 
acid DNA-binding domain in a family of tran- 
scription factors found throughout eukaryotes; 
in flowering plants, some MADS-box genes 
are homeotic genes. 


major gene A gene that has a substantial ef- 
fect on phenotypic variation. 


major groove The larger of two grooves 
formed in the DNA sugar-phosphate backbone 
by the helical twist of the double helix and ex- 
posing certain base pairs. 


mapping function Corrective calculations
used to more accurately estimate recombination
frequencies between linked genes. Mapping
functions differ among certain species.


map unit (m.u.), centiMorgan (cM) A theo- 
retical unit of distance between linked genes 
on a chromosome. 


maternal effect gene Genes that act in the 
mother to impart gene products (RNA or 
protein) into the egg and subsequently the em- 
bryo. For maternal effect genes, the embryonic 
phenotype is determined by the genotype of 
the mother rather than that of the embryo. 


matrix attachment region (MAR) Portions 
of the chromosome scaffold to which loops of 
chromatin are attached. 


mature mRNA The fully processed product 
of eukaryotic transcription that moves to the 
cytoplasm for translation. 


mean (μ) The average value of a group of
values. 


median In a sample distribution, the
middlemost value. Also known as median value.


median value See median. 


mediator An enhanceosome complex that 
forms a bridge between activator proteins 
bound to enhancer elements and the basal 
transcriptional machinery bound to the 
promoter. 


megabase (Mb) Equal to 1,000,000 nucleotide 
bases. Refers to DNA or to RNA molecules or 
fragments. 


meiosis The process of cell division occurring 
in germ-line cells. Produces four haploid gam- 
etes or spores through two successive nuclear 
divisions in diploid species. 

meiosis I First nuclear division characterized
by homologous chromosomes separating. 
Compare with meiosis II. 


meiosis II Second nuclear division
characterized by sister chromatids separating.
Compare with meiosis I.


Mendelian genetics Referring to genetic ap- 
plications and analyses using the law of segre- 
gation and the law of independent assortment 
originally described through experiments and 
analysis by Gregor Mendel. 


meristem Organized groups of pluripotent 
cells at the growing tips of plants that both 
generate organs and self-maintain to ensure 
that a pool of stem cells is always present. 


messenger RNA (mRNA) A form of RNA 
transcribed from a gene and subsequently 
translated to produce a polypeptide or 
protein. 


metabolomics The study of proteins, pro- 
cesses, and interactions involved in the me- 
tabolism of organisms. 


metacentric chromosome A chromosome 
with a centrally located centromere that pro- 
duces long and short arms of approximately 
the same length. 


metagenome Sequence derived from
whole-genome shotgun sequencing of DNA from
entire natural communities consisting of a
range of organisms.


metaphase The stage of M phase during 
which chromosomes align in the middle of 
the cell. 


metaphase plate The cell midline along 
which chromosomes align during metaphase. 


microdeletion A small chromosome deletion 
detectable only by using molecular methods of 
analysis. 


microduplication A small chromosome du- 
plication detectable only by using molecular 
methods of analysis. 


microevolution Evolutionary changes at the 
population level. 


microRNA (miRNA) Small (21-24 nt)
regulatory RNAs produced by Dicer and acting
in a RISC complex to either repress translation
of or cleave target mRNA molecules. Compare
with RNA interference (RNAi).


microsynteny Conservation of the order of
a small number of genes among related
species.


migration A process of evolution referring 
to the movement of organisms and genes be- 
tween populations. Also known as gene flow. 


minimal initiation complex In eukaryotic 
transcription, a partially completed multipro- 
tein complex that is preparing to bind RNA 
polymerase II. 


minor groove The smaller of two grooves 
formed in the sugar-phosphate backbone by 
the helical twist of the double helix, exposing 
certain base pairs. 


mismatch repair The DNA repair process 
that repairs noncomplementary base pairs 
that occur through errant DNA replication or 
through nucleotide base modification. The
process restores normal complementary base 
pairing. 

missense mutation A DNA base-pair substi- 
tution that leads to production of a polypep- 
tide in which one amino acid substitutes for 
another. 


mitochondrion An organelle, bounded by
a double membrane, encoding polypeptides
that interact with nuclear gene polypeptides in 
oxidative phosphorylation to generate ATP. In 
many species, mitochondria also participate 
in other metabolic processes and biochemical 
reactions, including ion homeostasis and bio- 
synthetic pathways. 


mitosis The process of cell division in somatic 
cells that produces genetically identical daugh- 
ter cells through a single nuclear division. 


mitosome Double-membrane-bound orga- 
nelles that are evolutionarily derived from 
mitochondria but have lost all of the ancestral 
genome; proteins requiring an anaerobic 
environment to function are imported into 
them. 


mitotic crossover Crossing over between ho- 
mologous chromosomes during mitosis. 


modal value See mode. 


mode In a sample distribution, the most
commonly occurring value. Also known as 
modal value. 


modern synthesis of evolution Referring to 
the broad-based effort beginning in the mid- 
dle of the 20th century to unite Mendelian 
genetics with Darwin’s theory of evolution by 
natural selection. 


modifier gene A gene that modifies the effect 
of a major gene. 


modifier screen A genetic screen designed to 
identify mutations in genes that modify, either 
enhance or suppress, the phenotypic effects of 
mutations in another gene. 


molecular cloning The process whereby a 
single DNA molecule is selectively cloned 
from a mixture of DNA molecules and then 
amplified to produce a large number of identi- 
cal copies. 


molecular genetics The subfield of genetics 
that studies hereditary transmission, variation, 
mutation, and evolution through the analysis 
of nucleic acids and proteins. 


molecular probe (probe) A single-stranded
nucleic acid or antibody protein labeled with
a detectable marker that attaches to a specific
target molecule, allowing target molecule de- 
tection in subsequent analysis. Single-stranded 
nucleic acid probes detect target nucleic acids, 
and antibody probes bind specific target 
proteins. 


monohybrid cross A genetic cross between 
organisms that are heterozygous for one gene. 

monophyletic group A group of organisms
with a single common ancestor. 

monosomy The presence of a single
chromosome instead of a homologous pair,
resulting in a chromosome number that is 2n − 1.


morphogen Substance whose presence in 
different concentrations directs different de- 
velopmental fates. 


multifactorial inheritance The inheritance 
of traits whose phenotypic variation is the re- 
sult of polygenic inheritance and environmen- 
tal influences. See also multifactorial trait. 


multifactorial trait Traits whose phenotypic 
variation is the result of polygenic inheritance 
and environmental influences. See also multi- 
factorial inheritance. 


multiple cloning site (MCS) A vector DNA 
sequence containing several unique restriction 
enzyme target sequences facilitating cloning of 
inserted DNA fragments. 


multiple gene hypothesis The hypothesis 
that alleles of multiple genes contribute to the 
production of certain traits. 


multiplication rule See product rule. 


multipoint linkage analysis A statistical 
method for testing and mapping alternative 
orders of multiple genes linked on a chromo- 
some. Related to lod score analysis. 


mutagen A chemical, physical, or biological 
agent capable of damaging DNA and creating 
a mutation. 


mutagenesis A procedure whereby a popula- 
tion of organisms is mutagenized and their 
progeny are propagated and examined for 
mutant specific phenotypes. See also genetic 
screen. 


mutation An inherited change in DNA. 


mutation rate The rate at which mutations 
occur per gene per unit of time. Most often 
expressed per gene per generation. 


mutation-selection balance An arithmetic 
expression used to determine the equilibrium 
frequencies of alleles in populations as a result 
of allele elimination by natural selection and 
new allele creation by mutation. 


narrow sense heritability (h²) The
proportion of total phenotypic variance that is
contributed by additive genetic variance.
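
The proportion described in the narrow sense heritability entry above can be written as a one-line ratio. The sketch below is only an illustration: the function name, argument names, and variance values are hypothetical, and it assumes both variances have already been estimated for the same trait and population.

```python
def narrow_sense_heritability(additive_variance, phenotypic_variance):
    """h^2 = additive genetic variance / total phenotypic variance."""
    if phenotypic_variance <= 0:
        raise ValueError("phenotypic_variance must be positive")
    if not 0 <= additive_variance <= phenotypic_variance:
        raise ValueError("additive_variance must lie between 0 and phenotypic_variance")
    return additive_variance / phenotypic_variance


# Hypothetical estimates: additive variance 2.0 out of a total variance of 8.0.
print(narrow_sense_heritability(2.0, 8.0))
```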


natural selection The evolutionary process 
operating through differences in survival, fe- 
cundity, and relative fitness of organisms with 
different genotypes and phenotypes. 


negative control (of transcription) Condition 
where binding of a repressor protein to a regu- 
latory DNA sequence prevents transcription of 
a gene or a cluster of genes. 


negative interference Occurring when the 
coefficient of coincidence (c) is greater than 1.0, 
the observation of more double crossovers 
than expected between a pair of genes. 


negative supercoiling Twisting of the DNA 
duplex in the direction opposite to the turns of 
the double helix. 


neofunctionalization The process, following 
gene duplication, whereby a mutation in one 
of the duplicates provides a function not per- 
formed by the original gene. 


neomorphic mutation A mutant expressing 
a new or novel function not seen in the wild 
type. 

next-generation sequencing High-throughput,
massively parallel DNA sequencing by
synthesis.


N-formylmethionine (fMet; tRNAfMet) A 
modified methionine amino acid usually used 
as the amino acid that initiates bacterial trans- 
lation. Carried by a specialized tRNA. 


node An evolutionary branch point in a 
phylogenetic tree. 


noncomposite transposon Bacterial trans- 
posable genetic elements that lack insertion 
sequences. 


nondisjunction The failure of homolog or 
sister chromatid separation during cell divi- 
sion. Results in nuclei with the wrong number 
of chromosomes. 


nonenveloped virus Viral particles 
consisting only of a protein capsid and other 
proteinaceous elements. 

nonhistone protein Numerous nuclear
proteins that are not histones associated with 
chromosomes. 


nonhomologous end joining (NHEJ) An 
error-prone mechanism of double-stranded 
DNA break repair in eukaryotic genomes in 
which damaged nucleotides are removed and 
blunt ends of strands are joined. 


noninducible Condition in which transcrip- 
tion of bacterial genes or operons cannot be 
activated. 


nonparental ditype (NPD) In an ascus, the 
occurrence of four haploid spores that are each 
recombinant. 


nonpenetrant An organism with a genotype 
corresponding to a mutant phenotype that 
instead displays the wild-type phenotype. 


nonrecombinant vector Produced in a clon- 
ing experiment when the intended vector does 
not pick up a DNA insert. 


nonreplicative transposon A transposable 
genetic element that transposes by excision 
from the original genome location, followed by 
insertion into a new location. 


nonrevertible mutants Mutations caused by 
partial deletion of DNA nucleotides that can- 
not be reverted to wild type. 


nonsense mutation A type of point mutation 
producing a stop codon in mRNA. 


nonsister chromatid A chromatid belong- 
ing to a homologous chromosome. Nonsister 
chromatids of homologs are involved in 
crossing over. 


nontemplate strand See coding strand. 


normal distribution The continuous distri- 
bution of outcomes predicted by chance. Also 
known as Gaussian distribution. 


northern blotting A method for transferring 
mRNA from an electrophoresis gel to a per- 
manent membrane or filter. 


north-south (NS) resolution One possible
pattern for resolving a Holliday junction to
separate homologous chromosomes before
meiotic anaphase.


nuclear mitochondrial sequence 
(NUMTS) Mitochondrial DNA sequences 
found in the nucleus as a result of recent 
transfer from the mitochondrial genome to 
the nuclear genome. 


nuclear plastid sequence (NUPTS) Plastid 
DNA sequences found in the nucleus as a re- 
sult of recent transfer from the plastid genome 
to the nuclear genome. 


nucleoid The region of bacterial and archaeal 
cells (or mitochondria or chloroplasts) where 
the main chromosome resides. 


nucleolus (plural: nucleoli) Nuclear organelle 
containing rRNA-encoding genes. 


nucleomorph In a secondary endosym- 
biosis, the nuclear genome of the secondary 
endosymbiont. 

nucleosome An octameric protein complex
composed of two polypeptides each of his- 
tones H2A, H2B, H3, and H4, around which 
DNA wraps in chromatin. 


nucleosome core particle The octameric 
histone protein complex (two molecules each 
of H2A, H2B, H3 and H4) around which core 
DNA can wrap. 


nucleosome-depleted region (NDR) A 100- 
to 150-bp region containing few nucleosomes, 
which lies immediately upstream of the start of 
transcription. 


nucleotide base analog A compound with a
size and shape that mimics a natural nucleo- 
tide base. 


nucleotide excision repair A mechanism
of DNA damage repair in which a segment of
one strand containing damaged nucleotides is 
excised and replaced. 


null mutation A mutant that produces no 
functional product. Most commonly a reces- 
sive allele. Also known as amorphic mutation. 


Okazaki fragment A short segment of newly 
synthesized DNA that is part of a lagging 
strand and is ligated to other Okazaki frag- 
ments to complete lagging strand synthesis. 


oncogene A mutated form of a proto- 
oncogene; frequently associated with cancer 
development. 


one gene-one enzyme hypothesis Proposed 
by George Beadle and Edward Tatum in 1941, 
the hypothesis proposing that each gene en- 
codes a specific protein product and controls a 
distinct function. 


open chromatin Chromatin in which the as- 
sociation of DNA with nucleosomes is relaxed 
in regions containing regulatory sequences, 
allowing access by regulatory proteins and giv- 
ing genes in open chromatin the potential to 
be transcriptionally active. 


open promoter Promoters that reside in 
open chromatin, resulting in constitutive tran- 
scription. See also open promoter complex. 


open promoter complex At transcription 
initiation, the stage at which RNA polymerase 
is bound and a short region of DNA opens to
allow transcription from the template strand. 
See also open promoter. 


operational gene Class of genes that encode 
proteins involved in cellular metabolic pro- 
cesses (e.g., amino acid biosynthesis, biosyn- 
thesis of cofactors, fatty acid and phospholipid 
biosynthesis, intermediary metabolism, energy 
metabolism, nucleotide biosynthesis). 


operator Regulatory DNA sequences to 
which repressor or activator proteins bind. 
Term used in bacterial systems. 


operon A set of adjacent genes that are tran- 
scribed in a polycistronic mRNA and are thus 
coordinately regulated; an operon is generally 
considered to include associated regulatory 
sequences (e.g., promoter, operator, etc.). Pri- 
marily found in bacteria and archaea. 


ordered ascus The linear sequence in an 
ascus of haploid spores whose arrangement 
allows determination of the chromatids par- 
ticipating in crossing over. 

organelle inheritance The transmission 
of genes on mitochondrial and chloroplast 
chromosomes. 


organizer Groups of cells that possess the 
ability to influence the fates of cells in the sur- 
rounding tissues via non-autonomous signals. 


origin of migration The starting point 
of nucleic acid or protein migration in gel 
electrophoresis. 


origin of replication The specific sequence at 
which DNA replication begins. 


origin of transfer (oriT) The site within the 
fertility (F) factor sequence where transfer to 
the recipient cell is initiated. 


orthologous genes Genes in different spe- 
cies whose origin lies in a speciation event and 
that can be traced to a single gene in a com- 
mon ancestor of the two species. Also known 
as orthologs. 


orthologs See orthologous genes. 


outgroup A species related to members of a 
clade but outside the clade; used to root the 
clade. 


P element A specific type of transposable
genetic element prevalent in the Drosophila 
genome. 


P value (probability value) In the chi square 
test, the likelihood that a repeat experiment 
will produce a result as deviant or more devi- 
ant than expected in comparison to the experi- 
mental result being tested. 


paired-end sequencing Sequence gener- 
ated from both ends of a DNA clone; provides 
evidence of physical linkage of the two paired 
sequences. 


pair-rule gene In Drosophila, genes that 
delimit parasegments along the anterior- 
posterior axis; examples include even-skipped 
and odd-skipped. 


paracentric inversion A chromosome inver- 
sion in which the inverted segment does not 
include the region of the centromere. 


paralogous genes Genes whose origin lies 
in a gene duplication event within an extant or 
ancestral species. Also known as paralogs. 


paralogs See paralogous genes. 

paraphyletic group A group of organisms
that includes some but not all the members 
descended from a common ancestor. 


parasegment In Drosophila, the posterior 
part of one segment and the anterior part of 
its neighbor. The stripes of gene expression 
of pair-rule genes correspond to paraseg- 
ments, straddling the boundaries between 
segments. 


parental (nonrecombinant) chromosome
Chromosomes in gametes produced when
crossing over does not take place between
linked genes. Alleles marking each gene are
retained in their initial (parental) configurations.


parental ditype (PD) In an ascus, the oc- 
currence of four haploid spores that are each 
nonrecombinant. 


parental generation (P generation) The 
parents of F1 progeny. In controlled genetic
crosses, the parents are pure-breeding. 


parental strand The DNA strand acting as 
a template to direct the synthesis of a new 
(“daughter”) strand of DNA. 


partial chromosome deletion The loss of a 
segment of a chromosome. 


partial deletion The loss of a segment of a 
chromosome. Results in partial monosomy for 
the affected chromosome segment. 


partial deletion heterozygote An organism 
with one wild-type chromosome and a ho- 
molog that is missing a segment. 


partial diploid An exconjugant bacterium 
that acquires a second copy of one or more 
genes by conjugation with an F’ donor cell. 


partial dominance See incomplete dominance. 


partial duplication The duplication of a seg- 
ment of a chromosome. 


partial duplication heterozygote An organ- 
ism with one wild-type chromosome and a 
homologous chromosome with a duplicated 
segment. 


particle gun bombardment Technique of 
using high pressure to fire microscopic par- 
ticles coated with DNA into plant cells. The 
particles are propelled with enough force to 
penetrate the cell wall and plasma membrane. 


particulate inheritance Mendel’s theory that 
genetic information is transmitted from one 
generation to the next as discrete units or ele- 
ments of heredity. 


Pascal’s triangle A diagram listing the coef- 
ficients of a given binomial expansion in which 
the binomial expression is expanded n number 
of times. 
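
The Pascal's triangle entry above can be sketched in a few lines of Python. The code builds one row of binomial coefficients and, as an assumed use case that is not stated in the entry itself, applies a coefficient to a binomial probability for offspring of a cross. The function names and the example numbers are hypothetical.

```python
from fractions import Fraction


def pascal_row(n):
    """Return row n of Pascal's triangle: the coefficients of (p + q)**n."""
    row = [1]
    for k in range(n):
        # Each coefficient follows from the previous one: C(n, k+1) = C(n, k) * (n - k) / (k + 1).
        row.append(row[-1] * (n - k) // (k + 1))
    return row


def binomial_probability(n, k, p):
    """Probability of exactly k outcomes of probability p among n independent offspring."""
    p = Fraction(p)
    return pascal_row(n)[k] * p**k * (1 - p)**(n - k)


print(pascal_row(4))
# Hypothetical example: exactly 3 of 4 offspring show a phenotype expected with probability 3/4.
print(binomial_probability(4, 3, Fraction(3, 4)))
```

Exact fractions are used so that the result can be compared directly with hand calculations.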


pathogenicity island Regions of laterally 
transferred DNA that contain genes with 
pathogenic function. 


pedigree A family tree composed of standard 
symbols that depicts relationships in succes- 
sive generations and often displays individual 
phenotypes. 


penetrant Expression of the phenotype cor- 
responding to a particular genotype. 


peptide bond A type of covalent bond that 
joins amino acids in polypeptide chains. 
Formed between the amino end of one amino 
acid and the carboxyl end of the adjoining 
amino acid. 


peptide fingerprint analysis A form of chro- 
matography in which polypeptide fragments 
are separated and distinctive patterns revealed.


peptidyl site (P site) The site on the ribo-
some where amino acids are joined by a
peptide bond.


pericentric inversion A chromosome inver- 
sion in which the inverted segment includes 
the region of the centromere. 


permissive condition Environmental condi- 
tion in which environmentally sensitive (e.g., 
temperature sensitive) mutants exhibit the 
wild-type phenotype or can survive. 


phenocopy A phenotype similar to a pheno- 
type caused by mutation but that is produced 
instead by an environmental condition. 


phenotype (1) The observable physical char- 
acteristics or traits of an organism. (2) The 
physical manifestation of a specific genotype. 


phenotypic ratio (3:1 ratio and 9:3:3:1
ratio) A ratio or set of relative proportions be-
tween organisms with different phenotypes;
for example, the ratio of progeny produced by
a monohybrid cross (3:1) or a dihybrid cross
(9:3:3:1).

phenotypic variance (Vp) The total variance 
observed for a trait. 


phosphodiester bond A type of covalent 
bond formed between two nucleotides in a 
nucleic acid strand. Formed between the 5’
phosphate group of one nucleotide and the 3’ 
OH of the adjacent nucleotide. 


photoproduct A characteristic DNA lesion 
produced by exposure to ultraviolet light. 


photoreactive repair A mechanism of DNA
damage repair in bacteria that uses light in the
visible part of the spectrum to provide the en-
ergy to remove the damage done by ultraviolet
irradiation.


phylogenetic footprinting Technique 
whereby conserved sequences are identified 
by searching for similar sequences in species 
separated by large evolutionary distances. 


phylogenetic shadowing Technique 
whereby conserved sequences are identified by 
first eliminating sequences that are not con- 
served in closely related species. 


phylogenetic tree A diagram of evolution- 
ary relationships among organisms or genes 
based on morphological or molecular 
characteristics. 


phylogenomics Method for determining 
phylogenetic relationships of organisms using 
genomic DNA sequence information. See also 
evolutionary genomics. 


physical gap Sequence gap between scaf- 
folds for which there is no clone to supply the 
sequence. 


plasmid One of multiple types of extrachro- 
mosomal circular DNA molecules that may be 
found in bacterial cells. 


plastid Organelle, bounded by a double 
membrane, descended from the cyanobacte- 
rial endosymbiont; specialized types of plas- 
tids include chloroplasts and chromoplasts. 


pleiotropy A single gene mutation that af- 
fects multiple and seemingly unconnected 
properties of an organism. 


pluripotent State of a cell when it can give 
rise to many but not all cell types of an 
organism. 


point mutation A DNA lesion at a defined 
location. Usually either a base pair substitution 
or the insertion or deletion of one or a small 
number of base pairs. 


polar mutation Mutations affecting down- 
stream genes in an operon by reducing pro- 
duction or altering translation of polycistronic 
mRNA. 


polyacrylamide A synthetic compound 
mixed with buffer and used to form electro- 
phoresis gels. 


polyadenylation signal sequence A 
hexanucleotide sequence of mRNA, usually 
AAUAAA, that identifies the location of 3’ 
pre-mRNA cleavage and polyadenylation. 


polycistronic mRNA In bacteria, an mRNA 
containing the transcripts of two or more genes. 


polygenic inheritance A quantitative trait 
dependent on the contributions of multiple 
genes. Also known as polygenic trait. 


polygenic trait See polygenic inheritance. 


polymerase chain reaction (PCR) A labora- 
tory method for controlled replication of a 
specific target sequence of DNA in successive 
cycles. Using two short single-stranded prim- 
ers that bind to sequences on opposite sides of 
the target sequence, exponential replication of 
the target sequence occurs. 


polypeptide A chain of amino acids joined 
by peptide bonds. Formed at ribosomes during 
translation. 


polyploidy The presence of more than two 
complete sets of chromosomes in a genome. 
See also allopolyploidy and autopolyploidy. 


polyribosome In translation, the simultane-
ous translational activity of multiple ribo-
somes on a single mRNA.


population A group of organisms that 
mate with one another to establish the next 
generation. 


population genetics The subfield of genetics 
that studies the genetic structure and evolu- 
tion of populations. 


positional cloning The process by which the 
DNA sequence of a gene identified only by 
mutant phenotype can be obtained by using 
genetic and physical maps. Also known as 
chromosome walking. 


positional information Process by which 
gene expression or other chemical cues estab- 
lish geographical addresses along the axes of a 
developing embryo or organ primordium. 


position effect variegation (PEV) The obser- 
vation in Drosophila of a specific type of muta- 
tion producing variegation of eye color due to 
the abnormal positioning of the w (white) gene 
for eye color. 


positive control (of transcription) Condition 
where binding of an activator protein to a 
regulatory DNA sequence stimulates transcrip-
tion of a gene or a cluster of genes.


positive-negative selection The use of both 
negative and positive selectable markers to 
follow the fate of introduced DNA to select for 
homologous recombination events. 


positive supercoiling Superhelical twisting
of DNA in the same direction as the turns of
the double helix (overwinding).


posttranslational polypeptide processing 
In eukaryotes, modifications to polypeptides 
in the endoplasmic reticulum and Golgi appa- 
ratus after the completion of translation. 


postzygotic mechanism Mechanisms op- 
erating after mating to reduce or prevent 
the possibility of producing hybrids between 
populations or species. 


precursor microRNA (pre-miRNA) The stem
loop product derived from processing of
pri-microRNAs. The pre-microRNA stem loop
is further processed by Dicer to produce the
mature single-stranded microRNA from the
double-stranded region of the stem loop.


precursor mRNA (pre-mRNA) The initial 
transcript of a eukaryotic gene requiring 
mRNA processing prior to translation. 


preinitiation complex (PIC) In eukaryotic 
transcription, a large multiprotein complex 
containing several general transcription fac- 
tors and RNA polymerase II. 


prezygotic mechanism Mechanisms op- 
erating before mating to reduce or prevent 
the possibility of producing hybrids between 
populations or species. 


Pribnow box (-10 consensus sequence) A
specific consensus sequence component of the
bacterial promoter with a location centered
at approximately -10 relative to the start of
transcription.


primary microRNA (pri-miRNA) The primary 
transcript, with single-stranded ends and a 
stem loop, from which pre-microRNAs are de- 
rived by processing. The single-stranded ends 
of the pri-microRNA are removed, by Drosha 
in animals and Dicer in plants, to produce the 
pre-microRNA. 


primase (DnaG) The specialized RNA 
polymerase that synthesizes the RNA primer 
during DNA replication. 


primer annealing In PCR, the binding of a
short single-stranded primer to its target
sequence by complementary base pairing.

primer extension In PCR, the synthesis of 
DNA by DNA polymerase beginning at the 3’ 
end of a short single-stranded primer. 


primer walking Technique for sequencing 
long DNA molecules where new sequencing 
primers are synthesized based on successive 
DNA sequence reads. Compare with shotgun 
sequencing. 

product rule The probability of an event re-
quiring the sequential or simultaneous occur-
rence of two or more independent contributing
events. The probabilities of the contributing
events are multiplied, and their product is the
probability of the event in question. Also
known as the multiplication rule.
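
A brief sketch of the rule in code, assuming the contributing events are independent; the function name `product_rule` and the cross used in the example are illustrative. It computes the chance of an aa bb offspring from an AaBb × AaBb cross, where each gene contributes a probability of 1/4.

```python
from fractions import Fraction
from math import prod


def product_rule(*probabilities):
    """Joint probability of independent events: multiply their probabilities."""
    return prod(probabilities, start=Fraction(1))


p_aa = Fraction(1, 4)  # probability of aa from Aa x Aa
p_bb = Fraction(1, 4)  # probability of bb from Bb x Bb
print(product_rule(p_aa, p_bb))  # 1/16
```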


proliferating cell nuclear antigen (PCNA) In 
eukaryotic DNA replication, the functional 
equivalent of the bacterial sliding clamp that 
adheres DNA polymerase to the template 
strand and drives its progression. 


prometaphase In M phase of the cell cycle, 
sometimes identified as a stage between pro- 
phase and metaphase. 


promoter A regulatory sequence of DNA 
near the 5’ end of a gene that acts as the bind- 
ing location of RNA polymerase and directs 
RNA polymerase to the start of transcription. 


promoter mutation A mutation altering pro- 
moter sequence and function. 


promoter-specific element (PSE) A specific 
promoter consensus sequence located up- 
stream of small nuclear RNA genes. 


prophage The designation for bacteriophage 
that has integrated into the host bacterial 
chromosome. 


prophase The stage of M phase during which 
chromosome condensation occurs. 


protein A string of amino acids encoded dur- 
ing translation of mRNA and linked together 
by peptide bonds. See also polypeptide. 


proteome Set of the proteins in a cell, tissue, 
or organism. 


proteomics The study of all the proteins, col- 
lectively known as the proteome, within a cell, 
tissue, or organism. 


proto-oncogene A broad category of normal 
genes producing protein whose generalized 
functions promote cell proliferation. It is often 
mutated in carcinogenesis. 


prototroph The wild-type strain of a microor- 
ganism. Also, any organism that can synthesize 
its nutrients from inorganic material. 


pseudoautosomal region (PAR) Homolo- 
gous regions on the X and Y chromosomes 
that synapse and cross over. 


pseudodominance The phenotypic expres- 
sion of a recessive allele on one chromosome 
due to deletion of a portion of the homolo- 
gous chromosome containing the dominant 
allele. 


pseudogene Sequences recognizable as mu- 
tated gene sequences often derived from gene 
duplication or retrotransposition events. 


Punnett square Named in honor of early 
20th-century geneticist Reginald Punnett, a
checkerboard-like diagram that predicts the 
genotypes and genotype frequencies of prog- 
eny from a genetic cross. 
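
The checkerboard can be sketched in a few lines of code. This is a minimal illustration for a single gene with one-letter alleles; the function name `punnett_square` is hypothetical. Every gamete of one parent is paired with every gamete of the other, and the resulting genotypes are tallied.

```python
from collections import Counter
from itertools import product


def punnett_square(parent1_gametes, parent2_gametes):
    """Count progeny genotypes from every pairing of the two parents' gametes."""
    counts = Counter()
    for g1, g2 in product(parent1_gametes, parent2_gametes):
        # Sort the alleles so that "aA" and "Aa" are counted as one genotype.
        counts["".join(sorted(g1 + g2))] += 1
    return counts


# Monohybrid cross Aa x Aa: each parent produces A and a gametes.
print(punnett_square(["A", "a"], ["A", "a"]))
```

For the Aa × Aa cross the tally is one AA, two Aa, and one aa, the 1:2:1 genotypic ratio.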


pure-breeding strains A group of genetically 
identical homozygous organisms that, when 
self-fertilized or intercrossed, only produce 
offspring that have a phenotype identical to 
the parents. Also known as true-breeding 
strains. 


pyrimidine dimer The specific type of lesion 
formed on DNA due to exposure to ultraviolet 
irradiation. Also known as thymine dimer. 


QTL locus analysis A method for character- 
izing the effects of quantitative trait loci on 
variation. 


QTL mapping A method for locating quanti- 
tative trait loci in a genome.


quantitative genetics The subfield of genet- 
ics that studies quantitative traits. 


quantitative trait A trait exhibiting polygenic 
inheritance and displaying continuous pheno- 
typic variation. 


quantitative trait locus (QTL) A gene con- 
tributing to the phenotypic variation of a 
quantitative trait. 


quaternary structure The state of protein 
function requiring the joining of two or more 
polypeptides to form a functional protein (e.g., 
hemoglobin protein). 


R-group The functional groups that give each 
amino acid its distinctive characteristics.


R (resistance) plasmid A type of bacterial 
plasmid conferring resistance to one or more 
antibiotic compounds. 


radial loop-scaffold model A model of 
chromatin structure that predicts rosettes of 
looped chromatin on a chromosome scaffold. 


random X inactivation (Lyon hypothesis) 
Proposed by Mary Lyon in the mid-20th 
century, the process of randomly inactivating 
one copy of the X chromosome in each 
mammalian female nucleus early in zygotic 
development. 


reader A chromatin modifying enzyme that 
binds to chemical groups of chromatin (e.g., 
methyl or acetyl groups on the lysines of his- 
tone 3). 


reading frame The partitioning of sequential 
sets of mRNA trinucleotide segments (codons) 
that are used in translation to determine 
amino acid order of a polypeptide. 
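
As a small sketch of the idea, the same mRNA can be partitioned into codons in three different frames depending on the starting nucleotide. The function name `codons_in_frame` and the sequence are made up for illustration.

```python
def codons_in_frame(mrna, frame=0):
    """Split an mRNA string into consecutive codons, starting at offset `frame` (0, 1, or 2)."""
    # Drop trailing nucleotides that do not complete a codon.
    usable = len(mrna) - (len(mrna) - frame) % 3
    return [mrna[i:i + 3] for i in range(frame, usable, 3)]


mrna = "AUGGCUUAA"  # made-up sequence for illustration
for frame in range(3):
    print(frame, codons_in_frame(mrna, frame))
```

Only frame 0 reads this sequence as AUG, GCU, UAA; shifting the start by one or two nucleotides yields entirely different codons.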


realizator gene In Drosophila, the Hox target 
genes whose expression contributes to the 
characteristic morphology of each segment. 


RecBCD pathway A complex of three bac- 
terial proteins that cut DNA and facilitate 
homologous recombination; a foundation of 
eukaryotic homologous recombination. 


recessive epistasis (9:3:4 ratio) A charac- 
teristic ratio of phenotypes produced by the 
interaction of two genes that control a trait in 
which alleles of one gene mask or reduce the 
expression of alleles of a second gene. 


recessive phenotype The phenotype ob- 
served in an organism that is homozygous for 
the recessive allele. Compare with dominant 
phenotype. 


recipient cell (F⁻ cell) A bacterial cell that
does not contain fertility factor DNA sequence 
and can conjugate with a donor bacterium. 


reciprocal cross Paired crosses involving dis-
tinct parental phenotypes in which the sexes are
switched (i.e., if one cross is ♂ phenotype A × ♀
phenotype B, the reciprocal cross is ♂ pheno-
type B × ♀ phenotype A).


reciprocal translocation (balanced, 
unbalanced) Exchange of chromosome 
segments between non-homologous chromo- 
somes. If all genes are present, the transloca- 
tion is “balanced,” but if genes are missing, 
the translocation is “unbalanced.” 


recombinant (nonparental) chromo-
some Chromosomes in gametes produced
by crossing over between linked genes. Alleles
marking each gene are rearranged on chroma-
tids by crossing over.


recombinant clone A combination of DNA 
molecules from different sources (e.g., vector 
and insert DNA) that are joined together using 
recombinant DNA technology. 


recombinant DNA technology The set of 
laboratory techniques developed for amplify- 
ing, maintaining, and manipulating specific 
DNA sequences in vitro as well as in vivo. 


recombination coldspot A chromosome 
region with a recombination rate that is lower 
than average for the number of nucleotide 
base pairs present. 


recombination hotspot A chromosome re- 
gion with a recombination rate that is higher 
than average for the number of nucleotide 
base pairs present. 


recombination frequency (r) The rate of 
occurrence of recombination between a pair 
of linked genes. Expressed as the number of 
recombinants divided by the total number of 
meioses. 
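
A minimal worked sketch of the calculation follows. It assumes, as is common in practice, that scored testcross progeny stand in for meioses; the counts and the function name `recombination_frequency` are invented for illustration.

```python
def recombination_frequency(recombinants, total):
    """r = number of recombinants / total number scored."""
    if total <= 0:
        raise ValueError("total must be positive")
    return recombinants / total


# Hypothetical testcross counts, invented for illustration only.
parental = 420 + 430
recombinant = 70 + 80
r = recombination_frequency(recombinant, parental + recombinant)
print(f"r = {r:.2f}")  # r = 0.15
```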


recombination nodule Protein aggregations 
along the synaptonemal complex that are 
thought to play a role in crossing over. 


reference genome sequence The DNA 
sequence of the individual or individuals used 
to construct the initial complete genome 
sequence. 


regulated transcription Condition in which 
gene expression is controlled at the transcrip- 
tional level in response to changing environ- 
mental conditions. 


regulatory mutation A mutation altering a 
regulated attribute of gene expression. 


relative fitness (w) In evolutionary genetics, 
the measurement of the reproductive fitnesses 
of organisms in a population relative to one 
another. The organism class with greatest fit- 
ness has a relative fitness of w = 1.0. 


release factor (RF) Molecules that bind 
mRNA stop codons and contribute to transla- 
tion termination. 


repetitive DNA Sequences of DNA that are 
found in more than one locus in a genome. 


repetitive DNA sequence DNA sequences
that contain repeating nucleotides with unit
lengths ranging from two base pairs (dinucleo-
tides) to thousands of base pairs.


replicate cross Repeated crosses involving par- 
ents with the same genotypes and phenotypes. 


replication bubble A region of active bidi- 
rectional DNA replication containing replica- 
tion forks on each end, an origin of replication 
in the middle, and leading and lagging strands 
in each half of the bubble. 


replication fork In DNA replication, the 
site of the replisome structure, and the site of 
synthesis of leading strand and lagging strand 
DNA. 


replicative segregation Random segregation 
of organelles during cell division. 


replicative transposition Transposition 
carried out by replicating a copy of a transpos- 
able element and inserting the copy in a new 
genome location. 


replisome The large molecular machine lo- 
cated at the replication fork that coordinates 
multiple reaction steps during DNA replication. 


reporter gene A gene whose expression is 
easy to assay phenotypically. Fusion of re- 
porter genes with heterologous sequences 
allows both transcriptional and translational 
expression patterns to be visualized. 


repressible operon Operon that is expressed 
under one set of environmental conditions, 
but whose transcription is repressed under an 
alternative environmental condition (e.g., the
trp operon).

repressor protein A transcription factor that 
binds to regulatory sequences associated with 
a gene and represses that gene’s expression. 


reproductive isolation The absence of in- 
terbreeding between populations or species; 
often involves geographic, physical, or behav- 
ioral mechanisms or conditions. 


response to selection (R) The amount of 
change in the phenotype of a trait between 
parental and offspring generations as a result 
of selection on the parents. 


restriction endonuclease One of a large 
number of DNA-digesting enzymes, usually 
of bacterial origin, that cut DNA at specific 
recognition sites called restriction sequences. 
Each enzyme has its own particular restriction 
sequence and generates double-stranded cleav- 
age of DNA at the restriction sequence. Also 
known as restriction enzyme. 


restriction enzyme A DNA endonuclease 
that targets a specific base pair sequence for 
enzymatic cleavage. 


restriction fragment length polymorphism
(RFLP) Variation among individuals in the
length of a DNA fragment generated by
treatment with a restriction endonuclease.


restriction map A map showing the numbers 
and relative positions of target sites for restric- 
tion enzymes of a DNA molecule. 


restriction sequence The specific base-pair 
sequence recognized by a particular restric- 
tion endonuclease. 


restriction-modification system System of a 
restriction enzyme with a specific recognition 
sequence and a modifying enzyme that adds 
methyl groups to bases of the recognition 
sequence. The system protects the bacteria’s 
own DNA from being digested by endogenous 
restriction enzymes but allows restriction of 
invading exogenous DNA. 


restrictive condition Environmental condi- 
tion in which environmentally sensitive (e.g., 
temperature sensitive) mutants exhibit the 
mutant phenotype. 


retrotransposon A transposable element 
that uses reverse transcriptase to transpose 
through an RNA intermediate. 


reverse genetics Genetic analysis that begins 
with a gene sequence, which is used to identify 
or introduce mutant alleles and subsequently 
to identify and evaluate the resulting mutant 
phenotype. It is the complementary approach 
to forward genetics. 


reverse mutation rate (v) The rate at which 
mutant alleles are reverted to wild-type alleles. 
Also known as reversion rate. 


reverse transcriptase Enzyme, derived from 
retroviruses or retrotransposons, that cata- 
lyzes the synthesis of a DNA strand (cDNA) 
from an RNA template. 


reverse transcription The process of DNA 
synthesis from an RNA template by the en- 
zyme reverse transcriptase. 


reverse translation The process of using 
the genetic code to deduce the possible DNA 
sequences encoding a specific amino acid 
sequence. 
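
The process can be sketched as follows, using only a small part of the standard genetic code; the table `CODONS` is deliberately partial and the function name `reverse_translate` is illustrative. Each amino acid is replaced by its possible codons, and every combination is written out as a DNA coding-strand sequence (T in place of U).

```python
from itertools import product

# Partial codon table (standard genetic code); extend it for other amino acids.
CODONS = {
    "M": ["AUG"],
    "W": ["UGG"],
    "K": ["AAA", "AAG"],
    "F": ["UUC", "UUU"],
}


def reverse_translate(peptide):
    """List every DNA coding-strand sequence that could encode the one-letter peptide."""
    options = [CODONS[aa] for aa in peptide]
    # Join one codon choice per position, then convert RNA (U) to DNA (T).
    return ["".join(choice).replace("U", "T") for choice in product(*options)]


print(reverse_translate("MKF"))
```

Because lysine and phenylalanine each have two codons, the three-residue peptide Met-Lys-Phe has 1 × 2 × 2 = 4 possible coding sequences.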


reversion mutation A mutation that alters 
a mutant to wild-type sequence and function. 
Also known as reversion. 


reversion rate The rate at which reversion 
(reverse) mutations occur in an organism. 


revertible mutant A point mutation caused 
by base-pair substitution or deletion of one or 
a few base pairs that can be reverted to wild 
type. 

rho-dependent termination (rho pro-
tein) The process of bacterial transcription
termination involving rho protein.


rho utilization site (rut site) The site of 
attachment of rho protein that aids in rho- 
protein-driven bacterial transcription 
termination. 


ribonucleic acid (RNA) A family of poly-
nucleotides that are transcribed from DNA.
RNAs are composed of nucleotides contain-
ing the sugar ribose, one or more phosphate
groups, and one of four nitrogenous bases
(A, G, C, and U).


ribonucleotide Composed of ribose, one or 
more phosphate groups, and one of four ni- 
trogenous bases, the nucleotides that make up 
RNA. See also adenine (A), uracil (U), guanine 
(G), and cytosine (C). 

ribose The 5-carbon sugar molecule in
ribonucleotides.


ribosomal RNA (rRNA) A group of RNA
molecules that compose part of the structure
of ribosomes.


ribosome Ribonucleoprotein particles, com- 
posed of rRNAs and numerous proteins, at 
which translation takes place. 


ribozymes Catalytically active RNAs. 


RNA editing The process of post-
transcriptional addition or removal
of nucleotides in certain mRNAs.


RNA interference (RNAi) A regulatory 
gene-silencing mechanism based on double- 
stranded RNA, which can target comple- 
mentary sequences for inactivation. The 
machinery can be harnessed to silence gene 
expression in a reverse genetic approach. 


RNA pol A shorthand term for RNA poly-
merase.


RNA polymerase The enzyme that catalyzes 
the synthesis of RNA. See also RNA pol, RNA 
pol I, RNA pol II, and RNA pol III. 


RNA polymerase I (RNA pol I) In eukaryotic 
transcription, the enzyme that transcribes cer- 
tain rRNA genes. 


RNA polymerase II (RNA pol II) In eukaryotic
transcription, the enzyme that transcribes 
protein-coding genes to produce mRNA. 


RNA polymerase III (RNA pol III) In eukaryotic
transcription, the enzyme that transcribes 
tRNA genes. 


RNA polymerase core The five-polypeptide 
component of bacterial RNA polymerase that 
actively carries out transcription.


RNA primer In DNA replication, the short,
single-stranded RNA segment synthesized 
by primase. The 3’ end of the RNA primer is 
used by DNA polymerase to begin synthesis 
of DNA. 


RNA-induced silencing complex
(RISC) Complex containing Argonaute protein
that binds small RNA molecules and targets
complementary RNA molecules for degrada-
tion or translational repression.


RNA-induced transcription-silencing (RITS) 
complex RISC-like complex that mediates 
small RNA-induced transcriptional gene 
silencing. 

Robertsonian translocation The fusion of 
two non-homologous chromosomes, often 
with the deletion of a small amount of nones- 
sential genetic material. Also known as 
chromosome fusion. 


rolling circle replication A unidirectional 
mode of DNA replication used to replicate 
circular plasmid molecules in which the repli- 
cating circular molecule appears to reel off its 
nontemplate DNA strand, using the other as 
the template for replication. 


S phase (synthesis phase) The middle phase 
of interphase, during which DNA replication 
takes place. 


Sanger method See dideoxy DNA sequencing.


saturation mutagenesis Mutagenesis aimed
at identifying multiple mutant alleles for all loci 
in the genome of an experimental organism. 


scaffold A set of contigs that are physically 
linked. 


scanning In eukaryotic translation, the pro- 
cess used by the small ribosomal subunit to 
locate the authentic start codon. 


secondary endosymbiosis (tertiary
endosymbiosis) Endosymbiotic event in which
one eukaryote, usually photosynthetic, becomes
an endosymbiont within another eukaryote, result-
ing in an organism with genomes derived from
at least two nuclear genomes and multiple
organellar genomes.


secondary structure A form of protein fold- 
ing in which hydrogen bonds between amino 
acids of a polypeptide stabilize α-helical twists
or β-sheets.


second-division segregation Patterns of 
haploid spores in an ascus that indicate the 
alleles were separated at the second meiotic 
division as a result of crossing over between a 
gene and the centromere. 


second-site reversion A specific type of 
reversion taking place at a location separate 
from the site altered to generate the original 
mutation. 


segment Division of the body along the 
anterior-posterior axis into a series of morpho-
logically similar units.


segment polarity genes In Drosophila, genes 
that delimit the anterior and posterior regions 
of individual parasegments along the anterior- 
posterior axis; examples include wingless, 
engrailed, hedgehog, and gooseberry. 


selected marker screen An experimental 
method used to detect microorganisms with a 
specific genotype. 

selection coefficient (s) The value of the re-
duction in reproductive fitness for an organism
(i.e., w = 1.0 − s).
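
The relationship between the selection coefficient s and relative fitness w can be written as a pair of one-line conversions. This is a sketch only; the function names are illustrative, and s is assumed to lie between 0 and 1.

```python
def relative_fitness(s):
    """w = 1.0 - s, where s is the selection coefficient."""
    if not 0.0 <= s <= 1.0:
        raise ValueError("s must lie between 0 and 1")
    return 1.0 - s


def selection_coefficient(w):
    """s = 1.0 - w, where w is the relative fitness."""
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie between 0 and 1")
    return 1.0 - w


# A genotype class with a 20% reduction in fitness has w = 0.8.
print(relative_fitness(0.2))
```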

selection differential (S) The difference 
between the population mean value for a 
phenotype and the phenotype value of popula- 
tion members selected as parents for the next 
generation. 


selective growth medium The growth me-
dium used in a selected marker screen.


semiconservative replication The estab- 
lished method of DNA replication in which 
each strand of a parental duplex acts as a tem- 
plate for daughter strand synthesis and each 
daughter duplex is composed of one parental 
strand and a complementary daughter strand. 


semisterility Reduced fertility, commonly the 
result of the occurrence of adjacent segrega- 
tion during meiosis in balanced translocation 
heterozygotes. 


sequence gap Gap between two contigs for 
which a clone is available for further sequenc- 
ing that could close the gap. 


sex chromosome Homologous chromosomes
that differ between the sexes. Designated X
and Y in species in which females are XX and
males XY. Designated Z and W in species in
which females are ZW and males are ZZ.


sex determination The genetically controlled 
processes that determine the sex of offspring. 


sex-influenced expression The differential 
expression of an allele that depends on whether it 
occurs in a male or a female. Usually detected in 
heterozygous genotypes. 


sex-influenced trait A gene, usually autoso- 
mal, whose expression differs between males 
and females of a species. Also known as sex- 
influenced expression. 


sex-limited trait A gene or trait expressed 
exclusively in one sex. Also known as sex- 
limited gene. 


sex-linked inheritance The inheritance of 
genes on the sex chromosomes. 


shared derived characteristics Character- 
istics or traits of organisms that evolve from 
more ancestral characteristics or traits found 
in ancestral organisms. 


Shine-Dalgarno sequence In bacterial 
translation, the 5’ UTR mRNA consensus se- 
quence that pairs with nucleotides near the 3’ 
end of 16S rRNA in the small ribosomal subu- 
nit to orient the start codon on the ribosome. 


shotgun sequencing Method for sequencing 
large molecules of DNA that relies on redun- 
dant sequencing of fragmented target DNA in 
the hope that all regions will be sequenced at 
least a few times. Contrast with primer walking. 


shuttle vector A vector that can replicate in 
two species and thus can be used to shuttle 
DNA sequences between them. 


sickle cell disease (SCD) A human autosomal
recessive disorder resulting from homozygo-
sity for a specific mutant allele (β^S) of the
β-globin gene, which encodes part of the
hemoglobin protein.


sigma (σ) subunit Accessory protein that
changes the promoter-recognition specificity 
of the bacterial RNA polymerase core. 


signal hypothesis The accepted hypothesis
proposing that the leader sequence of a
polypeptide directs its post-translational
processing and transport.


signal sequence A string of amino acids
at the N-terminal end of certain eukaryotic
polypeptides containing information directing
post-translational processing and the extra-
cellular destination of the polypeptide. Also
known as leader sequence.


silencer A eukaryotic cis-acting DNA regula- 
tory sequence to which trans-acting factors 
bind to repress transcription. 


silencer sequence Regulatory DNA se- 
quences that can repress transcription of spe- 
cific genes that may be located distantly from 
the sequence. Also known as a silencer. 


silent mutation A base substitution muta- 
tion that changes one codon to a synonymous 
codon and does not alter the amino acid se- 
quence of a polypeptide. 


simple transposon In bacterial transposi- 
tion, a transposon containing multiple genes 
between two inverted repeats. 


single nucleotide polymorphism (SNP) A 
single base-pair difference in a specific ge- 
nome location detected by comparing indi- 
vidual DNA sequences. 


single-stranded binding protein (SSB) In 
DNA replication, a protein that adheres to each 
template strand following unwinding by heli- 
case to prevent strand reannealing before the 
arrival of the replication fork. 


sister chromatids The identical DNA du- 
plexes that are produced by DNA replication 
and are temporarily joined to one another 
during the early stages of cell division. 


sister chromatid cohesion The protein- 
based temporary attachment of sister chro- 
matids facilitated by cohesin protein that 
resists the pulling forces of spindle fibers in 
metaphase. 


site-directed mutagenesis Introduction 
of specific nucleotide changes in a DNA 
molecule in vitro. 


site-specific recombination An exchange 
between two DNA molecules that requires 
specific sequences in common and that is cat- 
alyzed by an enzyme specific to that recombi- 
nation (e.g., integration of phage lambda into 
the E. coli genome). 


sliding clamp In bacterial DNA replication, 
the multisubunit protein complex that joins 
with DNA polymerase to hold polymerase on 
the template and helps drive polymerase along 
the template. 


small interfering RNA (siRNA) Single- 
stranded 21- to 24-nucleotide RNA molecules 
derived from either endogenous or exogenous 
double-stranded RNA molecules that are 
incorporated in RISC to mediate RNAi. 
Endogenously produced siRNAs are most 
often from non-genic regions (e.g., repetitive 
RNA or products of an RNA-dependent RNA 
polymerase). Exogenously produced siRNAs 
are often derived from invading nucleic acids 
(e.g., transposons and viruses). 


small nuclear RNA (snRNA) Regulatory 
RNAs operating in the nucleus. 


small nucleoid-associated proteins In 
bacterial DNA, small proteins localized to 
the nucleoid and associated with the main 
chromosome. 


small ribosomal subunit The smaller of two 
subunits of the ribosome. 


solenoid structure See 30-nm fiber.


somatic gene therapy Gene therapy aimed 
at correcting a genetic defect in the somatic 
cells. 


Southern blotting A laboratory method de- 
vised by Edwin Southern for transferring DNA 
from an electrophoresis gel to a permanent 
membrane or filter. 


specialized transduction (specialized trans-
ducing phage) Transduction from a donor
cell to a recipient cell of a few select genes lo-
cated near the site of bacteriophage integration.


spindle fiber microtubule (kinetochore, 
polar, and astral microtubule) Composed 
of tubulin proteins, the fibers emanating 
from centrosomes that attach to kinetochore 
regions (kinetochore), overlap to control cell 
shape (polar), or attach to the cell membrane 
to stabilize centrosomes (astral). 


spliceosome The multiprotein complex that 
carries out intron splicing. 


splicing mutation A mutation altering the 
normal splicing pattern of a pre-mRNA. 


spontaneous mutation Mutations occurring 
due to spontaneous events or changes involv- 
ing nucleotides or nucleotide bases. 


square root method A method for estimat- 
ing allele frequencies based on manipulation 
of the frequency of a homozygous genotype. 
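The square root method can be sketched in a few lines. This is a minimal illustration that assumes the population is in Hardy-Weinberg equilibrium, so that the frequency of the recessive homozygote equals q squared; the function name and the example frequency are hypothetical and not taken from the text.

```python
import math


def estimate_allele_frequencies(freq_homozygous_recessive):
    """Estimate allele frequencies p and q from the frequency of the
    recessive homozygote, assuming Hardy-Weinberg equilibrium (q**2)."""
    if not 0.0 <= freq_homozygous_recessive <= 1.0:
        raise ValueError("a genotype frequency must lie between 0 and 1")
    q = math.sqrt(freq_homozygous_recessive)  # recessive allele frequency
    p = 1.0 - q                               # dominant allele frequency
    return p, q


# Hypothetical example: 9% of individuals show the recessive phenotype.
p, q = estimate_allele_frequencies(0.09)
print(f"p = {p:.2f}, q = {q:.2f}")
```

The estimate is only as good as the equilibrium assumption; selection, inbreeding, or migration would bias it.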


SRY The gene on the mammalian Y chromo- 
some known as the sex-determining region 
of Y that initiates male sex development in 
mammals. 


stabilizing selection A pattern of natural
or artificial selection that reduces population
variation by removing organisms with extreme 
phenotypes. 


standard deviation (σ) A statistical value
that measures the scatter of outcome values
around the mean or average outcome value.
Expressed as the square root of the variance
(the average of the squared deviations of each
value from the mean value).
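A short worked sketch may help relate variance and standard deviation. It assumes the values are a sample, so the sum of squared deviations is divided by n - 1 (for a whole population the divisor would be n); the data values and function names are illustrative only.

```python
import math


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two values are needed")
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


def sample_standard_deviation(values):
    """Square root of the sample variance."""
    return math.sqrt(sample_variance(values))


# Hypothetical measurements (arbitrary units).
measurements = [2, 4, 4, 4, 5, 5, 7, 9]
print(sample_variance(measurements))
print(sample_standard_deviation(measurements))
```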


start codon Most commonly AUG, encoding
methionine, the first codon translated in 
polypeptide synthesis. 


start of transcription The DNA location at 
which transcription begins. 


stem loop Short double-stranded segments 
of RNA topped by a single-stranded loop con- 
taining unpaired nucleotides. Also known as a 
hairpin structure. 


sticky end Short single-stranded overhangs 
created by the cleavage of DNA by specific re- 
striction endonucleases, which can potentially 
base-pair with complementary single-stranded 
sequences. 


stop codon One of three codons that bind 
a release factor instead of base-pairing with 
tRNA to initiate a series of events that stops 
translation. 
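The roles of the start codon and the stop codons can be illustrated with a small sketch that scans an mRNA sequence. It assumes the standard genetic code (start codon AUG; stop codons UAA, UAG, and UGA) and simply reads triplets from the first AUG; the function name and the example sequence are hypothetical.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def read_open_reading_frame(mrna):
    """Return the codons from the first AUG up to, but not including,
    the first in-frame stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return []  # no start codon found
    codons = []
    for i in range(start, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        if codon in STOP_CODONS:
            break  # a stop codon ends the reading frame
        codons.append(codon)
    return codons


# Hypothetical transcript: AUG GCU UAA, flanked by untranslated bases.
print(read_open_reading_frame("GGAUGGCUUAAGG"))
```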


strand invasion During synthesis-dependent 
strand annealing and meiotic recombination, 
the entry of the 3’ end of a displaced DNA into 
the intact sister chromatid. 


strand polarity (5' and 3’) The orientation 
of a nucleic acid strand indicating its 5’ 
phosphate and 3’ hydroxyl ends. 
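Because the two strands of a duplex are complementary and antiparallel, the sequence of one strand written 5' to 3' determines the other strand written 5' to 3'. The sketch below shows this for DNA; the function name and the example sequence are illustrative assumptions.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(strand_5_to_3):
    """Return the complementary DNA strand, also written 5' to 3'."""
    # Complement each base, then reverse so the result reads 5' to 3'.
    return "".join(COMPLEMENT[base] for base in reversed(strand_5_to_3))


print(reverse_complement("ATGC"))
```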


strand slippage During DNA replication,
a mutational event leading to increased or
decreased numbers of repeating nucleotides
in newly synthesized DNA and caused by
slippage of DNA polymerase on the template
strand or slippage of the newly synthesized
strand on DNA polymerase.


structural gene A protein-producing gene 
whose product plays a biosynthetic, metabolic, 
or structural role in cells. 


structural genomics The sequencing of
whole genomes and the cataloging, or annotation,
of sequences within a given genome.


structural maintenance of chromosomes 
(SMC) protein A category of bacterial pro- 
teins localized to the nucleoid and associated 
with the main chromosome. 


Su(var) mutations Mutations that suppress 
position effect variegation in Drosophila. 
Mutated genes produce proteins that are 
active in chromatin remodeling. 


subcloning Process by which DNA clones 
are further subdivided in order to clone still 
smaller fragments for analyses. 


subfunctionalization The process, following 
gene duplication, whereby mutations in each of 
the two copies can result in the two genes hav- 
ing complementary activities such that their 
combined activity is the same as the activity of 
the gene before duplication. 


submetacentric chromosome A chromosome 
with a centromere located near the midpoint 
that produces long and short arms of different 
lengths. 


sugar-phosphate backbone The alternating 
sugar (deoxyribose or ribose) and phosphate 
molecule pattern of nucleic acid strands formed 
by the formation of phosphodiester bonds link- 
ing nucleotides in the strand. 


sum rule The probability of an event that can result
from two or more equivalent outcomes. The
probabilities of the contributing events are added, 
and their sum is the probability of the event in 
question. Also known as the addition rule. 
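A brief sketch of the sum rule follows. It assumes the contributing outcomes are mutually exclusive and uses a monohybrid cross (Aa x Aa) as a hypothetical example, in which the dominant phenotype arises from either the AA or the Aa genotype.

```python
from fractions import Fraction


def sum_rule(probabilities):
    """Add the probabilities of mutually exclusive outcomes."""
    return sum(probabilities, Fraction(0))


# Offspring of Aa x Aa: P(AA) = 1/4 and P(Aa) = 1/2.
p_dominant_phenotype = sum_rule([Fraction(1, 4), Fraction(1, 2)])
print(p_dominant_phenotype)
```

Exact fractions are used here so the result matches the familiar Mendelian ratio without rounding.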


supercoiled DNA The superhelical twisting 
of covalently closed circular DNA. See positive 
supercoiling and negative supercoiling. 


suppressor mutation A mutation whose 
effect is to reverse the effect of another muta- 
tion. Acts to restore wild-type, or near wild- 
type, function. 


suppressor screen A modifier genetic screen 
designed to identify mutations in genes that 
suppress the phenotypic effects of mutations in 
another gene. 


SWI/SNF (switch/sucrose nonfermentable) A 
yeast chromatin-remodeling complex that 
modulates nucleosome positioning in an ATP- 
dependent manner. 


SWR1 complex (switch remodeling 1) A 
chromatin-remodeling complex responsible 
for replacing the common histone 2A protein 
of nucleosomes with a variant form known as 
H2AZ. 


sympatric speciation An evolutionary process
in which new species form in overlapping
regions. Reproductive isolation mechanisms
accompanying speciation are usually behavio- 
ral or mechanical. 


synapsis The close approach and contact be- 
tween homologous chromosomes during early 
prophase I in meiosis. 


synaptonemal complex A specialized 
three-layer protein complex, consisting of a 
central element and two lateral elements, that 
forms between homologous chromosomes at 
synapsis. 

syncytial blastoderm Stage of Drosophila 
embryogenesis in which the nuclei are located 
at the periphery of the embryo but are not 
separated by cell membranes. 


syncytium A multinucleated cell in which the 
nuclei are not separated by cell membranes. 


synonymous codon The groups of codons 
that specify the same amino acid. 


syntenic genes Genes located on the same 
chromosome. 


synteny The conserved order of genes to- 
gether on a chromosome in species that share 
a common ancestor. 


synthesis-dependent strand annealing 
(SDSA) An error-free mechanism for repair 
of DNA double-strand breaks occurring after 
the completion of DNA replication and utiliz- 
ing strand invasion to provide wild-type se- 
quences for repair. 


synthetic lethality The situation where a par- 
ticular double mutant results in lethality but 
the two respective single mutants are viable.


systems biology Prediction of biological
functions of genes based on correlations be- 
tween different data sets. 


T strand The DNA strand of the T-DNA
cleaved to initiate the transfer of plasmid DNA 
during rolling circle replication. 


targeted induced local lesions in genomes
(TILLING) A reverse genetic approach
in which a population of organisms of an
inbred strain is randomly mutagenized
throughout the genome and this population
is then screened to find mutations in
a gene of interest for which the sequence is
known.


TATA-binding protein (TBP) A general transcription
factor protein that binds the TATA
box and assists in binding other transcription
factors and RNA polymerase II to promoters.


TATA box The thymine- and adenine-rich 
consensus sequence region found in most eu- 
karyotic promoters. Also known as Goldberg- 
Hogness box. 


TBP-associated factor (TAF) Specific gen- 
eral transcription factors that associate with 
TATA-binding protein. 


telocentric chromosome A chromosome
with a centromere located at one end, producing
a long arm only.


telomerase The ribonucleoprotein complex
whose RNA component provides a template 
used to synthesize repeating DNA segments 
that form chromosome telomeres. 


telomere Repeating DNA sequences, syn- 
thesized by telomerase, at the ends of linear 
chromosomes in eukaryotes; contain dozens 
to hundreds of copies of specific short DNA 
sequence repeats that buffer the coding se- 
quence of the chromosome from loss during 
successive cycles of DNA replication. 


telophase The last stage of M phase, in which 
the nuclear contents are divided (karyokinesis) 
and the daughter cells are divided (cytokinesis). 


temperate phage A bacteriophage, such as λ
phage, that can integrate into the bacterial host 
chromosome and produce either the lytic or 
lysogenic life cycle. 


temperature-sensitive allele A mutation evi- 
dent only at or above a certain temperature due 
to an abnormality of the protein product that 
affects its stability. 


template strand The DNA strand serving as 
a template for synthesis of a complementary 
nucleic acid strand. 


terminal deletion The loss of a chromosome 
segment that includes the telomeric region. 


terminal inverted repeat Identical se- 
quences found at both ends of a transposable 
genetic element. The sequences are inverted 
relative to one another. 


termination region The region of a gene 
containing the transcription-terminating se- 
quence or region. 


termination sequence DNA sequences that 
serve to stop transcription. Also known as 
transcription termination. 


termination stem loop Stem loop of an 
mRNA transcript that signals RNA polymer- 
ase to terminate transcription in the leader 
region of bacterial attenuator-controlled 
operons (e.g., trp operon). 

tertiary structure The state of protein fold- 
ing stabilized by hydrogen bonds and covalent 
bonds that form the functional structure of a 
protein. A protein may have more than one 
tertiary structure. 


test cross The cross of an organism with the 
dominant phenotype that may be heterozy- 
gous with an organism that is homozygous 
for a recessive allele. Also known as test-cross 
analysis. 


tetrad An ascus containing four haploid 
spores. 


tetrad analysis The analysis of genetic link- 
age by analysis of different tetrad segregation 
types. 

tetratype (TT) In an ascus, the occurrence 
of both types of parentals and both types of 
recombinants among the spores. 


thalassemias A large category of inherited
anemias caused by reduced production of
hemoglobin. These mutations cause an
imbalance in the production of α-globin and
β-globin protein.


theta value (θ value) See θ (theta) value.


third-base wobble The flexibility of purine- 
pyrimidine base pairing between the third 
base of a codon and the corresponding 
nucleotide of the anticodon. 


three-point test-cross analysis A test cross 
designed to identify genetic linkage between 
three genes and to provide data for determi- 
nation of recombination frequency between 
linked genes. 


three-strand double crossover The occur- 
rence of double crossover involving three of the 
four chromatids. 


threshold of genetic liability In polygenic 
and multifactorial inheritance, a trait with 
different phenotypes (e.g., affected and 
unaffected) that are determined by whether 
individual organisms are above or below a 
particular critical value on the phenotypic 
scale. Also known as threshold trait. 


threshold trait See threshold of genetic liability. 


thymine (T) One of four nitrogenous nucleo- 
tide bases in DNA; one of the two types of 
pyrimidine nucleotides in DNA. 


thymine dimer See pyrimidine dimer. 


tiling array DNA array that contains all se- 
quences of the genome or a genomic interval, 
including introns, exons, untranslated regions 
(UTRs), and intergenic regions. 


time-of-entry mapping A method of donor 
gene mapping by conjugation that uses inter- 
rupted mating to determine the order and 
relative timing of gene transfer. 


Ti plasmid A large (200 kb) circular plasmid
of Agrobacterium tumefaciens that harbors
genes for transfer of DNA into plant cells and
genes that cause uncontrolled division of plant 
cells; hence, the tumor-inducing (Ti) plasmid. 
It has been engineered for the construction of 
transgenic plants. 


topoisomerase Enzyme that relaxes DNA 
supercoiling by controlled strand nicking and 
rejoining. 

totipotency State of a cell when it can give 
rise to any and all cell types of an organism. 


trans-acting Acting between two molecules 
(e.g., DNA sequences that control expression 
of genes interacting with a diffusible protein 
product). 


trans-acting regulatory protein Proteins 
that act in trans by binding to cis-acting regu- 
latory sequences and consequently regulating 
nearby genes, either by activating or repress- 
ing transcription. Often referred to as tran- 
scription factors (TFs). 


transcription The cellular process that syn- 
thesizes RNA strands from a DNA template 
strand. 


transcription factors (TFs) Proteins that bind 
promoters and are functional in transcription. 


transcription-terminating factor I (TTFI) A
specific protein that binds a termination se- 
quence to stop transcription. 


transcription termination See termination 
sequence. 


transcriptome Set of transcripts present in a 
cell, tissue, or organism. 


transcriptomics The study of all the tran- 
scripts, collectively known as the transcrip- 
tome, within a cell, tissue, or organism. 


transductant The bacterium that is the prod- 
uct of transduction. 


transduction In bacterial systems, the pro- 
cess of transfer of DNA from a donor bacterial 
cell to a recipient cell using a bacteriophage as 
a vector. More generally can refer to the pro- 
cess by which foreign DNA is introduced into 
another cell via a viral vector. 


transfer DNA (T-DNA) The portion of the Ti 
plasmid that is transferred from the bacterium 
into the nucleus of a plant cell. 


transfer RNA (tRNA) A family of small RNA 
molecules that each bind a specific amino acid 
and convey it to the ribosome, where the an- 
ticodon sequence undertakes complementary 
base pairing with an mRNA codon during 
translation. 


transformant The bacterium that is the 
product of transformation. 


transformation (1) The bacterial process of 
gene transfer in which donated DNA frag- 
ments originating in a dead donor cell, or 
plasmid DNA, are taken up across the cell 
wall and membrane of a recipient cell and 
recombined into the transformant genome. 
(2) More generally refers to the process by 
which exogenous DNA is directly taken up by 
a cell resulting in a genetic alteration of the 
cell. (3) The conversion of animal cells to an 
abnormal unregulated state by an oncogenic 
virus or by transforming DNA. 


transgene A gene that has been modified in 
vitro by recombinant DNA technology and in- 
troduced into the genome via transformation. 


transgenic organism An organism harboring 
a transgene. 


transition mutation A type of DNA base- 
pair substitution in which one purine replaces 
the other or one pyrimidine replaces the other. 


translation The process taking place at ribo- 
somes to synthesize polypeptides. Comple- 
mentary base pairing between mRNA codons 
and tRNA anticodons determines the order of 
amino acids composing the polypeptide. 


translation repressor protein In bacteria, 
proteins that regulate translation by binding 
mRNA in the vicinity of the Shine-Dalgarno
sequence and thereby prevent ribosome 
binding. 

translesion DNA synthesis Utilizing a bypass 
polymerase, a mechanism for replicating DNA 
in the presence of damage that blocks replica- 
tion by the common polymerase. 


translocation heterozygote An organism 
with chromosome translocation in which chro- 
mosome pairs consist of one normal chromo- 
some and a homolog carrying a translocation. 


transmission genetics The subfield of genet- 
ics concerned with assessment and analysis of 
gene transfer from parents to offspring. 
Synonymous with Mendelian genetics. 


transposable genetic element A class of 
DNA sequences that can move from one chro- 
mosome location to another, either by excision 
and reinsertion or by replication and reinser- 
tion of the replicated copy. 


transposase The enzyme produced by trans- 
posons that cuts DNA to allow the excision 
and insertion of the transposon. 


transposition The process by which mobile 
genetic elements move from one portion of 
a genome to another. See also transposable 
genetic element. 


transposon tagging Technique used to 
identify and clone genes through insertion of a 
transposon into the target gene. 


transversion mutation A type of DNA base 
substitution mutation in which a purine sub- 
stitutes for a pyrimidine, or vice versa. 
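The distinction between transition and transversion mutations can be expressed as a simple classification. The sketch below assumes DNA bases only (A and G as purines, C and T as pyrimidines); the function name is hypothetical.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(original, mutant):
    """Classify a single-base DNA substitution."""
    bases = PURINES | PYRIMIDINES
    if original not in bases or mutant not in bases:
        raise ValueError("bases must be one of A, C, G, T")
    if original == mutant:
        raise ValueError("no substitution: the bases are identical")
    # Same chemical class (purine or pyrimidine) means a transition.
    same_class = (original in PURINES) == (mutant in PURINES)
    return "transition" if same_class else "transversion"


print(classify_substitution("A", "G"))  # purine to purine
print(classify_substitution("A", "C"))  # purine to pyrimidine
```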


tree of life The phylogenetic tree depicting 
the evolutionary relationships between 
organisms. 


trihybrid cross A genetic cross between or- 
ganisms that are heterozygous for three genes. 


trinucleotide repeat disorder A hereditary 
disorder caused by a mutant gene containing 
an increased number of repeats of a DNA 
trinucleotide sequence. 


trisomy The presence in a genome of three 
copies of a chromosome rather than a homol- 
ogous pair of chromosomes and resulting in a 
number of chromosomes that is 2n + 1.


trisomy rescue In a trisomic genome, the 
random loss of one extra chromosome to re- 
duce the chromosome number to the diploid. 


true-breeding strains See pure-breeding 
strains. 


true reversion A type of reversion that ex- 
actly reverses the original mutation. 


tumor suppressor gene A broad category 
of normal genes whose generalized functions 
slow, pause, or stop cell proliferation. It is 
often mutated in carcinogenesis. 


two-hybrid system A method for discover- 
ing whether two proteins interact using the 
GAL4 protein of yeast, which is separated into
a DNA-binding domain and a transcriptional 
activation domain. The two GAL4 domains 
are fused with the two proteins of interest 
respectively, and the resultant fusion proteins 
are assayed for their ability to activate tran- 
scription, which indicates interaction of the 
two proteins of interest. 


two-point test-cross analysis A test cross
designed to identify genetic linkage between
two genes and to provide data for determination
of recombination frequency between
linked genes.
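Recombination frequency from a two-point test cross is the proportion of recombinant progeny among all progeny. The following sketch shows the arithmetic; the progeny counts are invented for illustration and the function name is hypothetical.

```python
def recombination_frequency(parental_counts, recombinant_counts):
    """Fraction of test-cross progeny that carry recombinant chromosomes."""
    recombinants = sum(recombinant_counts)
    total = recombinants + sum(parental_counts)
    if total == 0:
        raise ValueError("no progeny were counted")
    return recombinants / total


# Hypothetical progeny: two parental classes and two recombinant classes.
r = recombination_frequency(parental_counts=[410, 390],
                            recombinant_counts=[105, 95])
print(f"r = {r:.2f}")
```

A value well below 0.5 is consistent with linkage, whereas independently assorting genes give a value near 0.5.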


two-strand double crossover The occurrence 
of a double crossover involving two of the four 
chromatids. 


ultraviolet (UV) repair A multiprotein DNA 
damage repair system that corrects lesions 
caused by exposure to ultraviolet irradiation. 


uncharged tRNA A tRNA not carrying an 
amino acid. 


unequal crossover Resulting from the im- 
proper synaptic pairing of homologous chromo- 
somes and crossing over between the mispaired 
chromosomes. A source of duplication and dele- 
tion of genetic material. 


uniparental disomy In a genome, the pres- 
ence of a pair of homologous chromosomes 
that originate from a single parent. 


uniparental inheritance Condition in orga- 
nellar inheritance whereby just one parental 
gamete—often the maternal gamete—contrib- 
utes all of the cytoplasmic organelles. 


unordered tetrad Haploid spores in an ascus 
that are arranged in random order. 


unpaired loop At synapsis involving partial 
deletion or partial duplication of one chromo- 
some of a homologous pair, the “extra” genetic 
material that does not have a homolog on the 
paired chromosome. 


unselected marker screen An experimental 
technique used to screen microbial genotypes. 
Commonly used following selected marker 
screening. 


unstable mutant phenotype A mutation 
with an unusually high frequency of reversion. 


upstream Referring to a gene or sequence 
location that is toward the 5’ direction of a 
coding strand. 


upstream activator sequence (UAS) An 
enhancer-like sequence in yeast, located just 
upstream of the genes it regulates.


upstream control element An upstream 
consensus sequence found in certain eukary- 
otic gene promoters. 


uracil (U) One of four nitrogenous nucleotide 
bases in RNA; one of the two types of pyrimi- 
dine nucleotides in RNA. 


variable expressivity Variation in the degree, 
magnitude, or intensity of expression of a 
phenotype. 

variance (s²) A statistical measurement of
the variation of sample values around the
mean value.


vector A DNA fragment with attributes that 
will allow its amplification (origin of replica- 
tion) in a biological system and serves as a 
carrier for foreign DNA inserted into it. Vec- 
tors usually also possess genes (e.g., encoding 
resistance to an antibiotic) that allow selection 
of hosts carrying the vector.


virus An infective particle that carries a rudimentary
genome and is an obligate parasite on
host cells.


western blotting A method for transferring 
protein from an electrophoresis gel to a per- 
manent membrane or filter. 


whole-genome shotgun (WGS) sequencing
An approach to genome sequencing whereby
DNA representing the entire genome is
fragmented into smaller pieces and a large
number of fragments are chosen at random
and sequenced with the aim that all
genomic regions will be sequenced multiple
times. Compare with clone-by-clone
sequencing.

whole-genome tiling array A microarray 
on which sequences representing the entire 
genome are present. 


writer A chromatin modifying enzyme that 
adds chemical groups to chromatin (e.g., me- 
thyl or acetyl groups added to the lysines of 
histone 3). 


X/autosome ratio (X/A ratio) The ratio of X 
chromosomes to a pair of autosomes. Used 
in Drosophila as the mechanism of sex 
determination. 


X-linked dominant A pattern of inheritance 
consistent with the transmission of a domi- 
nant allele of a gene on the X chromosome. 
Compare with X-linked recessive. 


X-linked inheritance The pattern of inherit- 
ance characteristic of genes located on the 
X chromosome.


X-linked recessive A pattern of inheritance
consistent with the transmission of a recessive 
allele of a gene on the X chromosome. Com- 
pare with X-linked dominant. 


yeast artificial chromosome (YAC) Cloning 
vector used in yeast that utilizes an endog- 
enous yeast origin of replication, centromere, 
and telomere; can accept DNA inserts in ex- 
cess of 1 megabase. 


Y-linked inheritance The exclusively 
male-to-male transmission of genes on the 
Y chromosome. 


Zmax The most likely recombination distance
(theta [θ] value) between genes as determined
by lod score analysis. 
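As a rough sketch of lod score analysis, the code below evaluates the lod score over a grid of θ values and reports the θ with the highest score. It assumes the simplest case of phase-known, fully informative meioses, in which the likelihood of linkage is θ^R (1 - θ)^(N - R) for R recombinants among N offspring and the likelihood of no linkage is 0.5^N. The function names, the grid spacing, and the counts are illustrative assumptions.

```python
import math


def lod_score(theta, recombinants, total):
    """log10 of the odds of linkage at theta versus no linkage (theta = 0.5)."""
    nonrecombinants = total - recombinants
    likelihood_linked = (theta ** recombinants) * ((1 - theta) ** nonrecombinants)
    likelihood_unlinked = 0.5 ** total
    return math.log10(likelihood_linked / likelihood_unlinked)


def best_theta(recombinants, total):
    """Scan theta = 0.01 ... 0.50 and return (theta, lod) at the maximum lod."""
    thetas = [i / 100 for i in range(1, 51)]
    theta = max(thetas, key=lambda t: lod_score(t, recombinants, total))
    return theta, lod_score(theta, recombinants, total)


# Hypothetical family data: 2 recombinants among 10 informative offspring.
theta, z = best_theta(recombinants=2, total=10)
print(f"theta = {theta:.2f}, maximum lod = {z:.2f}")
```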


zone of polarizing activity (ZPA) The poste- 
rior side of the limb bud that acts as an organ- 
izer, secreting Sonic hedgehog (Shh) protein 
that acts to pattern the developing limb. 


Z/W system The sex chromosome inherit- 
ance system in species in which the male is 
homogametic (ZZ) and the female is heteroga- 
metic (ZW). 


zygotic gene Genes that are active only in 
the zygote or embryo. For zygotic genes, the 
genotype of the embryo determines the 
phenotype.


temperature-sensitive, 113
that are both dominant and recessive, 116-117 
variable expressivity of, 119, 120, 120f
allele frequencies, 747, 747
Allele frequencies, autosomal 
in populations, determining, 746-747 
Allele frequency change by sampling error, genetic 
drift causing, 756-758, 756f
755-756
Allelic identification, 189-190b
Allelic phase, 166, 166-167, 167f 
Allelic series, 111, 111-115, 113f, 114f 
C-gene, molecular basis of, 113 
Allis, C. David, on “histone code,” 520
Allolactose, 472-473 
Allopatric speciation, 761, 761, 763, 763f 
Allopolyploids, 437 
Allopolyploidy, 437-438, 438f
Allosteric domain, 470 
Allosteric effector compound, 470, 470f 
Allostery, 470 
Alpha helix. See α-helix (alpha helix)
Alternative intron splicing, 292, 292-294
Alternative mRNA processing, 293f, 295b 
Alternative polyadenylation, 293 
Alternative pre-mRNA processing, 292 
Alternative promoters, 293 
Alternative sigma (σ) factor, 490, 490, 490f
Alternative sigma subunits, 272-273 
Altman, Sidney, intron self-splicing and, 296 
Amelogenesis imperfecta, 91t 
Ames test, 408, 408, 409f, 410b
transfer RNA in the transport of, 12
Amino acid structure, 306-307
Aminoacyl site (A site), 309, 309f
Aminoacyl-tRNA synthetases (tRNA synthetases), 
322, 323f 
Amorphic mutation, 106, 107f 
Anabolic pathways, 189b 
Anagenesis, 761 
Anaphase, mitotic, 66, 67-68, 68f 
Anaphase I, 76f, 77, 78f 
Anaphase II, 77f
genomic imprinting defects in, 523
computational approaches to, 618-619, 623b
experimental approaches to, 617
genome, 631-632
Antennapedia complex, 691, 691f, 692-694
Antibiotic resistance, 220 
evolution of (Case Study), 220-221b 
Antibiotic resistance genes, 188
Antisense RNA, 491, 494f
Apolipoprotein B gene, RNA editing in, 299, 299f
from, 89b
705f, 706f
Archaea, 3, 4, 5f, 242
reproduction of, 73
384-385
chromatin organizes, 384-385
Archaeal histones, 385 
Archaeal initiation factor proteins (aIFs), 314
Archaeal transcription, 285
Archaeal translation initiation, 313-314
implications for evolution, 313-315
Mendel’s experiments using, 29-30, 29f
729-730, 729t, 730f
Astrachan, Lazarus, in mRNA discovery, 269
Astral microtubules, 67, 70f 
Ataxia, 399t, 414t 
Ataxia, Friedreich, 399t 
Ataxia telangiectasia, 414t
ATM, 406 
Attachment (att) site, 212, 212f, 213f 
Attenuation, 484 
Attenuation mutations, 488, 488f 
Attenuator region, 484, 484f 
Autoimmune disease, 735 
Autopolyploids, 437 
Autopolyploidy, 437-438, 437f
Autosomal genetic linkage, test-cross analysis in
Autosomal inheritance, 51
Mendel’s hereditary principles and, 51-55
recessive, 53, 53f 
of sickle cell disease, 56b
Avery, Oswald
β-globin gene variation, Southern blot analysis of,
northern and western blot analysis of, 352-353, 353f
βE allele evolution, 358, 358f
B-form DNA, 235
Bacteria, 3, 4, 5f 
gene structure in, 12f 
gene transfer in
by transduction, 206-213
by transformation, 204, 206, 207f 
genetic analysis and mapping in, 188-226 
polyribosomes of, 319, 320f 
regulating transcription of stress response, 
489-491, 490f 
reproduction of, 73 
restriction enzymes in, 568-572
translational regulation in, 491-492, 491t
tRNA processing in, 298
Bacterial RNA polymerase, 272-273, 272f
Bacterial artificial chromosomes (BACs), 577 
Bacterial chromosome compaction, 368-369, 
369f, 370f 
Bacterial chromosome organization, 368-369, 368t,
369f, 370f 
Bacterial chromosomes, 188, 188f. See also Bacterial 
artificial chromosomes (BACs) 
bacteriophage chromosomes mapped by 
fine-structure analysis, 213-219 
Bacterial DNA replication 
initiation of, 244-246, 245f
origin and directionality of, 237-239 
Bacterial DNA replication polymerases, 247t, 249 
Bacterial genome(s) 
characteristics of, 188 
transposition modifying, 456-459, 456f, 457f,
457t, 458b
whole-genome shotgun sequencing of, 614-615, 615f
Bacterial transcription, 274f
elongation in, 274f, 276 
initiation of, 273, 274f, 276, 276t
positive control of, 470, 471f
process of, 271-278 
Bacterial transcription termination, 274f, 276 
Bacterial transcription termination mechanisms,
Bacterial translation initiation, 312-313
bacteria and bacteriophage
bacterial transduction mediated by, 206-213
Bacteriophage λ (lambda). See Lambda (λ) phage
323-324, 324t
Basic local alignment search tool (BLAST), 624, 626b
Bateson, William 
and complementary gene interaction, 132 
and documentation of human hereditary 
disorder, 2 
and genetic linkage discovery, 148
124-127, 125-126, 306
Beetle, yellow mealworm (Tenebrio molitor), 84
Beta-pleated sheet. See β-pleated sheet (beta-pleated
sheet)
in probability theory, 46-48
Biosynthetic pathways, 124
Bonds, hydrogen, 7
Boveri, Theodor
Caenorhabditis elegans, 548, 553, 594, 613, 697-699
transgenic, 594, 594f 
Cairns, John, DNA replication in E. coli and, 237 
Calcitonin/calcitonin gene-related peptide (CT/CGRP)
Cdk (cyclin-dependent kinases), 70-71
in cell cycle checkpoints, 70-71, 72f 
regulating cell cycle, 71, 73f 
Cech, Thomas, intron self-splicing and, 296
Chargaff, Erwin, DNA structure research by, 6
Charged tRNAs, 311
Chase, Martha, on DNA in bacteriophage infection 
of bacterial cells, 231, 231f 
Chemical mutagens, 404-406, 404t, 405f, 406f
Ames test for, 408, 408, 409f
composition of, 371, 371t, 372-373, 372f
Chromatin remodeling
overview of, 513-514
in eukaryotic transcription regulation,
in eukaryotic transcription regulation, 513-514
Chromatin remodeling, 514-515 
in eukaryotic transcription regulation, 512-524 
chemical modifications of chromatin, 517, 517f,
Chromatin structure
detection of (Case Study), 386-387b, 387f
in mitosis, 67
mapping, 157-158
metacentric, 380
alleles on, for three-point recombination
mapping, 157 
recombinant, 145
multiple sets of, 90
of genes
technology in, 542-544, 543f
ogy in, 543-544, 544f
of plants and animals, 602-603, 603f
Coding strand, 11, 11f
start, 12, 13f
stop, 12, 13f
Cohesive end sequence (cos) sites, 576-577
Cohesive (cos) ends, 493, 494f
“draft,” 616
Color blindness as X-linked recessive disorder,
nonpolyposis, 414f
constructing, 580-581, 580f
genetic, 132, 134
(Case Study), 137b
Composite transposons, 457, 457f, 457t
causing, 89b
transmission of, 94, 94f
203-204, 204f, 205-206b
gene transfer by, 187-197 
Hfr, 196-197, 196f 
outcomes of, 194t
for RNA polymerase II transcription, 278-282
Conservative DNA replication model, 236, 237f
Constitutive mutants, 477
Constitutive transcription, 469
Crick, Francis, 7f
central dogma of biology and, 9, 10f
proof of triplet genetic code, 323-324, 324t
Crossing-over hypothesis, 150, 150f, 160
Cyclin D1-Cdk4 complex, retinoblastoma protein
Cyclin-dependent kinases (Cdk), 70-71
in cell cycle checkpoints, 70-71, 72f 
regulating cell cycle, 71, 73f 


Cyclin proteins, 70 
in cell cycle checkpoints, 70-71, 72f 
regulating cell cycle, 71, 73f 
Cyclopia, Shh mutations in (Case Study), 707-709b,
707-709f 
CYP21 gene mutation and congenital adrenal 
hyperplasia, 89b 
Cystic fibrosis (CF) 
alleles, genotypes, and, 747 
gene therapy for, 601, 602 
linkage data from forms with, 178t
mapping the gene for (Case Study), 177-178b 
Cystic fibrosis transmembrane conductance regulator
(CFTR) allele, 597f
Cystic fibrosis transmembrane conductance regulator
(CFTR) gene, 177-178b
Cytokinesis, 66, 68f 
in plant and animal cells, 68, 71f 
Cytomegalovirus (CMV), 366, 367t 
Cytoplasmic inheritance, 4, 649-680, 650 
discovery of, 650-651, 651f 
modes of, 654-662
transmitting genes carried on organelle 
chromosomes, 650-653 
Cytoplasmic male sterility (CMS) in flowering 
plants, 669b 
Cytosine (C), 7 


D 
DNA-binding domain, 470
Damage signaling systems, 414f 
Darwin, Charles, theory of evolution of, 16-17 
Darwin's finches, contemporary evolution in, 764 
Daughter cells, 65 
Daughter strand in semiconservative replication, 
9, 9f 
Davis, Bernard, on need for physical contact for 
bacterial gene transfer, 191, 192f 
Dawkins, Richard, on molecular basis of evolution, 15 
Dawson, Martin, on DNA as transformation 
factor, 230 
de Vries, Hugo 
on hereditary transmission, 2, 3f 
research of, paralleling Mendel’s, 42 
Deafness. See Ototoxic deafness, mitochondrial 
gene-environment interaction in (Case Study) 
Deaminating agents, mutations induced by, 404, 405f 
Deamination, 402, 403, 403f 
Degrees of freedom (df), 49-50 
Delayed age of onset of lethal alleles, 118, 118, 118f 
Delbrück, Max, 392
Deletion(s), 440-441 
detecting, 442, 442f 
partial, 441 
Deletion heterozygote, partial, 441 
Deletion mapping, 216, 442, 444, 444f, 445b 
Deletion-mapping analysis, 216-217, 217f, 218f, 219
Densitometry of hemoglobin proteins, 344, 344f 
Deoxynucleotide monophosphates (dNMPs),
232, 232f 
Deoxynucleotide triphosphates (dNTPs), 232 
Deoxyribonucleic acid (DNA). See DNA 
Depurination, 402, 402f 
Derived allele(s) 
vs. ancestral allele(s), 768, 768f 
definition of, 768 
Development. See also specific topics 
as building of multicellular organism, 682-684, 
683f, 684f 
evolution of, 700-701
Developmental genetics, 681-712 
Developmental pathways, 124 
DHJs (double Holliday junctions), 419 
Diabetes, 585 
type 1, 735 
type 2, Neandertal genes and, 769 
Diakinesis stage of prophase I, 75, 76f, 77, 78f 
Dicentric bridge, 447 
Dicentric chromosome, 447 
Dicer, 524, 525f 


Dideoxy DNA sequencing of Huntington disease 
(HD) genes, 262f 
Dideoxynucleotide DNA sequencing (Sanger 
method), 256-259, 258f, 259f, 260b 
Dideoxynucleotide DNA sequencing-dideoxy 
sequencing, 257 
Dideoxynucleotide triphosphate (ddNTP), 257 
Differential reproduction, 748 
Differentiation, cell, 683, 683 
mechanisms of, 683-684, 684f
Digestive microbiome, 618b 
Dihybrid cross, 36 
Dihybrid-cross analysis of two genes, 36, 36f, 38-39 
Dinucleotides, CpG, 523 
Diploid number of chromosomes, 65 
Diploids, 3 
partial, 203 
conjugation with F’ strains producing, 203-204,
204f, 205-206b 
single-celled, segregation in, 81, 82f 
Diplotene stage of prophase I, 75, 75f, 78f 
Direct repeats, 453, 455 
Directed assembly, 368 
Directional cloning, 573-574, 573f 
Directional natural selection, 750, 750-751, 750f, 751t 
Directional selection, 730 
Discontinuous variation, 714 
Discordance, 728 
Disjunction 
chromosome, 67 
homologous chromosome, in meiosis I, 77 
Dispersed repetitive DNA, 621 
Dispersive DNA replication model, 236, 237f 
Displacement (D) loop, 417 
Disruptive selection, 730 
Dissociation (Ds) element, 453 
DNA (deoxyribonucleic acid), 5, 227-266. See also
Mitochondrial DNA; Supercoiled DNA 
in chromosomes, 228 
composition, 8f 
core, 371 
dispersed repetitive, 621 
as hereditary material of organisms, 6 
as hereditary molecule, 6, 228-232, 231f 
heteroduplex, 419 
integration into genome of S. cerevisiae, 
588-589, 589f 
linker, 371 
restriction fragments of, 345-346, 348 
transfer (T-DNA), 589-590 
as transformation factor, 230, 230f 
Watson and Crick’s model of, 6-7, 7f, 236 
DNA-binding domains (DBDs), 470, 765 
DNA-binding proteins 
regulatory, 470-472, 471f
structural motifs of, 505f 
DNA clone, 572 
DNA damage 
from alkylating agents, 404, 405f 
radiation-induced, 406-408, 407f
DNA damage repair, 408-415
direct, 409-413, 413f
nucleotide excision and replacement in, 411-412,
412f, 413f 
DNA damage repair disorders, 414, 414t 
DNA damage repair pathway, p53, 413, 414f 
DNA damage signaling systems, 413-414 
DNA double helix, 5, 6, 235f. See also DNA structure: 
double-helical 
DNA duplex, 5. See also DNA double helix 
DNA gyrase, 369 
DNA intercalating agents, mutations induced by, 
406, 406f 
DNA isolation, countertop, 11b 
DNA library(ies), 577, 577, 579-581, 579f-582f
complementary, 577 
constructing, 580-581, 580f 
genomic, 577 
constructing, 579-580, 579f 
screening, 581, 582f 


DNA ligase, 248 
DNA loop, 483, 483f 
DNA microarrays, 637, 637f 
DNA molecules 
advances in altering and synthesizing, 598 
long, sequencing, 581-583, 582f 
DNA nucleotide base changes, spontaneous, 
400-402, 401f 
DNA nucleotide lesions, 402-403, 402f, 403f
DNA nucleotide pairing, complementary, 233f, 
234, 234b 
DNA nucleotides, 7, 232, 232f, 233f, 234 
components of, 7 
excision and replacement of, 411-412, 412f, 413f
methylation of, in gene silencing, 523-524 
DNA polymerase I (pol I), 248
DNA polymerase III (pol III) holoenzyme, 246 
DNA polymerase in DNA strand elongation, 
232, 233f 
DNA proofreading, 249, 249, 251, 251f 
DNA-protein interaction in transcriptional control of 
gene expression, 469-472, 470f, 471f 
DNA replication, 5, 8-9, 9f, 236, 236-240.
See also Bacterial DNA replication
bidirectional, evidence of, 237-239, 239f, 240f
continuous strand, 246-247, 247f
discontinuous strand, 246-247, 247f
DNA loss in cycle of, 251-252
DNA structure and mechanism of, 6-9
Meselson-Stahl experiment on, 236-237,
237f, 238f
models of, 236, 237f
Kornberg/trombone, 249, 250f
molecular genetic analytical methods using
processes of, 254-261
multiple origins in eukaryotes, 239-240, 241f
nucleosome distribution and synthesis during,
374-376, 377f
Okazaki fragment ligation for, 247-248, 248f
precisely duplicating genetic material, 241-254
principal proteins of, 248f
RNA primer removal for, 247-248, 248f
in S phase, 66
DNA replication errors, mutations from,
397-399, 399f
DNA sequence variations, identification of, 34,
345-346, 346f, 348, 348t
DNA sequences 
constructing contiguous, 546, 548f 
gene mutations modifying, 393-397 
recombinant DNA technology recognizing 
(see Recombinant DNA technology) 
at replication origins, 242 
in vivo manipulation of, 598-599 
DNA sequencing technologies, 568-583, 582f 
new, 259, 261 
DNA structure, 5-9, 8f 
double-helical (see also DNA double helix) 
complementary and antiparallel strands in, 
232-236, 234b 
research on, 4, 6 
twisting, 234-236, 235f 
DNA synthesis. See also DNA molecules 
translesion, 407 
protein control of, 407-410
DNA transfer from organelles, continual, 670-672,
672f, 673f 
DNA transposons, 455 
DNase I hypersensitive sites, 515, 518b 
dNMPs (deoxynucleotide monophosphates),
232, 232f
dNTPs (deoxynucleotide triphosphates), 232
Dobzhansky, Theodosius
on evolution, 742
and modern synthesis of evolution, 17
Domains of life, 4-5, 5f
Dominance
incomplete, 108-109, 109f
molecular basis of, 105-106
partial, 108-109, 109f
Dominance relationships 
of ABO alleles, 110, 110f 
molecular basis of, 111-112, 111f 
allele interactions producing, 105-118 
Dominance variance, 726 
Dominant epistasis (12:3:1), 131f, 133, 133 
Dominant gene interaction (9:6:1 ratio), 130f, 132, 
132-133 
Dominant interaction. See under Epistatic (gene) 
interactions 
Dominant mutant, determination of, 539 
Dominant negative mutations, 107f, 108 
Dominant phenotype, 31, 35 
Dominant suppression (13:3 ratio), 131f, 133, 
133-134 
Donor cell (F⁺), 191-192
conjugation between recipient cell and, 
193-194, 193f 
DNA fragment from, recipient cell uptake of, 204, 
206, 207f 
Doppler, Christian, in Mendel’s education, 
26, 27 
Dosage, gene, 433 
Dosage compensation, 95, 95-96 
mechanisms of, in animals, 95t
random X inactivation in, 95-96, 96f
Double crossovers, 157, 157-158 
frequency of, as consistent with independence of 
single crossovers, 158-159
Double Holliday junctions (DHJs), 419
Double recombinants, 157, 157-158
Double-strand break repair, 416, 416-417, 417f
Double-stranded DNA breaks, 417-418
initiating meiotic recombination, 418-419
Double-stranded RNA (dsRNA), 524
cleaving, 524-525
gene silencing by, 524-526, 525f
Doudna, Jennifer, on cleaving dsRNA, 524
Down syndrome, 434t, 450
familial, 449, 451f 
maternal age and risk of, 435t 
meiotic nondisjunction in, 434, 435t 
Down syndrome critical region (DSCR), 434 
Downstream, 272 
Drosophila, transgenic, 594-595, 595f 
Drosophila melanogaster. See Fruit fly (Drosophila 
melanogaster) 
Duchenne muscular dystrophy, 91t, 393, 577
Duplicate gene action (15:1 ratio), 130f, 
132, 132 
Duplication(s), 441 
detecting, 442, 442f 
gene, 627-629, 628f, 629f 
segmental, 634 
unequal, 441 
whole-genome, 633-634 
Duplication heterozygote, partial, 441 
Dyskeratosis congenita, 254 


E
Early genes, 493
Early operators, 493 
Early promoters, 493 
East, Edward, on multiple-gene hypothesis, 716, 
718-719, 719f 
East-west (EW) resolution, 419 
Edward syndrome, 434t 
Electrophoresis, gel. See Gel electrophoresis 
Electrophoretic mobility, 344 
Ellis-van Creveld (EvC) syndrome, 757 
Elongation factor (EF) proteins, 315 
Embryonic stem (ES) cells, 597-598, 683 
creating, from fibroblasts, 604b 
Emerson, Rollins, three-point test-cross analysis, 
156, 156t 
Endonucleases, restriction, 345, 348t 
Endosymbiont, 670 
Endosymbiosis, 668 
secondary, 674-675, 674-675
tertiary, 674-675, 674-675
Endosymbiosis theory, 670
of mitochondria and chloroplast evolution, 
668-675 
Enhancement, synthetic, 541, 541f 
Enhanceosome, 507 
Enhancer screen, 540 
Enhancer sequences (enhancers), 282-283, 283f, 
506-507 
conservation of, 510, 510f 
in eukaryotic transcription, 506-508, 508, 510, 510f 
hereditary disorders from mutations of, 509 
insulator sequence interactions with, 511, 512f 
Enhancer trapping, 559, 559-560, 560f 
Enveloped viruses, 367, 367f 
Environment 
mitochondrial gene interactions with, and human 
genetic disease, 675-676b 
and phenotypic variation, 719, 720f, 723b 
recombination frequency and, 164 
Environment-gene interactions, 119-120 
Environmental modification to prevent hereditary 
disease, 120-121 
Environmental variance, 725 
Enzyme(s), restriction, 345, 345-346, 348t 
in recombinant DNA technology, 568-572, 569b, 
571b, 572f 
Epigenetic heritability, 520-521 
Epigenetic modifications, environmental 
(Case Study), 528-529b 
Episome, 193 
Epistasis, 127, 129 
dominant (12:3:1), 131f, 133, 133 
recessive (9:3:4 ratio), 131f, 133, 133 
results of, 127, 129, 129f, 132-134 
Epistatic (gene) interactions, 127, 129, 129, 129f,
132-133, 135b
complementary (9:7 ratio), 130f, 132, 132
dominant (9:6:1 ratio), 130f, 132, 132-133
duplicate (15:1 ratio), 130, 130f, 132
no interaction (9:3:3:1 ratio), 129, 129f, 132
Equilibrium frequency, 358 
Escherichia coli (E. coli) 
chromosome compaction, 368-369
consolidated Hfr map of, 200-201, 201f, 203 
genome content, 368 
HfrH and F⁻ P678 strains of, genotypes of,
197-200, 197t
human insulin production in, 584-586, 586f
lac operon of, 472 (see also Lac operon) 
nucleoid of, 368, 368f 
pathogenicity islands in, 220 
plant-derived antimalarial drugs produced in, 587b 
ribosome structure, 309-311, 309f 
transgenes in, 583-588, 584f
trp operon gene map for, 210-211, 212f 
Estrogen biosynthesis pathway, 765-766, 766f 
Ethidium bromide (EtBr), 348-349, 349f 
Euchromatic regions, 381 
Euchromatin, 381 
Eukaryote cells, mitochondria as energy factories of, 
662-666
Eukaryote chromosomes, organization of, 370-376 
Eukaryotes/eukarya, 4, 5f 
DNA replication polymerases of, 247t, 249 
gene structure in, 12f 
genes in development of, 707b 
genetic linkage and mapping in, 144-185
(see also Genetic linkage; Genetic linkage 
mapping/maps) 
multiple RNA polymerases in transcription in, 
278-285 
regulation of gene expression in, 504-532 
chromatin remodeling in, 512-524 
cis-acting regulatory sequences in, 506-511 
RNA-mediated mechanisms in, 524-528,
525f, 526f 
replication origins in 
DNA sequences at, 242 
multiple, 239-240, 241f
tRNA processing in, 298
Eukaryotic expression vectors, 584
Eukaryotic genome(s)
transposition modifying, 457-461, 459f, 460f
whole-genome shotgun sequencing of, 
615-616, 616f 
Eukaryotic initiation factor (eIF), 313 
Eukaryotic lineage, origin of, 673-674, 674f
Eukaryotic primase, 245-246 
Eukaryotic transcription termination, torpedo model 
of, 287, 288f 
Eukaryotic translation initiation, 313, 314f, 317-318b
Euploid, 431 
Euploidy, 431 
changes in, resulting in polyploidy, 437-439
Evans, Martin, knockout mouse development and, 596 
E(var) mutations, 512-513, 512f 
Even-skipped (eve) gene, 689-690, 690f 
Evo-devo, 701 
Evolution, 16. See also under Migration 
adaptive, 16 
chloroplast, endosymbiosis theory of, 
668-675, 671f 
convergent, 18, 752, 752-753 
in Darwin’s finches, contemporary, 764 
Darwin's theory of, 16-17 
of development, 700-701 
of eukaryotes, 673-674, 674f 
gene regulation in, 701-703 
human chromosome (Case Study), 461-462b
mitochondrial, endosymbiosis theory of, 
668-675, 671f 
modern synthesis of, 17 
molecular basis of, 15-22 (see also Molecular
evolution) 
multicellular, in plants, 703-707
by natural selection, 16 
nonadaptive, 16-17 
polyploidy and, 439, 440f 
population genetics and, 742-775 
processes of, 17, 760-764, 762f, 762t, 763f 
through co-option, 701-703, 702f 
Evolution models, hominin, 21 
Evolutionary genetics, 5 
Evolutionary genomics, 612, 622, 624-636. See also
Interspecific comparisons of genomes 
human genetic diversity and, 766-769, 767f, 768f 
intraspecific genome comparisons in, 634-635
tree of life in, 624, 625f, 626b
Evolutionary relationships, tracing, 17-21
Exconjugant cell, 192 
Hfr, 196-197, 196f 
partial diploid, production of, 203-204, 204f 
Exit site (E site), 309, 309f 
Exonic splicing enhancers (ESEs), 294, 294f 
Exons, 11, 12f 
Expression arrays, 637, 637-638, 639f
Expression vectors, 583, 583-584, 584f
eukaryotic, 584 


F 
F⁻ cell (recipient cell), 191-192
and F⁺ cell, conjugation between, 193-194, 193f
F⁺ cell (donor cell), 191-192
and F⁻ cell, conjugation between, 193-194, 193f
F’ (F prime) cells, 203 
F’ (F prime) donor, 203 
F (fertility) factor 
in Hfr strains, 194-195, 195f
transfer of, 192-194, 192f
F’ (F prime) factor, 203 
F (fertility) plasmid, 188 
structure of, 192, 192f 
F₁ (first filial) generation, 30, 30f
F₂ (second filial) generation, 30, 30f
F₂ self-fertilization in segregation hypothesis testing,
35, 35f, 36t
F₃ (third filial) generation, 30, 30f
Factor VIII (F8) gene mutation, hemophilia A from, 
92, 92f 
Facultative heterochromatin, 381 


Familial Down syndrome, 449, 451f 
Family, the modern human, 21-22 
Family trees, 52. See also Pedigree(s) 
Fertility, reduced, in aneuploidy, 435, 435f 
Fertilizations, multiple, creating autopolyploidy, 
437, 437f, 438 
Fibroblasts, creating embryonic stem (ES) cells 
from, 604b 
Filion, Guillaume, on chromatin types, 517, 517f 
Finches, Darwin's, contemporary evolution in, 764 
Fine-structure analysis, bacteriophage chromosome 
mapping by, 213-219 
Fire, Andrew, on RNA interference, 524, 553 
First-division segregation, 175, 175f 
FISH (fluorescent in situ hybridization), 378, 
378-379, 378f 
in chromosome abnormality detection in cancer 
cells (Case Study), 386-387b, 387f
Fisher, Ronald 
and evolutionary genetics research, 17 
and statistical analysis of quantitative traits, 721 
Flanking direct repeats, 453, 455 
Fleming, Alexander, penicillin discovered by, 220b 
Florey, Howard, penicillin production and, 220b 
Fluorescent in situ hybridization (FISH), 378, 
378-379, 378f 
in chromosome abnormality detection in cancer 
cells (Case Study), 386-387b, 387f
Forked-line diagram, 38, 38f 
Forward genetic analysis, 534 
Forward genetic screens, designing, 535-539 
Forward genetics, 533-541, 534, 534f 
Forward mutation rate (u), 753 
Forward mutations, 397 
Fossils, earliest, 16f 
Founder effect, 756, 756-757
Four-strand double crossover, 161, 161, 162f 
Fraenkel-Conrat, Heinz, on nonoverlapping genetic 
code, 323 
Fragile X syndrome, 91t, 399t
Frameshift mutation, 324, 395, 396f 
Franklin, Rosalind, research on double-helical 
structure of DNA by, 4, 6, 6f 
Frequency distribution, 721, 724f 
Friedreich ataxia, 399t 
Fruit fly (Drosophila melanogaster), 630b, 754b 
bithorax mutation in, discovery of, 682, 682f 
development of, as paradigm for animal 
development, 684-697, 685f
development toolkit of, 686-687, 686f 
eye color in 
complementation analysis of, 133, 136, 136f 
genes for, 123, 124f 
mitotic crossover in, 176, 177f 
multiple replication origins in, 240, 241f 
P element in, 459, 459 
pleiotropy in, 121 
Dscam gene of, alternative splicing in, 293
sex determination in, 87, 89
alternative mRNA splicing and (Case Study),
299-300b, 300f
studies of genes on chromosomes in, 81, 84-86,
84f, 85f
Functional cloning, 166 
Functional genomics, 612, 636-644 
genetic networks in, 642-644 
genomic approaches to reverse genetics in, 641, 641f 
transcriptomics in, 636-638, 637f, 639f, 640f 
yeast mutants to categorize genes in, 641-642, 642f
Functional RNAs, 270, 271 
Fungal hosts, expression of heterologous genes in, 
583-589 
Fungus, generation of transgenic, 588-589, 588f, 589f
Fusion genes, 585 
Fusion protein, 585 


G 

G (Giemsa) banding, 380
G₀ (“G zero”) of interphase, 66, 66f
G₁ (Gap 1) phase of interphase, 66, 66f, 71f
G₂ (Gap 2) phase of interphase, 66, 66f, 68f
Gaertner, Carl Friedrich, genetic research of, 45b 
Gain-of-function alleles, 560, 561f 
Gain-of-function mutation, 106, 107f, 108 
Gamete(s), 4, 33, 65 
Gamete frequencies, determining from genetic maps, 
159-160, 159f 
Gap gene expression, domains of, 688-689, 689f, 690f
Gap genes, 686, 686f 
Garden pea. See Pea, garden (Pisum sativum) 
Garrod, Archibald 
and documentation of human hereditary disorder, 
2, 124
on gene-protein connection, 305
genetic screens and, 535 
Gaussian (normal) distribution, 48-49, 49f 
GC-rich box, 280-281 
Gel electrophoresis, 342, 342-344, 343f, 344f 
in hemoglobin peptide fingerprint analysis, 344 
in SCD analysis, 349-353 
two-dimensional, in ribosomal protein identification,
310b
Gender. See Sex 
Gene(s), 2. See also specific topics 
alternative transcripts of single, 290, 292-294 
annotation describing, 617-619, 617f 
births and deaths of, 626-627, 627f 
original definition of, 267 
Gene conversion, 419, 419, 422-423, 422f, 423f
Gene dosage. See Dosage, gene 
Gene dosage alteration, 432-433 
Gene expression, 10-15. See also Bacteria; Bacteriophage;
Eukaryotes
in bacteria and bacteriophage, regulation of, 
468-503 
antiterminators and repressors in, 492-497
inducible operon system in, 473-483
repression and attenuation in, 483-488
stress response and, 489-491
translational, 491-492
DNA-protein interaction required for transcriptional
control of, 469-472, 470f, 471f
monitoring with reporter genes, 556-559, 
558f, 559f 
transcription in, 10-12 
translation in, 12-13 
Gene expression machine model, for coupling 
transcription with pre-mRNA processing, 
289-290, 294f 
Gene families, 619, 619 
Gene flow, 755 
effects of, 753, 755f 
Gene identification, genome sequencing to determine,
549, 551, 551f
Gene interaction(s), 104-143, 121. See also Epistatic 
(gene) interactions 
allelic series as, 111-115, 113f, 114f
delayed age of onset of, 118, 118f 
in pathways, 121-124, 124f 
types of, 105 
Gene knockouts. See Knockouts, gene 
Gene order, 632-634, 633f 
Gene pool, 743-744 
mutation diversifying, 753-755 
Gene reconstruction, ancestral, 765 
Gene therapy, 600 
in curing sickle cell disease in mice, 604b, 605f 
germinal, 601 
human, 601-602, 601t
somatic, 601 
using recombinant DNA technology, 600-602, 
601t, 605f 
Gene therapy proof of principle, 604b 
Gene transmission 
basic principles of, discovered by Mendel, 27-31 
in mitosis, 4 
in sexual reproduction, 4 
Generalized transducing phages, 209, 210f 
Generalized transduction, 206, 209, 210f 
Genetic bottlenecks, 757, 757-758, 757f 


Genetic chimera. See Chimera, genetic 
Genetic code, 321f 
deciphering, 326-327, 326f, 327t, 328f, 329b 
experiments in, 322-330 
displaying third-base wobble, 321-322, 322t 
no gaps in, 324, 326 
nonoverlapping, 323, 324f 
redundancy of, 321 
in translation, 12 
of mRNA into polypeptide, 320-322 
triplet, 323-324, 324t 
universality of, 327-328, 328t
Genetic code specificity, tRNAs and, 328, 330 
Genetic complementation analysis. See Complementation
analysis
Genetic cross, controlled, 30 
Genetic dissection, 124, 126 
to investigate gene action, 126-127, 127f, 128b 
Genetic distances and relationships between human 
populations, 766, 767f 
Genetic diversity, human, 766-769, 767f, 768f 
and evolution, 766-769
Genetic drift, 756, 756-758, 756f 
random, as evolutionary process, 17 
Genetic fine structure, 213 
discovery of, 213, 215 
Genetic heterogeneity, 134 
Genetic hitchhiking, 753 
Genetic liability, 720 
threshold of, 720 
Genetic linkage, 145 
autosomal, test-cross analysis in detection of, 
150-151, 151f, 152b 
complete vs. incomplete, 147-148, 147f
discovery of, 148-150
in haploid eukaryotes, identified by tetrad analysis, 
171-175 
vs. independent assortment, 146-147, 146f 
indications of, 146-148 
for three-point recombination mapping, data 
consistent with the proposal of, 157 
Genetic linkage analysis, as tracing genome evolution,
170-171
Genetic linkage data, chi-square analysis of, 154
Genetic linkage mapping/maps, 145. See also Mapping
basis of, 153-154, 153f 
biological factors affecting accuracy of, 164, 164f 
constructing three-point recombination, 156-159
cotransduction, 210, 210-211, 211f, 212f 
first, 153-154, 153f 
gamete frequency determination from, 
159-160, 159f 
mapping linked human genes using lod score 
analysis, 166-169
by transformation, 204, 206, 207f 
units for, 154 
using lod score analysis, 168f, 168t, 169b, 170b
for cystic fibrosis gene, 177-178b 
Genetic map, using DNA markers to construct, 
545-546, 547f 
Genetic map distances. See also Genetic linkage
mapping/maps 
correction of, 165-166, 166f 
Genetic networks, 642 
Genetic potential, 714-715 
Genetic principles, 4 
Genetic redundancy, 541 
in flower development, reverse genetics and 
(Case Study), 561-563b
Genetic screen(s), 534 
for conditional alleles in haploid organisms, 
538-539, 539f 
designing, 535-539 
general design, 535 
enhancer, 540 
forward, 535-541 
modifier, 540 
in mutagenesis analysis, 539, 540b 
strategies of 
for identifying dominant and recessive mutations,
536-537, 537f
mutagen selection in, 536, 536t
organism selection in, 536
use of balancer chromosomes for tracking mutations,
537-538, 538f
suppressor, 540 
Genetic theorists, early-20th-century, 3f 
Genetic variance. See Variance 
Genetics. See also specific topics 
ancient applications of, 2, 3f 
evolutionary, 5 
history of modern, 2-4
in modern biology, 4-6
notation systems in, 109
overview of, 1-2
from whole-genome perspective, 611-648 (see also 
Genomics) 
Genome(s), 4. See also Bacterial genome(s) 
chimpanzee vs. human, 767f, 768 
chloroplast, structure of, 667-668, 667f
eukaryotic
transposition modifying, 457-461, 459f, 460f
whole-genome shotgun sequencing of, 
615-616, 616f 
history of, 622, 624-636 
introducing foreign genes into, to create transgenic 
organisms, 583-599 
lambda phage, 493, 494f 
lateral gene transfer in, 220 
modern human, 21 
nucleotide-base composition of, 7t 
organelle 
replication of, 652, 653f 
replicative segregation of, 653 
variable segregation of, 654f 
plant, transformation by Agrobacterium, 589-594, 
590f, 592f, 593f 
of S. cerevisiae, integrating DNA into, 
588-589, 589f 
transposable genetic elements moving through, 
450-456, 451f 
viral, 366, 367t
Genome annotation, 617-619, 617f, 631-632 
Genome comparisons. See also Interspecific 
comparisons of genomes 
intraspecific, 634-635 
Genome evolution, genetic linkage analysis as 
tracing, 170-171 
Genome organization among species, variation in, 
620-621, 621f, 622f 
Genome sequence analysis, determination of 
mutation rate from, 393 
Genome sequence draft, human, 616 
Genome sequences. See also Whole-genome shotgun 
(WGS) sequencing 
archaic, 21 
examples, 612t
insights from, 621-622 
reference, 635 
Genome sequencing to determine gene identification,
549, 551
Genome structure, mitochondrial, 662-665,
663f, 664f 
Genome-wide association studies (GWAS), 734, 
734-736, 736b 
Genomic era of genetics, 4 
Genomic imprinting, 522, 522-523, 523f 
Genomic islands, 220 
Genomic libraries, 577 
constructing, 579-580, 579f 
Genomic story of hominins, 21-22 
Genomics, 13, 611-648. See also Evolutionary
genomics; Functional genomics; Structural 
genomics 
definition of, 611 
Genomics approach to gene identification following 
mutagenesis, 551, 551f 
Genotype(s), 4 
homoplasmic and heteroplasmic, 651-652, 652f
Genotype frequencies
inbreeding altering, 758-760
in populations, 743-748
Genotype proportion method, for determining autosomal
allele frequencies, 747, 747
Genotypic ratio (1:2:1), 33
Genotyping, microbial growth for, 189-190b
Germ-line cells, 65
Germinal gene therapy, 601
Gilbert, Walter, DNA-sequencing protocols of, 256
Globin gene mutations, 340-341, 341f
Globin genes, 340, 340f
Goldberg-Hogness box, 280
Golden Rice (Oryza sativa), 591, 593-594, 593f
Goss, John, genetic research of, 45b
Grant, Peter, 764
Grant, Rosemary, 764
Green, Kathleen, on intragenic recombination, 162
Green, Melvin, on intragenic recombination, 162
Green fluorescent protein (GFP), 557
Greider, Carol, telomere and telomerase discovery
and, 252-253
Griffith, Frederick, transformation factor identified
by, 229-230, 229f
Grooves, major and minor, 235
Growth medium, selective, 195
Guanine (G), 7
Guide RNA (gRNA), 298, 298f
Guide strand, 524
Gusella, James, studies of Huntington disease by, 549
GWAS (genome-wide association studies), 734,
734-736, 736b
Gynandromorphy, 436, 437f
Gyrase, DNA, 369


H 
H19 gene, 522 
Haemophilus influenzae genome, whole-genome 
shotgun sequencing of, 614-615, 615f 
Hairpin, 277 
Haldane, J. B. S., evolutionary genetics research 
and, 17 
mapping function of, 166, 166f 
Haploid, 3 
Haploid eukaryotes, genetic linkage in, identified by 
tetrad analysis, 171-175 
Haploid number of chromosomes, 65 
Haploid organisms, screening for conditional alleles 
in, 538-539, 539f 
Haploinsufficient allele, 106 
Haplosufficient allele, 106 
Haplotype, 171 
Hardy, Godfrey, on genotype frequencies in populations,
743
Hardy-Weinberg (H-W) equilibrium, 743, 744-746, 
744f, 744t, 745f, 749b 
chi-square test of predictions of, 749 
CODIS based on, 769-770b
for more than two alleles, 747-748, 748t
HATs (histone acetyltransferases) in chromatin 
modification, 517, 519, 519f 
Hayes, William 
on F factor transfer, 192 
on interrupted mating in time-of-entry 
mapping, 197 
Hb. See Hemoglobin (Hb) 
HDACs (histone deacetylases) in chromatin modification,
517, 519, 519f
HDMTs (histone demethylases) in chromatin modification,
519
Helicase, 244 
Helix-turn-helix (HTH) motif, 471-472, 471f 
Hemizygous, 85 
Hemoglobin (Hb), 339 
inherited variant of, causing sickle cell disease, 
339-341, 339f, 340f 
peptide fingerprint analysis of, in SCD, 344 
structural change in sickle cell disease, 342f 
Hemoglobin protein peptide fragment analysis, 
344, 345f 


Hemophilia 
F8 gene mutation causing, 92 
in royal families of Europe, 92, 92f 
as X-linked recessive disorder, 93 
Hemophilia A, gene mutation causing, 91-93,
91t, 92f 
Hereditary disorder. See also specific disorders 
environmental modification for prevention of, 
120-121 
first documentation of, 2 
Hereditary molecule, DNA as, 230-232, 231f 
Hereditary nonpolyposis colorectal cancer 
(HNPCC), 411 
Hereditary transmission 
early studies on, 2 
purpose of, 6 
Heredity 
blending theory of, 26 
chromosome, and cell division, 64-103
chromosome theory of, 65 
proof of, 86, 86f 
Heritability, 726-727 
broad sense, 727, 727 
measuring genetic component of phenotypic 
variation, 726-730
narrow sense, 727 
artificial selection and, 729-730, 729, 730f 
twin studies of, 727-729, 729t 
Herpes simplex virus (HSV), 366, 367t, 596 
Herrick, James, history of sickle cell disease and, 
339-340 
Hershey, Alfred, on DNA in bacteriophage infection 
of bacterial cells, 231, 231f 
Heterochromatic regions, 381 
Heterochromatin, 381 
centromeric, 382-384
Heterochromatin protein-1 (HP-1), 513, 513f 
Heteroduplex DNA, 419 
gene conversion as directed mismatch repair in, 
419, 422-423, 422f, 423f
Heteroduplex region, 419 
Heterogeneity, genetic, 134 
Heterologous genes, expression in bacterial and 
fungal hosts, 583-589 
Heteroplasmic cell/organism, 652 
Heteroplasmy, 651-652, 652, 652f 
penetrance of human hereditary disease and, 
658-659, 658f 
Heterotetramer, 308 
Heterozygote(s) 
inversion, 446, 447f 
natural selection favoring, 751-752, 751t 
partial deletion, 441 
partial duplication, 441 
translocation, 448 
Heterozygous advantage, 358 
for βᴬβˢ individuals, 357-358
Heterozygous genotype (heterozygote), 33 
Hfr chromosomes, 194, 195f 
formation of, 194-195
Hfr gene transfer, 195-197, 196f 
Hfr maps, consolidation of, 200-201, 201f, 203 
High-frequency recombination (Hfr) strains, 194 
High-throughput sequencing, transcriptome analysis 
by, 638 
Histone acetyltransferases (HATs) in chromatin 
modification, 517, 519, 519f 
Histone deacetylases (HDACs) in chromatin 
modification, 517, 519, 519f 
Histone demethylases (HDMTs) in chromatin modification,
519
Histone methyltransferases (HMTs) in chromatin 
modification, 512, 513, 513f, 519 
Histone proteins (H1, H2A, H2B, H3, H4), 
286-287, 371 
archaeal DNA wrapping of, 385f 
characteristics, 371t
phylogenetic origins of, 385 
Histones, archaeal, 385 
HIV (human immunodeficiency virus), 366, 367t 


HMTs (histone methyltransferases) in chromatin 
modification, 512, 513, 513f, 519 
Hoelzel, A. Rus, on genetic bottlenecks, 758 
Holliday, Robin, model of meiotic recombination 
of, 418 
Holliday junction, 419, 420-421f 
Holliday junction resolution, 419 
Holliday model, 418 
of meiotic recombination, 418 
Holoenzyme, 246 
Holoprosencephaly, Shh mutations in (Case Study), 
708-709f, 708b 
Homeobox, 692 
Homeodomain, 692 
Homeotic activity in floral-organ identity, 704-707
combinatorial activity of, in floral-organ 
identity, 705f 
Homeotic genes, 686f, 687 
combinatorial activity of floral, in floral-organ 
identity, 706f 
parasegmental pattern of expression of, 691-695 
Homeotic MADS box transcription factors, 706-707 
Homeotic mutations, 682 
Hominin evolution models, 21 
Hominins, genomic story of, 21-22 
Homologous chromosomes, 2 
Homologous genes, 628 
Homologous nucleotides, 624 
Homologous recombination, 588, 589f 
Homologs, 2, 628 
Homology, 17, 19f 
Homoplasmic cell/organism, 651 
Homoplasmy, 18, 651, 651-652, 652f 
Homotetramer, 308 
Homozygosity, reduced recessive, in polyploids, 439, 
443-444b
Homozygous genotype (homozygote), 33 
Horowitz, Norman, genetic dissection analysis 
of, 126 
Host cell, 366 
Hotspots, 261-262 
of mutations, 393 
recombination as dominated by, 164-165
Housekeeping genes, 686 
Hox genes, 691f, 692, 694b 
downstream targets of, 695 
in metazoans, 695, 696f 
parasegment specification by, 691-695 
HP-1 (heterochromatin protein-1), 513, 513f 
HTH (helix-turn-helix) motif, 471-472, 471f 
Huberman, Joel, pulse-chase labeling evidence of 
bidirectional DNA replication and, 
238-239, 239f 
Human Genome Project (HGP), 13, 611, 616 
Human Genome Sequencing Project (HGSP), 766 
Human immunodeficiency virus (HIV), 366, 367t
Hunchback gene, 688, 689, 689f 
Huntington, George, description of Huntington
disease by, 548-549
Huntington disease (HD) 
delayed age of onset of dominant lethal allele in, 
118, 118f 
presymptomatic molecular diagnosis of, 262 
transgenic mouse model of, 600b 
Huntington disease (HD) genes, 263f 
dideoxy DNA sequencing of, 262f 
positional cloning in identification of (Case Study), 
548-549, 550f 
wild-type, 262 
Huntington disease mutations, PCR and DNA 
sequencing in analysis of (Case Study), 
261-262b 
Hybrid dysgenesis, 459, 459f 
Hybrid vigor, 438 
Hybridization, 343, 354b. See also In situ 
hybridization 
Hydrogen bonds, 7 
Hydroxylating agents, mutations induced by, 
405, 405f 
Hypermorphic mutations, 107f, 108 


Hypertrichosis, congenital generalized, 91t 
transmission of, 94, 94f 
Hypomorphic mutation, 107f, 108 
Hypophosphatemia, 91t 
Hypothesis testing 
by F2 self-fertilization, 35, 35f, 36t
by test-cross analysis, 34-35, 34f, 35f 
ICR (imprinting control region), 522
Identical by descent (IBD), 758, 758-759 
Illegitimate recombination, 588, 589f 
ILs (introgression lines), 732-733 
Imprinting, genomic, 522, 522-524, 523f 
Imprinting control region (ICR), 522 
In situ hybridization, 378 
of chromosomes, 377-379, 378f 
Inbreeding, coefficient of, 758, 758-759, 758t, 
759f, 760b 
Inbreeding altering genotype frequencies, 758-760 
Inbreeding depression, 759-760 
Incomplete dominance, 108-109, 109f 
Incomplete penetrance, 118, 118-119, 119f 
Indels, 766 
Independent assortment 
vs. genetic linkage, 145-146 
genetic linkage and, 146-147, 146f 
meiosis and, 79-81 
testing of 
by test-cross analysis, 39, 39f, 41 
by trihybrid-cross analysis, 41-42, 41f
Independent assortment, law of (Mendel’s second 
law), 38 
meiosis and, 79-81, 80f 
Induced mutations, 403-404, 403-408
by chemicals, 404-406, 404t, 405f, 406f
Inducer compound, 470, 470f 
Inducer-repressor complex, 474 
Inducible operon, 472 
Induction, 497, 683, 684f 
Inductive signal, 697f, 698 
Inductive signaling between cells, 697-699f, 697-700
Influenza virus, 367t
Informational genes, 674 
Ingram, Vernon, hemoglobin peptide fingerprint 
analysis and, 344 
Ingroup, 18 
Inheritance. See also Cytoplasmic inheritance 
autosomal, 51 
autosomal dominant, 51, 52f 
autosomal recessive, 53, 53f, 56b 
biparental, 650 
in Saccharomyces cerevisiae, 661-662, 661f
maternal, 651
mitochondrial, in mammals, 654-662
particulate, 33, 33-34 
evidence of, 33 
polygenic, 714 
sex-linked, 84 
uniparental, 650 
X-linked, 84-85, 84f 
X-linked dominant, 90 
X-linked recessive, 90 
Y-linked, 94-95 
Inherited variation 
gene mutation as source of, 392-417 (see also
Mutation(s))
meiotic recombination as source of, 418-419
Inhibition, 683-684, 684f 
lateral, 700 
in cellular differentiation, 700, 700f 
Inhibitor compound, 424f, 470 
Initial committed complex, 281, 282f 
Initiation complex, 282, 313 
Initiation factor (IF) proteins, 312, 312-313, 312f 
Initiator tRNA, 311, 312f 
Inosine (I), 322 
Insertion mutants, use in reverse genetics, 552 
Insertion or deletion (indel) in human genetic 
diversity, 766 


Insertion sequence (IS) elements, 192 
in bacterial genomes, transposition of, 456-457, 
456t, 458b 
Insertional inactivation, 451 
Insulator sequences, 511, 511, 512f 
Insulin growth factor 2 (IGF2) gene, 522, 529 
Insulin production in E. coli, human, 584-586, 586f
Interacting and redundant genes, identifying, 
540-541 
Interactive variance, 726 
Interactome, 641 
Intercalating agents, mutations induced by, 406, 406f 
Interchromosomal domain, 379 
Interference (I), 158, 158-159 
negative, 159 
Internal control regions (ICRs), 284, 284f 
Internal promoter elements, 284 
Interphase, 65, 66f, 68f, 74 
imaging chromosome territory during, 
379-380, 379f 
Interrupted mating, 197 
Interrupted mating analysis, producing time-of-entry 
maps, 197-203, 198f, 202b
Interspecific comparisons of genomes, 622, 624 
gene content, 624-631, 625f, 627f-629f 
gene order in, 632-634, 633f 
genome annotation, 631-632, 632f, 633f 
Interstitial deletion, 441, 441f 
Intragenic recombination, 162, 162, 162f 
Intragenic recombination analysis, 216 
Intragenic reversion, 397, 398f 
Intraspecific comparisons of genomes, 624, 634-635 
Intrinsic termination, 277, 277, 277f 
Introgression lines (ILs), 732-733 
QTL analysis in, 732-734, 733f 
Intron(s), 11, 12f 
Intron self-splicing, 294, 294f, 296 
Intron splicing, 285 
pre-mRNA, 287-288, 289f, 290f 
alternative, 290, 292, 292-294 
Inversion, chromosome, 446, 446-448,
446f-448f, 452b 
Inversion heterozygotes, 446 
Inversion loop, 446, 447f 
Inverted repeat (IR), 277 
Inverted repeat (IR) sequence, 456, 456f 
Irons, Ernest, history of sickle cell disease and, 339 
Irritable bowel syndrome (IBS), 618b 
IS (insertion sequence) elements, 192 
Island model of migration, 755, 755f 
Isoaccepting tRNAs, 321, 321f 
Isolation, reproductive, 760-761 
mechanisms of, 762t
speciation and, 761, 763-764 
ISWI (imitation switch) complex in chromatin 
remodeling, 515, 516, 519f 
Karyotype, 377, 378f
Kinetochore microtubules, 67, 70f
in diakinesis, 76f, 77 
in metaphase I, 76f, 77 
Klinefelter syndrome, 434t 
Knight, Thomas Andrew, genetic research of, 45b 
Knockout libraries, 552, 641, 641f 
Knockouts, gene, 588 
Kornberg, Arthur 
model of DNA replication, 249, 250f 
nucleosome-based model of chromatin, 371, 373 
Kosambi, Damodar, modified mapping function of, 
166, 166f 
Kozak, Marilyn, Kozak sequence discovered by, 313 
Kozak sequence, 313 
Krüppel gene, 689, 690, 690f 


L 
Lac, 189b, 472 
Lac operon, 472 
function of, 473-476, 475f
as inducible operon system under negative and
positive control, 472-476
molecular analysis of, 480-483, 480f, 482b, 483f
mutational analysis deciphering genetic regulation 
of, 476-483 
regulatory mutations of, 477-480, 482b
structure of, 473, 474f
transcription conditions for, 480t
Lac operon gene, regulatory sequence for, 476t 
Lac- phenotype, 473
Lac+ phenotype, 472
LacA gene, 473
LacI gene (“lack eye”), 473
Lactase, 752 
Lactose, 752 
Lactose metabolism, 472-473, 473f
Lactose (lac) operon. See Lac operon 
LacY gene, 473 
LacZ gene, 473 
Lagging strand(s), 246, 247f 
simultaneous synthesis of leading strands and, 
248-249, 249f, 250f 
Lahn, Bruce, 96-97
Lambda (λ) phage (bacteriophage λ), 366, 367t,
492-497 
genome of, 493, 494f 
lysogeny in, 493 
induction of, 497, 497f 
Lanktree, Matthew, on genes influencing adult 
height, 714 
Large ribosomal subunit, 309 
Lariat intron structure, 289 
Last universal common ancestor (LUCA), 4, 315 
Late genes, 493 
Late operators, 496 
Late promoters, 496 
Lateral gene transfer (LGT), 629, 631 
in genomes, 220 
Lateral inhibition in cellular differentiation, 
700, 700f 
Law of independent assortment. See Independent 
assortment, law of (Mendel’s second law) 
Law of segregation. See Segregation, law of (Mendel’s 
first law) 
LCR (locus control region), 508-509, 511f 
LCT gene, 752-753, 766 
Leader region, 484, 484f 
Leader sequences. See Signal sequences (leader 
sequences) 
Leading strand(s), 246, 247f 
simultaneous synthesis of lagging strands and, 
248-249, 249f, 250f 
Leaky mutation, 107f, 108 
Leber hereditary optic neuropathy (LHON),
mitochondrial mutations and, 656f, 658
Lederberg, Joshua, bacterial DNA transfer
identification and, 188, 191, 191f
Leptotene stage of prophase I, 75, 75f 
Lesch-Nyhan syndrome, 91t
Lethal alleles, 113, 113-117 
Lethal mutations, 113-117, 115f, 116f
Lethality, synthetic, 541, 541, 541f 
Leukemia, chronic myelogenous, 386f, 387, 387b 
Lewis, Edward B 
Drosophila mutation studies of, 691 
on pattern formation in Drosophila, 684 
LHON (Leber hereditary optic neuropathy), 
mitochondrial mutations and, 656f, 658 
Li-Fraumeni syndrome, 414t 
Case Study, 423-424b 
Li-Fraumeni syndrome 1 (LFS1), 424b 
Life-forms, ancient, 15-16 
Ligand-binding domains (LBDs), 765, 766 
LINE elements of humans, 460-461
Linkage, genetic. See Genetic linkage 
Linkage disequilibrium (LD), 171, 753 
Linkage equilibrium, 171 
Linkage groups, 166 
of genes, 166 
Linked genes, assortment of, 145-146 
Linker DNA, 371 
Linkers, 574 
Locus control regions (LCRs), 508-509, 511f 
Lod score, 167, 168t 
Lod score analysis, 167-169
mapping linked human genes using, 166-169, 168f,
168t, 169b, 170b
for cystic fibrosis gene, 177-178b 
Lod score curves, 168f 
Long arm (q arm), 377, 377f 
Long noncoding RNAs (lncRNAs), 521
Long terminal repeats (LTRs), 460 
Loss-of-function alleles, 560, 561f 
Loss-of-function mutation, 106, 107f, 108 
LTRs (long terminal repeats), 460 
LUCA (last universal common ancestor), 4 
Lupus, Neandertal genes and, 769 
Luria, Salvador, 392 
Lwoff, André, 468f 
on lac operon analysis, 476 
Lymphoma, Burkitt’s, 386f, 387, 387b 
Lyon (random X inactivation) hypothesis, 95 
Lysis, 206 
Lysogenic cycle, 208f, 209 
Lysogeny, 209 
in lambda phage, 493 
induction of, 497, 497f 
lambda repressor protein and, 496, 497f 
Lysogeny induction, resumption of lytic cycle 
following, 497 
Lytic cycle, 207, 208f 
resumption of, following lysogeny induction, 497 
substages of, 66-69
MacLeod, Colin 
and deoxyribonucleic acid identification, 4 
on DNA as transformation factor, 230, 230f 
MADS box, 706 
MADS box transcription factors, homeotic, 
706-707 
Major, Daphne, 764 
Major genes, 715 
Major groove, 235 
Malaria, 587b 
βS allele and, 353, 357-358
geographic distribution of, 357f 
plant-derived drugs for, produced in E. coli, 587b 
Mammals 
coat color in, C-gene system for, 111-115, 
113f, 114f 
mitochondrial inheritance in, 654-662
sex determination in, 88, 88f 
Map unit (m.u.), 154 
Mapping. See also Genetic linkage mapping/maps; 
Quantitative trait loci (QTLs) mapping 
cotransduction, 210, 210-211, 211f, 211t,
212f, 214b 
deletion, 442, 444f, 445b 


interrupted mating analysis producing, 198f, 202b 
time-of-entry, 197-198 
interrupted mating analysis producing, 197-203 
Mapping function, Haldane’s, 166, 166f 
Maps. See Genetic map; Restriction maps 
Marker screens, selected and unselected, 210 
MARs (matrix attachment regions), 374, 375f 
Maternal effect genes, 687 
Maternal inheritance, 651 
Mating 
consanguineous, 758 
nonrandom, altering genotype frequencies, 
758-760 
Mating-table analysis of genotype and allele
frequencies, 745, 746f, 746t
Matrix attachment regions (MARs), 374, 375f 
Matthaei, Johann Heinrich, deciphering genetic code 
and, 326 
Mature mRNA, 285 
Maxam, Allan, DNA-sequencing protocols of, 256 
Mayr, Ernst, modern synthesis of evolution and, 17 
MC1R gene, 766
McCarty, Maclyn 
and deoxyribonucleic acid identification, 4 
on DNA as transformation factor, 230, 230f 
McClintock, Barbara 
research on crossing over in corn, 160, 160f 
transposition discovered by, 451, 453 
MCS (multiple cloning site), 574 
Mean (μ), 48-49
Median, 724 
Median value, 724 
Mediator in GAL gene system transcription 
regulation, 511 
Megabases (Mb), 13 
Meiosis, 4, 65 
Mendelian hereditary principles and, 79-81 
vs. mitosis, 73-75, 74t
overview of, 73-75, 74f
Meiosis I, 74-75, 76f
nondisjunction in, 431, 432f 
Meiosis II, 75, 77f, 79, 79f 
nondisjunction in, 431, 433f 
Meiotic nondisjunction, creating autopolyploidy, 
437, 437f 
Meiotic recombination, 420-421f
inherited variation from, 418-419
Melanocortin-1 receptor (MC1R) gene, 766 
Mello, Craig, on RNA interference, 524, 553 
Memory impairment, 441 
Mendel, Gregor (Johann), 26, 26f 
by, 27-31
chi-square analysis of experimental data of, 
50-51, 51t 
experimental innovations of, 29-31 
genes and seed shapes described by, 453, 454-455b
hereditary principles of, and autosomal 
inheritance, 51-55 
on hereditary transmission, 2 
modern experimental approach of, 26 
molecular genetics of traits examined by, 
54-55, 55t 
rediscovery of the research of, 42, 44 
Mendelian genetics, 5, 26. See also Transmission 
genetics 
Mendelian ratios 
gene interactions modifying, 121-134 
mechanistic basis of, 79-81 
probability theory predicts, 44-48
Mendelism, 26. See also Transmission genetics 
in produce aisle (Case Study), 44b 
Mendel’s first law. See Segregation, law of (Mendel’s 
first law) 
Mendel’s second law. See Independent assortment, 
law of (Mendel’s second law)
Meristems, 703 
development at, 703-704, 704f
MERRF (myoclonic epilepsy with ragged red fibers), 
mitochondrial mutations and, 658, 659 
Metabolomics, 15
Metaphase, mitotic, 66, 67, 68f
Metaphase II, 77f
Metaphase plate, 67, 76-77f 
Methionine biosynthesis pathway, genetic dissection
of, 126-127, 127f 
MicroRNA (miRNA), 271, 524
kinetochore, 67, 70f
nonkinetochore, 67 
polar, 67, 70f 
spindle fiber, 67 
Miescher, Friedrich, DNA isolation by, 228 
Migration, 755, 755-756, 755f 
and evolution of humans, 22f, 655-656, 655f 
as evolutionary process, 17, 22f 
origin of, 342 
Minimal initiation complex, 282, 282f 
Minimal medium, 189b 
Minor groove, 235 
Mismatch repair in heteroduplex DNA, gene 
conversion as directed, 419, 422-423,
422f, 423f 
Missense mutations, 394, 394, 394f 
Mitochondria, 3, 4, 662 
endosymbiosis theory of evolution of, 
668-675, 671f 
as energy factories of eukaryotic cells, 662-666
structure of, 662, 663f 
Mitochondrial DNA 
mother-child identity of, 654-655, 655f 
research analyzing, 657b 
Mitochondrial DNA sequences and species evolution 
of, 655 
Mitochondrial genome structure and gene content, 
662-665, 663f, 664f 
Mitochondrial inheritance in mammals, 654-662
Mitochondrial mutations and human genetic disease, 
656, 656f, 658-659, 669b, 675-676b 
Mitochondrial transcription and translation, 
665-666, 665f, 666f 
Mitosis, 4, 65 
gene transmission in, 4 
vs. meiosis, 73-75, 74t
overview of, 69, 71f 
in somatic cell division, 65-73 
Mitosomes, 673 
Mitotic crossover, 176 
producing distinctive phenotypes, 175-176, 177f 


Mitotic nondisjunction, creating autopolyploidy, 
437-438, 437f 
MN blood group, 112b 
Modal value, 724 
Mode, 724 
Model organisms, 13 
Modern synthesis of evolution, 17 
Modifier genes, 715 
Modifier screen, 540 
Mold, red bread (Neurospora crassa) 
growth variants of, 125-126b 
met- mutant strain of, genetic dissection analysis 
of, 126-127, 127f 
ordered ascus analysis of, 173-175, 175f, 176f
Molecular biology 
of transcription and RNA processing, 267-304 
of translation, 305-337 
Molecular charge and electrophoretic gels, 343 
Molecular cloning in recombinant DNA technology, 
572-577, 573f-575f, 577t, 578f 
Molecular disease, 339, 342 
Molecular evolution. See also Evolution: molecular 
basis of 
changing genes and genomes through time, 
764-769 
Molecular genetic analysis 
gel electrophoresis in, 342-344, 343f, 344f 
of lac operon, 480-483, 480f, 482b, 483, 483f
of thalassemia, 359-360, 359f
using DNA replication processes, 254-261
dideoxyribonucleotide DNA sequencing as,
256-259, 258f, 259f, 260b
polymerase chain reaction as, 254-256, 255f,
257f, 260b 
Molecular genetics, 5 
inauguration of the era of, 4 
of Mendel’s traits, 54-55, 55t
Molecular probes, 343, 348-349, 349, 354-355b
Molecular shape (molecular conformation) and
electrophoretic gels, 343
Molecular weight and electrophoretic gels, 343 
Monod, Jacques, 468f 
on lac operon analysis, 476, 477-478
Monohybrid cross(es), 33 
segregation of alleles and, 31-36, 32f 
Monophyletic group, 17 
Monosomic chromosomes, 432 
Monosomy, 432 
Morgan, Lillian, 84 
Morgan, Thomas Hunt, 144f 
and discovery of recombinant chromosomes in 
gametes, 418 
on genetic linkage and mapping, 144, 145 
and genetic linkage discovery, 148-150, 149f
hypothesis of recombination by crossing over, 160
and origin of developmental genetics, 682
studies on Drosophila melanogaster, 81, 84-85, 85f
Morphogens, 683 
Morphological evolution, 18f 
Morton, Newton, lod score analysis and, 167 
Mosaicism, 435-436, 436f 
mRNA. See Messenger RNA (mRNA) 
MRSA (methicillin-resistant Staphylococcus aureus), 
evolution and spread of, 221b 
Müller, Hermann, 452b 
balancer chromosome used by, 537 
on radiation-induced mutations, 535 
Multicellular organism, development as building of, 
682-684, 682f-684f 
Multifactorial traits, 714 
Multimeric proteins in dominant negative
mutations, 108
Multimers, 308 
Multiple cloning site (MCS), 574 
Multiple-gene hypothesis, 715 
Multiplication rule of probability theory, 44-45 
Multiregional (MRE) hypothesis, 21 
Multiregional (MRE) model of H. sapiens evolution, 
655-656, 655f 
Mus musculus, transgenic, 596-598, 597f 


Muscular dystrophy, 91t
Duchenne, 91t, 393, 577 
Mutagen(s), 403-404 
Ames test for, 408, 408, 409f, 410b 
chemical, 404—406, 404t, 405f, 406f 
for genetic screen, selection of, 536, 536t 
Mutagenesis, 403-404, 535 
genetic screen analysis of, 540b 
genomics approach to gene identification
following, 551, 551f
number of genes identified in, 539 
saturation, 535 
Mutagenesis analysis, genetic screen in, 539 
Mutant alleles, 262 
identified for gene, number of, 539, 540b 
(see also under Mutation(s)) 
Mutants 
constitutive, 477 
insertion, use in reverse genetics, 552 
revertible and nonrevertible, 216 
Mutation(s), 341, 753 
altering human sex development, 89b 
amorphic, 106, 107f 
in analysis of gene transcription, 282, 282f 
attenuation, 488, 488f 
of β-globin gene alleles, evolution of, 358, 358f
base-pair substitution, 394, 394-395, 394f 
bithorax, discovery of, 682, 682f 
cell cycle, and cancer, 72 
chromosome breakage causing, 439-444, 441f,
442f, 444f, 445b 
cis-acting, 478, 478f 
constitutive repressor protein, 478f, 479, 479t 
definition of, 391 
diversifying gene pools, 753-755 
dominant negative, 107f, 108 
enhancer, hereditary disorders from, 509 
as evolutionary process, 17 
forward, 397 
frameshift, 324, 395, 396f 
gain-of-function, 106, 107f, 108 
in genetic screens 
balancer chromosomes for tracking, 537, 538f 
identification of dominant and recessive,
536-537, 537f, 539
globin gene, 340-341, 341f 
homeotic, 682 
hotspots of, 393 
hypermorphic, 107f, 108 
hypomorphic, 107f, 108 
identifying types of, 400b 
induced, 403-404, 403-408 
leaky, 107f, 108 
lethal, 113-117, 115f, 116f 
loss-of-function, 106, 107f, 108 
missense, 394, 394, 394f 
mitochondrial, and human genetic disease, 656, 
656f, 658-659, 669b, 675-676b 
modifying DNA sequence, 393-397 
neomorphic, 107f, 108 
nonsense, 394, 394f 
null, 106, 107f 
p53 gene, causing Li-Fraumeni syndrome, 
423-424b 
PEV, 512-513, 512f 
point, 394 
promoter, 396, 396f, 479-480 
from radiation, 406-408, 407f
regulatory, 395-397, 396, 396f, 398f 
of lac operon, 477-480, 481b
reverse, 397, 398f 
reversion, 324 
in same gene, mutations in different genes
distinguished from, 133-135
silent, 394, 394f 
splicing, 396-397, 396f, 397 
spontaneous, 397 
from spontaneous events, 397-403
super-repressor protein, 478f, 479, 479t 
suppressor, 397 
transition, 394
transversion, 394 
Mutation rates, 392, 392-393, 393t 
Mutation-selection balance, 753, 753-755 
Mutational analysis deciphering genetic regulation of 
lac operon, 476-483, 477t
Mycoplasma, 598 
Myers, Richard, promoter mutation analysis of, 282, 
282f 
Myoclonic epilepsy with ragged red fibers (MERRF), 
mitochondrial mutations and, 658, 659 
Myotonic dystrophy (type I), 399t 


N 
N-formylmethionine (fMet), 313 
Naegeli, Karl, 27 
Narrow sense heritability, 727 
artificial selection and, 729-730, 729t, 730f 
Nathans, Daniel, restriction endonucleases 
and, 569b 
Natural selection 
directional, 750, 750-751, 751t 
evolution by, 16 
as evolutionary process, 17 
favoring heterozygotes, 751-752, 751t 
operating through differential reproductive fitness 
within population, 748-753 
Naudin, Charles, 45b 
NDR (nucleosome-depleted region), 514 
Neandertal DNA in the modern human genome, 
768-769 
distribution of, 769f 
Neandertals, derived alleles in, 768 
Near isogenic lines (NILs), 732-733 
Neel, James, transmission genetic analysis of SCD 
and, 342 
Negative control of transcription, 469-470 
Negative interference, 159 
Negative supercoiling, 369 
Neofunctionalization, 628 
Neomorphic mutations, 107f, 108 
Neurospora crassa. See Mold, red bread (Neurospora 
crassa) 
Next-generation DNA sequencing technologies, 259, 
259f, 261 
Next-generation sequencing, 259 
NHEJ (nonhomologous end joining), 416-417, 417f
NILs (near isogenic lines), 732-733 
Nilsson-Ehle, Hermann, multiple-gene hypothesis 
and, 715 
Nirenberg, Marshall, deciphering genetic code and, 
326-327 
Nodes, 624 
Nonadaptive evolution, 16-17 
Noncoding sequences, conserved, 631 
in genome annotation, 631-632, 632f 
Noncomposite transposon, 456 
Nondisjunction, 86 
analysis of, 85-86 
chromosome number changes from, 431-437 
meiotic, creating autopolyploidy, 437, 437f 
mitotic, creating autopolyploidy, 437-438, 437f
Nonenveloped viruses, 366-367, 367f 
Nonhistone proteins, 371 
Nonhomologous end joining (NHEJ),
416-417, 417f 
Noninducible, 479 
Nonparental ditypes (NPD), 172 
Nonparental phenotypes, 38 
Nonpenetrant organisms, 118 
Nonrecombinant chromosomes, 145 
Nonrecombinant vector, 573 
Nonreplicative transposition, 456 
Nonrevertible mutants, 216 
Nonsense mutations, 394, 394f 
Nonsister chromatids, 76 
binding of, synaptonemal complex in, 76, 78f 
Nontemplate strand, 271 
Normal (Gaussian) distribution, 48-49, 49f 
North-south (NS) resolution, 419 
Northern blotting, 349, 355b, 454b
in β-globin gene transcript and protein analysis,
352-353, 353f 
Notation systems in genetics, 109 
Notch gene, 444 
Nuclear mitochondrial sequences (NUMTS), 671 
Nuclear plastid sequences (NUPTS), 671 
Nucleoid, 3, 368, 652 
of E. coli, 368f 
Nucleolus, 283 
Nucleomorph, 675 
Nucleosome(s), 371, 372f 
chromatin remodeling by modification of, 
514-517, 515f, 518b, 519f 
Nucleosome core particle, 371 
Nucleosome-depleted region (NDR), 514 
Nucleosome distribution and synthesis during 
replication, 374-376, 377f 
Nucleosome structure, 373, 373f 
Nucleotide(s). See also DNA nucleotides 
homologous, 624 
RNA, 11, 268-271 
Nucleotide base analogs, 404 
mutations induced by, 404, 404f 
Nucleotide excision repair, 411, 412f 
Null mutation, 106, 107f 
NUMTS (nuclear mitochondrial sequences), 671 
NUPTS (nuclear plastid sequences), 671 
Nüsslein-Volhard, Christiane
mutagenesis strategy used by, 538 
on pattern formation in Drosophila, 684, 686 


O
Obsessive-compulsive disorder (OCD), 441 
Ogura, Yasunori, genome-wide association studies 
of, 736b 
Okazaki, Reiji, research on short fragment synthesis, 
246-247 
Okazaki fragment(s), 247 
ligation of, for DNA replication, 247-248, 248f
On the Origin of Species (Darwin), 16 
Oncogene, 71 
One gene-one enzyme hypothesis, 124, 124-127,
125-126b 
Open chromatin, 514-515, 516f 
Open promoter complex, 273, 274f, 276 
Open promoters, 514 
Open reading frames (ORFs), 618-619, 625f, 669 
Operational genes, 674 
Operator(s), 470 
early, 493 
lacO, 473 
late, 496 
Operon(s), 472. See also Lac operon; Tryptophan 
(trp) operon 
amino acid, attenuation in, 488, 488f 
inducible, 472 
repressible, 484 
Operon system(s) 
attenuation in, 488 
inducible, 473-483
lac operon as, 472-476 
ORB (origin recognition box), 242, 244 
Ordered ascus, 173 
Ordered ascus analysis, 173-175, 175f, 176f 
Organelle genome replication, 652, 653f 
Organelle genomes 
replicative segregation of, 653 
variable segregation of, 654f 
Organelles, continual DNA transfer from, 670-672, 
672f, 673f 
Organelle proteins, encoding of, 672-673
Organizer in pattern formation, 683 
Origin of migration, 342 
Origin of replication (ori), 237 
in bacteriophage vectors, 577 
DNA sequences at, 242, 244f 
in plasmids, 574, 575f 
Origin of transfer (oriT), 193 


Origin recognition box (ORB), 242, 244 
Ornithine transcarbamylase deficiency, 91t 
Orr-Weaver, Terry, double-stranded break model of 
meiotic recombination and, 418 
Orthologous genes, 628 
Orthologs, 628 
Osteogenesis imperfecta, dominant negative 
mutation in, 108 
Ototoxic deafness, mitochondrial gene-environment 
interaction in (Case Study), 675-676b 
Outgroup, 18 
Ovarian cancer, 414t, 415 
mapping a gene for susceptibility to, 169b 
Ovomucoid gene, intron splicing of, 289 
Oxidative reactions, mutations induced by, 
405-406 


P 
p arm, 376-377, 377f 
P element, 459-460, 594 
Drosophila transformation mediated by, 
594-595, 595f 
p53 DNA damage repair pathway, 413, 414f 
p53 gene mutations, causing Li-Fraumeni syndrome, 
423-424b 
Paabo, Svante, 21 
Pachytene stage of prophase I, 75, 75f 
Page, David, 96-97 
Pair-rule genes, 686, 686f 
regulation of, 689-691, 690f 
Paired-end sequencing, 613, 614f 
PAR. See Pseudoautosomal region (PAR) 
Paracentric inversion, 446, 446f 
Paralogous genes, 628 
Paralogs, 628 
Paraphyletic group, 18 
Parasegments, 687 
specification of, by Hox genes, 691-695 
Parental chromosomes, 145 
alleles on, for three-point recombination
mapping, 157
Parental ditypes (PD), 172 
Parental generation (P generation), 30, 30f 
Parental strand in semiconservative replication, 
9, 9f 
Parvovirus, 367t
Partial chromosome deletion, 440, 440-441
Partial deletion, 441 
Partial deletion heterozygotes, 440, 441 
Partial diploid, 203 
conjugation with F’ strains producing, 203-204, 
204f, 205-206b
Partial dominance, 108-109, 109f 
Partial duplication, 441 
Partial duplication heterozygote, 441 
Particulate inheritance, 33, 33-34 
Pascal’s triangle, 47-48, 47f 
Patau syndrome, 434t 
Paternity, population genetics in identifying (Case 
Study), 769-771b, 770f, 770t 
Pathogenicity islands, 220 
Pattern formation in development, 683-684,
683f, 684f 
maternal effects on, 687 
Pauling, Linus 
densitometry of hemoglobin proteins, 344, 344f 
electrophoretic analysis of hemoglobin protein, 
342-344, 343f, 344f
on sickle cell disease as molecular disease, 
339, 342 
PCNA (proliferating cell nuclear antigen), 249 
PCR (polymerase chain reaction), 254, 254-256, 
255f, 257f, 259, 261-262b 
detecting the number of repeats, 262 
in recombinant DNA technology, 581, 582f 
PCR primers, 255 
Pea, garden (Pisum sativum) 
incomplete dominance in, 109, 109f 
Mendel’s research on, 27 


Pearson, Karl, on genotype frequencies in 
populations, 743 
Peas, shaped by transposition, 453, 454-455b
Pedigree(s), 51-52, 53, 56, 90-96, 118-121, 166-169, 
254-261, 347, 350-351b, 352-360, 423-424,
433-437, 439-444, 449, 528-529, 548-549,
654-662, 714, 727-729, 734-737, 758-760
symbols for, 51-52, 52f 
Penetrance, incomplete, 118, 118-119, 119f 
Penetrant organisms, 118 
Penicillin, discovery and use of, 220-221b
Peptide bond, 12, 13f, 306 
Peptide bond formation, 306, 306f 
Peptide fingerprint analysis, 344 
hemoglobin, 344, 347b 
Peptidyl site (P site), 309, 309f 
Pericentric inversion, 446, 446f 
Permissive condition, 538 
Phages, 231. See also Bacteriophage(s) 
Phenotype(s), 4 
dominant, 31, 35 
genes producing variable, 118-121, 119f, 120f, 122f
mitotic crossover producing distinctive, 
175-176, 177f 
nonparental, 38 
recessive, 31 
selection of single traits with dichotomous, 30 
unstable mutant, 453 
Phenotypic ratio (3:1 and 9:3:3:1), 33 
Phenotypic variance, 725 
partitioning, 725, 725f, 726b 
sources of, 725f 
Phenotypic variation 
continuous, quantitative traits displaying, 714-721
effects of polygenes on, 716, 718t, 722b 
environmental factors and, 719, 720f, 723b 
heritability measuring genetic component of, 
726-730 
statistical description of, 721, 723-725, 724f 
Phenylketonuria (PKU), prevention of, 120-121 
PHO5 gene, 520
Phosphodiester bond, 7 
Photoproducts, 406 
UV-induced, 406-408, 407f
repair of, 412-413, 413f
Photoreactive repair, 412, 413f 
Photosynthesis, chloroplasts as sites of, 666-668,
666f-668f 
Phylogenetic footprinting, 631, 632f 
Phylogenetic shadowing, 631, 633f 
Phylogenetic tree, 17, 18f 
constructing 
using cladistic approach, 17 
using molecular data, 19-21, 19f 
using morphological and anatomical data, 
18, 18f 
Physical gaps, 614-615 
Pisum sativum. See Pea, garden (Pisum sativum) 
PKU (phenylketonuria), prevention of, 120-121 
Plants 
cloning of, 602-603 
multicellular evolution in, 703-707
transgenic 
in agriculture, 591 
creating, 589-593, 592f 
Plasmid(s). See also F (fertility) plasmid 
bacterial, 187-188, 188, 188f
R (resistance), 188 
as cloning vectors, 574-575, 575f
generation of yeast, 588, 588f 
Ti, 589-590, 590f 
Plastid, 666 
Pleiotropic genes, 121, 122f 
Pleiotropy, 121, 122f 
Pluripotent cells, 683 
Pneumonia, 229 
Point mutations, 394 
Polar microtubules, 67, 70f 
Poliovirus, 367t 


Poly-A tail, 285 
Polyacrylamide, 342-343 
Polyadenylation, alternative, 293 
Polyadenylation signal sequence, 286, 287f 
Polycistronic mRNA, 320, 473 
bacterial, translation of, 320f 
translation of, 320 
Polycomb group (PcG), 519 
Polydactyly 
enhancer mutations causing, 509 
incomplete penetrance for, 118-119, 119f 
Shh mutations in (Case Study), 709b 
Polygenes and phenotypic variation, 716, 718t, 722b 
Polygenic inheritance, 714 
Polygenic traits, 714 
Polymerase(s). See also RNA polymerase(s) 
bypass, 416 
DNA replication 
bacterial, 247t, 249 
eukaryotic, 247t, 249 
translesion DNA, 416 
Polymerase chain reaction. See PCR (polymerase 
chain reaction) 
Polymorphism(s). See also Single nucleotide
polymorphism (SNP)
balanced, 358, 751 
restriction fragment length, 346, 350-351b, 546
Polypeptide(s), 12, 306 
composed of amino acid chains assembled at
ribosomes, 306-311
genetic code in translation of mRNA into, 320-322 
Polypeptide and transcript structure, 307-309, 308t 
Polypeptide elongation in translation, 315-318 
Polypeptide processing, posttranslational, 330, 
330-331, 330f 
Polyploidy, 3, 437, 437 
consequences of, 438 
euploidy changes resulting in, 437—439 
evolution and, 439, 440f 
Polyribosomes, 319 
Population(s), 743 
Darwin's principles of, 16, 17 
Population genetics, 742—775, 743 
genetic distances and relationships between 
human populations, 766, 767f 
in solving crime and identifying paternity (Case 
Study), 769-771b, 770f, 770t 
Position effect variegation (PEV), 382, 521 
mutations modifying, 512f 
Position effect variegation (PEV) mutations, 512—513 
Positional cloning, 544-549, 545, 545f, 547f, 548f 
Positional information, 683, 683f, 684f 
Positive control of transcription, 470 
Positive-negative selection, 596 
Positive supercoiling, 369 
Position effect variegation (PEV), 512 
Post-transcriptional RNA editing, 298-299, 
298f, 299f 
Posttranslational polypeptide processing, 330, 
330-331, 330f 
Postzygotic mechanisms, 761, 762t 
Prader-Willi syndrome, 436 
genomic imprinting defects in, 522-523 
Pre-mRNA (precursor mRNA), 285 
polyadenylation of 3’, 286-287, 287f 
splicing signal sequences in, 288—289, 290f 
Pre-mRNA processing 
3’ polyadenylation in, 286-287, 287f 
5’ capping in, 285, 286f 
alternative, 292, 293f, 295b 
intron splicing in, 287—288, 289f, 290f 
Pre-mRNA processing steps, coupling of, 
289-290, 294f 
Precursor mRNA. See Pre-mRNA 
Preinitiation complex, 313 
Prereplication complex (preRC), 244 
Prezygotic mechanisms, 761, 762t 
Pribnow box sequence, 273 
Primary structure of polypeptides, 308 


Primase, 245 
Primer walking, DNA sequencing by, 582, 582f 
Probability 
binomial, 46, 46—48 
conditional, 45, 45—46 
Probability calculations in problem solving in 
genetics, 42 
Probability theory 
predicts Mendelian ratios, 44—48 
product rule of, 44-45 
sum rule of, 45 
Probability (P) value, 49, 50, 50t 
Product (multiplication) rule of probability theory, 
44-45 
Progenote, 5f 
Proliferating cell nuclear antigen (PCNA), 249 
Prometaphase, mitotic, 66, 68f 
Promoter(s), 11, 12f, 271 
alternative, 293 
bacterial, 272f, 273, 275b 
covered, 514, 514-517, 514f 
early, 493 
eukaryotic 
detecting promoter consensus elements, 
282, 282f 
recognition of, 280—281, 282f 
lacP, 473 
late, 496 
open, 514 
RNA polymerase I, 283-284, 283f 
RNA polymerase III, 284, 284f 
in transcription, band shift assay to identify, 
279-280b 
Promoter consensus sequences, archaea, 285f 
Promoter mutations, 396, 396f, 479-480 
Promoter-specific element (PSE), 284 
Prophage, 209 
Prophase, mitotic, 66, 68f 
Prophase I, 75—76f 
Prophase II, 77f 
Protein(s), 12 
activator, 470 
chromatin-remodeling, 513 
controlling double-strand break repair, 415-417 
controlling translesion DNA synthesis, 407—410 
cro, and entry of lambda phage into lytic cycle, 
496, 496f 
cyclin, 70 
in cell cycle checkpoints, 70, 72f 
regulating cell cycle, 71, 73f 
DNA-binding, regulatory, 470—472, 471f 
in DNA damage signaling systems, 413 
elongation factor, 315 
fusion, 585 
green fluorescent, 557 
histone (H1, H2A, H2B, H3, H4), 286-287, 
371, 371t 
initiation factor, 312, 312-313, 312f 
lambda repressor, and lysogeny, 496, 497f 
modularity of, 619, 621f 
multimeric, in dominant negative mutations, 108 
nonhistone, 371 
organellar, encoding of, 672-673 
regulatory 
binding to lac operon regulatory 
sequences, 482b 
trans-acting, 507 
repressor, 470 
rho, 277 
small nucleoid-associated, 369 
SR, 294, 294f 
structural maintenance of chromosomes, 369 
translation repressor, 491 
two-dimensional gel electrophoresis and identifi- 
cation of ribosomal, 310b 
Protein—DNA interaction in transcriptional control 
of gene expression, 469—472, 470f, 471f 
Protein packaging, viral, 366-368 
Protein reconstruction, ancestral, 765 
Protein sorting, 330-331 
Proteome, 15, 638 
Proteomics, 15, 638 
Proto-oncogenes, 71 
Prototrophs, 189b 
Proximal elements in eukaryotic transcription, 506 
Prune-killer (K-pn), 541 
PSE (promoter-specific element), 284 
Pseudoautosomal region (PAR), 79 
of X and Y chromosomes, 79, 79f, 97 
Pseudodominance, 442 
Pseudogene, 626 
Pseudohermaphroditism, gene mutation causing, 89b 
Pulse-chase labeling, evidence of bidirectional DNA 
replication produced by, 238-239, 239f 
Punnett, Reginald, 33 
complementary gene interaction and, 132 
and genetic linkage discovery, 148 
on genotype frequencies in populations, 743 
Punnett square, 33, 38, 38f 
Pure-breeding strains, 30, 30f 
Pyrimidine dimer, 406 


Q 
q arm, 377, 377f 
QTLs (quantitative trait loci). See Quantitative trait 
loci (QTLs) 
Quantitative genetics, 714 
Quantitative trait(s), 714 
allele segregation in production of, 716, 
718-719, 719f 
displaying continuous phenotype variation, 
714-721 
Quantitative trait analysis 
genetic analysis, 713—741 
statistical methods in, 721, 723-726 
Quantitative trait loci (QTLs), 730 
gene identification for, 732—734, 733f 
genome-wide association studies in identification 
of, 734-737, 736f 
Quantitative trait loci (QTLs) mapping, 730 
Quantitative trait loci (QTLs) mapping strategies, 
730-732, 731f, 732t 
Quaternary structure of polypeptides, 308 
Quinn, Chip, genetic screen of, 535 


R 
R gene in peas, identification and analysis of alleles 
of, 453, 454—455b 
R-group, 306-307 
R (resistance) plasmid, 188 
Radial loop-scaffold model, 374 
Radiation. See also Ultraviolet (UV) repair 
DNA damage induced by, 406—408, 407f (see also 
Ultraviolet (UV) irradiation, damage from) 
Random genetic drift, as evolutionary process, 17 
Random X inactivation (Lyon) hypothesis, 95 
RB1 gene mutation and cancer, 72 
Reading frame, 324 
Realizator genes, 695 
RecBCD pathway, 418 
Recent African origin (RAO) hypothesis, 21 
Recent African origin (RAO) model of H. sapiens 
evolution, 655-656, 655f 
Recessive epistasis (9:3:4 ratio), 131f, 133, 133 
Recessive homozygosity, reduced, in polyploids, 439, 
443—444b 
Recessive mutant, determination of, 539 
Recessive phenotype, 31 
Recessiveness, molecular basis of, 105—106 
Recipient cell (F⁻), 191-192 
and donor cell, conjugation between, 
193-194, 193f 
Reciprocal cross(es), 31, 31f 
to determine X-linkage of genes, 84-85, 84f, 85f 
in Z/W system, 90, 90f 
Reciprocal translocation, 448, 448, 449f, 450f 
Recombinant chromosomes, 145 
Recombinant clone, 572 
Recombinant DNA molecules 
amplifying, 575-576, 578f 
creating, 572-574, 573f, 574f 
Recombinant DNA technology, 568 
applications of, 567—610 
in cloning genes by complementation, 
542-544, 543f 
in cloning genes using transposons, 543-544, 544f 
in cloning highly expressed genes, 582f 
in creation of transgenic organisms, 583-599 
DNA libraries in, 577, 579-581, 579f-582f 
DNA sequencing technologies in, 582f 
gene therapy using, 600-602, 601t, 605f 
genes identified by mutant phenotype are cloned 
using, 542-551 
molecular cloning in, 572-577, 573f-575f, 
577t, 578f 
polymerase chain reaction in, 581, 582f 
in positional cloning, 544-548, 545f, 547f, 548f 
restriction enzymes in, 568—572, 569b, 571b, 572f 
Recombination 
along chromosomes, limits of, 160-162, 161f, 162f 
from crossing over, 160-166 
cytological evidence of, 160, 160f 
as dominated by hotspots, 164—165 
and evolution and genetic diversity, 170-171 
between genes, 163b 
within genes, 162, 162f 
homologous, 588, 589f 
illegitimate, 588, 589f 
intragenic, 162, 162, 162f 
meiotic, 420-421f 
inherited variation from, 418—419 
site-specific, 598, 598-599, 599f, 600b 
Recombination frequency, 148 
of gene pairs, 148 
biological factors affecting, 164, 165f 
genetic linkage mapping based on, 
153-154, 153f 
limits of, 161 
linked, calculation of, 163b 
physical distance between genes and, 165, 166f 
in three-point recombination mapping, 158 
between genes, 153-165 
Recombination hotspots, 164 
Recombination nodules, 76, 76 
Recombination protein homology, 418-419, 419t 
Red blood cells, normal and sickle-shaped, 339f 
Red bread mold. See Mold, red bread 
(Neurospora crassa) 
Reduction division, 77. See also Meiosis I 
Redundant and interacting genes, identifying, 
540-541 
Redundant genes, identifying, 541 
Reference genome sequence, 635 
Regulated transcription, 469 
Regulatory mutations, 395-397, 396, 396f, 398f 
Regulatory proteins 
binding to lac operon regulatory sequences, 482b 
trans-acting, 507 
Regulatory sequences 
cis-acting, 507 
in eukaryotic gene expression regulation, 
506-511 
mutations in, 509 
Relative fitness, 748-749 
differential reproductive fitness and, 748-749 
Release factors (RF), 318, 318-319 
Reovirus, 367t 
Replica plating, 189b 
Replicate crosses, 30 
Replication. See also DNA replication 
of organelle genomes, 652, 653f 
rolling circle, 194 
semiconservative, 8, 9f 
Replication bubble, 237, 240f, 246, 247f 
Replication factor C (RFC) complex, 249 
Replication fork, 238, 240f 
Replicative segregation, 653, 654f 


Replicative transposition, 455 
Replisomes, 242, 246 
Reporter genes, 556 
monitoring gene expression with, 556-559, 
558f, 559f 
Repressible operons, 484 
Repressor protein(s), 470 
in eukaryotic transcription regulation, 506-508, 
511, 511f 
lambda, and lysogeny, 496, 497f 
Reproduction 
asexual, 72-73 
sexual, 72-73 
gene transmission in, 4 
of single-celled organisms, 81, 82f 
Reproductive fitness, differential, 748-749 
within population, natural selection of, 748-753 
Reproductive isolation, 760-761 
mechanisms of, 762t 
speciation and, 761, 763-764 
Resistance (R) plasmid, 188 
Response to selection, 729 
Restriction endonucleases, 345, 348t 
Restriction enzymes, 345, 345-346, 348t 
in recombinant DNA technology, 568-572, 569b, 
571b, 572f 
Restriction fragment length polymorphism (RFLP), 
346, 350-351f, 546 
Restriction maps, 570, 570, 570f, 571b 
Restriction-modification systems, 568, 569b 
Restriction sequence, 345-346, 348t 
Restrictive condition, 538 
Retinitis pigmentosa, 91t 
Retinoblastoma, RB1 gene mutation in, 72 
Retinoblastoma protein (pRB), cyclin D1-Cdk4 
complex and, 71 
mutations altering interaction of, 72 
Retrotransposons, 455, 455, 460—461, 460f 
Rett syndrome, 91t 
Reverse genetic analysis, 534 
Reverse genetics, 166, 534f 
genetic redundancy in flower development and 
(Case Study), 561—563b 
genomic approaches to, 641, 641f 
insertion mutants in, 552 
investigating gene action, 551-554, 554f, 
555f, 560f 
by TILLING, 554-556, 555f 
Reverse mutation rate (v), 753 
quantifying the effects of, 753 
Reverse mutations, 397, 398f 
Reverse transcriptase, 455 
Reverse transcription, 10 
Reversion mutation(s), 324 
Ames test for rate of, 408 
Reversions, 397, 398f 
Revertible mutants, 216 
RF (release factors), 318, 318-319 
RFLP (restriction fragment length polymorphism), 
346, 350-351f, 546 
Rheumatoid arthritis, 735 
Rho-dependent termination, 277, 277-278 
Rho protein, 277 
Rho utilization site (rut site), 277 
Ribonucleic acid. See RNA (ribonucleic acid) 
Ribonucleotides, RNA, 269, 269f 
Ribose, 269 
Ribosomal RNA (rRNA), 4-5, 9, 271 
Ribosomal RNA processing, 296-297, 296f 
Ribosomal subunits, 309 
Ribosome structures, 309-311, 309f, 310b 
three-dimensional, 311, 311f 
Ribosomes, 5 
in translation, 307—311 
Ribozymes, 271 
Rice, Golden (Oryza sativa), 591, 593-594, 593f 
Riggs, Arthur, pulse-chase labeling evidence 
of bidirectional DNA replication and, 
238-239, 239f 


RISC (RNA-induced silencing complex), 524 
Argonaute gene family and, 526 
RITS (RNA-induced transcriptional silencing) 
complex, 526 
RNA (ribonucleic acid), 5. See also Messenger RNA 
(mRNA); MicroRNA (miRNA); Ribosomal 
RNA (rRNA); tRNA (transfer RNA) 
antisense, 491, 494f 
classification of, 270-271 
double-stranded 
cleaving, 524-525 
gene silencing by, 524-525, 525f 
functional, 271 
gene expression control mediated by, 524—528, 
525f, 526f 
small interfering, 271 
small nuclear, 271 
transcription producing, 10-12 
RNA editing, 298 
post-transcriptional, 298—299, 298f, 299f 
RNA-induced silencing complex (RISC), 524 
Argonaute gene family and, 526 
RNA-induced transcriptional silencing (RITS) 
complex, 526 
RNA interference (RNAi), 524, 553 
chromatin modification by, 526-527, 526f 
evolution and applications of, 527-528 
in gene activity, 552-554, 554f 
RNA polymerase(s), 269 
bacterial, 272-273, 272f 
eukaryotic, 278 
multiple, transcription using, 278-285 
protein subunits of, 277—278, 278t 
RNA polymerase core, 272 
RNA polymerase I (RNA pol I), 278 
RNA polymerase I promoters, 283—284, 283f 
RNA polymerase I transcription, termination in, 
284-285 
RNA polymerase II (RNA pol II), 278 
RNA polymerase II transcription, consensus 
sequences for, 278-282 
RNA polymerase III (RNA pol III), 278 
RNA polymerase III promoters, 284, 284f 
RNA polymerase III transcription, termination in, 
284-285 
RNA polymerase transcription, 12 
RNA primer(s), 245 
RNA primer removal for DNA replication, 
247-248, 248f 
RNA processing, 267-304 
post-transcriptional, 285-299 
RNA synthesis, 269-270, 270f 
RNA transcripts, carrying messages of genes, 
268-271 
RNAi (RNA interference), 524 
Roberts, Richard, “split genes” discovery by, 288 
Robertsonian translocation, 448, 449—450, 
449f, 451f 
Rodriguez, Raymond, bidirectional replication 
model and, 239 
Rolling circle replication, 194 
Rothstein, Rodney, double-stranded break model of 
meiotic recombination and, 418 
Roudier, Francois, on chromatin states, 517, 517f 
rRNA. See Ribosomal RNA (rRNA) 
Rubin, Gerald, creating transgenic Drosophila and, 
594-595 


S 
S (synthesis) phase of interphase, 66, 66f, 71f 
Saccharomyces cerevisiae. See Yeast, baker’s 
(Saccharomyces cerevisiae) 
Sampling error, genetic drift causing allele frequency 
change by, 756-758, 756f 
Sanger, Fred 
and amino acid sequence of insulin 
determination, 585 
and dideoxynucleotide DNA sequencing, 
256-259, 258f, 259f, 260b 


Saturation mutagenesis, 535 
Sbe1 (starch branching enzyme 1) gene, 54 
SBE1 (starch branching enzyme 1) gene, 
454-455 
Scaffold, 613, 614f 
Scanning, 313, 314f 
SCD. See Sickle cell disease (SCD) 
Schizosaccharomyces pombe, 526 
SCIDS (severe combined immunodeficiency 
syndrome), 601, 602 
Scientific method, steps in, 26 
Screening libraries, 581, 582f 
SDSA (synthesis-dependent strand annealing), 
417, 417f 
Second-division segregation, 175, 176f 
Second filial generation. See F₂ (second filial) 
generation 
Second-site reversion, 397, 398f 
Secondary endosymbiotic events, 674, 674—675 
Secondary structure of polypeptides, 308 
Seed development, wrinkled, 454—455b 
Segment polarity genes, 686f, 687 
Segregation 
allele, in quantitative trait production, 716, 
718-719, 719f 
chloroplast, and mating type in Chlamydomonas, 
659, 661 
chromosome, 448—449 
first-division, 175, 175f 
second-division, 175, 176f 
Segregation, law of (Mendel’s first law), 33-34 
meiosis and, 79-80, 79f 
in single-celled diploids, 81, 82f 
Segregation hypothesis, testing of 
F₂ self-fertilization in, 35, 35f, 36t 
test-cross analysis in, 34-35 
Selander, Robert, on genetic bottlenecks, 758 
Selected marker screen, 210 
Selection 
artificial, narrow sense heritability and, 729-730, 
729t, 730f 
response to, 729 
Selection coefficient, 749 
Selection differential, 729 
Selective growth medium, 195 
Self-splicing, intron, 294, 294f, 296 
Semiconservative DNA replication model, 
236, 237f 
Semiconservative replication, 8, 9f 
Semisterility, 435 
Sequence (IS) elements in bacterial genomes, 
transposition of, 456f 
Sequence gaps, 614-615 
Severe combined immunodeficiency syndrome 
(SCIDS), 601, 602 
Sex, recombination frequency and, 164, 165, 165f 
Sex chromosomes, 65 
multiple sets of, 90 
Sex determination, 87, 89 
diversity of, 88, 90, 90f 
in Drosophila, 87, 89 
mammalian, 88, 88f 
Sex development, human, mutations altering, 89b 
Sex-influenced traits, 117, 117 
as gene interactions, 117, 118f 
Sex-limited gene expression, 117 
Sex-limited traits, 117 
as gene interactions, 117 
Sex-linked inheritance, 84 
Sex-linked transmission, human, patterns of, 
90-92, 94-95 
Sexual reproduction, 72-73 
gene transmission in, 4 
Shared derived characteristics, 17 
Sharp, Phillip, “split genes” discovery by, 288 
Shine-Dalgarno sequence, 312f, 313, 313-315 
Short arm (p arm), 376-377, 377f 
Shotgun sequencing of DNA molecules, 
582f, 583 


Shuttle vector, 588 
Sickle cell disease (SCD), 338—364, 339 
electrophoretic analysis of, 349-353 
evolution by natural selection in human 
populations, 353, 357-358 
first patient with, 339-340 
genetic analysis of 
gel electrophoresis, 342-344, 343f, 344f 
hemoglobin peptide fingerprint analysis, 
344-345, 347b 
identification of DNA sequence variation, 
345-346, 346f, 348, 348t 
geographic distribution of, 357f 
hemoglobin structural change in, 342f 
inheritance of, 56b, 56f 
inherited hemoglobin variant causing, 339-341, 
339f, 340f 
malaria resistance in, 357—358 
in mice, gene therapy curing, 604b, 605f 
pleiotropy in, 121, 122f 
Sigma (σ) subunit(s), 272, 272f 
alternative, 272-273 
E. coli RNA polymerase, 276t 
Signal hypothesis, 331 
Signal sequences (leader sequences), 331 
Signal-transduction pathways, 124 
Silencer sequences, 283, 506 
in eukaryotic transcription regulation, 506-508, 
508f, 510-511, 511f 
Silencing of genes 
by double-stranded RNA, 524-525, 525f 
by nucleotide methylation, 523-524 
Silent mutations, 394, 394f 
Simian virus 40 (SV40), 367t 
Simple transposons, 455, 457t 
SINE elements of humans, 460—461 
Single-gene trait, 123-124 
Single nucleotide polymorphism (SNP), 345, 346, 
346f, 734, 752-753, 766 
SNP variation in human genetic diversity, 
766, 768 
Single-stranded binding protein (SSB), 244 
siRNAs (small interfering RNAs), 524 
Sister chromatid cohesion, 67, 70f 
Sister chromatids, 67 
creation of, in S phase, 66 
Site-specific recombination, 598, 598-599, 
599f, 600b 
Skin microbiome, 618b 
Sliding clamp, 249, 249f 
Small interfering RNAs (siRNAs), 271, 524 
Small nuclear RNA (snRNA), 271 
Small nucleoid-associated proteins, 369 
Small ribosomal subunit, 309 
SMC (structural maintenance of chromosomes) 
proteins, 369 
Smith, Hamilton, restriction endonucleases 
and, 569b 
Smithies, Oliver, knockout mouse development 
and, 596 
Smoking, 769 
SNP (single nucleotide polymorphism), 345, 346, 
346f, 734, 752-753, 766 
Solenoid structure, 373 
Somatic cells, 65 
division of, mitosis in, 65-73 
Somatic gene therapy, 601 
Sonic hedgehog (Shh) gene, 508, 509, 556, 632, 701, 
707-709b 
Southern blot analysis of β-globin gene variation, 
350-352, 352f, 356b 
Southern blotting, 349, 354—355b, 454b, 581 
Specialized transducing phages, 212, 212f 
Specialized transduction, 206, 212-213, 212f 
Speciation 
allopatric, 761, 761, 763, 763f 
processes of, 761, 762f 
reproductive isolation and, 761, 763-764 
sympatric, 763, 763-764 
Spinal muscular atrophy, 399t 
Spindle fiber microtubules, 67 
Spinocerebellar ataxia, 399t 
Spliceosome, 289 
Splicing mutations, 396-397, 396f, 397 
Splicing signal sequences in eukaryotic pre-mRNA, 
288-289, 290f 
Spontaneous mutation, 397 
Spores in yeast reproduction, 81 
Spradling, Allan, creating transgenic Drosophila 
and, 594-595 
Square root method, for determining autosomal 
allele frequencies, 747 
SR proteins, 294, 294f 
SRD5A2 gene mutation and pseudohermaphroditism, 89b 
SRY gene 
evolution, 97 
in sex determination, 88, 88f, 90 
SSB (single-stranded binding protein), 244 
Stabilizing selection, 730 
Stahl, Franklin 
DNA replication experiment of, 236-237, 
237f, 238f 
double-stranded break model of meiotic 
recombination and, 418 
semiconservative replication mechanism and, 8 
Standard deviation (σ), 49, 725 
Start codon, 12, 13f 
Start of transcription, 11, 12f 
Statistically significant difference, 49 
Stem cells. See Embryonic stem (ES) cells 
Stem-loop structure, 277 
Stem loops, 485, 487f 
Sterility, cytoplasmic male, 669b 
Stern, Curt 
mitotic crossover and, 176, 177f 
research on crossing over in Drosophila, 160 
Steroid receptor (SR), novel functions from 
ancestral, 765 
Steroid receptor (SR) evolution, vertebrate, 
764—766 
Steroid receptor (SR) proteins, 765 
Stevens, Nettie, chromosome studies of, 84 
Sticky ends, 346, 569 
Stop codons, 12, 13f 
Strains 
pure-breeding, 30 
to begin experimental crosses, 30, 30f 
true-breeding, 30 
Strand invasion, 417 
Strand polarity, 7 
Strand slippage, 398, 399f 
Streisinger, George, strand slippage and, 398 
Stress response 
bacteria regulating transcription of, 
489-491, 490f 
in Vibrio cholerae, infection and (Case Study), 
497-498b 
Structural genes, 368 
Structural genomics, 612, 613-622. See also 
Annotation; Genome sequences 
clone-by-clone sequencing approach to, 
612t, 613 
human genome and, 616 
metagenomics in, 616—617 
variation in genome organization among species 
in, 620-621, 621f, 622f 
whole-genome shotgun sequencing approach to, 
613-616, 614f-616f 
Structural maintenance of chromosomes (SMC) 
proteins, 369 
Structural motifs of DNA-binding proteins, 505f 
Sturtevant, Alfred 
genetic linkage map of, 153-154, 153f 
recombination data for, 153f 
synthetic lethality identified by, 541, 541f 
Su(var) mutations, 512-513, 512f 
Subcloning, 570 
Subfunctionalization, 628 
Submetacentric chromosomes, 377 
Sugar-phosphate backbone, 234, 235 
Sum (addition) rule of probability theory, 45 
Super-repressor protein mutations, 478f, 
479, 479t 
Supercoiled DNA, 246 
bacterial, 369, 370f 
Supercoiling, DNA, 369 
Supplemented minimal medium, 189b 
Suppressor mutation, 397 
Suppressor screen, 540 
Sutton, Walter 
on relation of meiosis and Mendelian hereditary 
principles, 79 
and study of chromosome movement, 2 
SWI/SNF complex in chromatin remodeling, 
515, 516 
SWR1 complex, 515, 516-517 
Sxl (sex-lethal) gene, 299-300 
Sympatric speciation, 763, 763-764 
Synapsis, homologous chromosome, 75 
Synapomorphies, 17 
Synaptonemal complex, 75-76, 78f 
Syncytial blastoderm, 685 
Syncytium, 685 
Synonymous codons, 321 
Syntenic genes, 145 
assortment of, 145 
Synteny, 632, 633f 
Synthesis-dependent strand annealing (SDSA), 
417, 417f 
Synthetic enhancement, 541, 541f 
Synthetic lethality, 541, 541, 541f 
in genetic interactions, 642-643, 643f 
Systems biology, 15, 644 
Szostak, Jack, 252 
double-stranded break model of meiotic 
recombination and, 418 


T 
T-DNA (transfer DNA), 589-590 
T strand, 193-194 
TAF (TBP-associated factor), 281 
Targeted induced local lesions in genomes 
(TILLING), 554 
reverse genetics by, 554-556, 555f 
TATA-binding protein (TBP), 281, 285 
TATA box, 280, 281, 282, 285 
Tatum, Edward 
and bacterial DNA transfer identification, 188, 
191, 191f 
one gene—one enzyme hypothesis of, 124-127, 
125-126b, 306 
Tautomeric shifts in DNA nucleotide bases, 
400-402, 401f 
TBP (TATA-binding protein), 281, 285 
TBP-associated factor (TAF), 281 
Telocentric chromosomes, 377 
Telomerase, 252 
Telomeres, 251, 251-254 
aging, cancer, and, 254 
Telophase, mitotic, 66, 68, 68f, 71f 
Telophase I, 77, 79 
Telophase II, 77f 
Temperate phages, 209 
Temperature-sensitive allele, 113 
Template strand, 10, 11f, 271 
Tenebrio molitor. See Beetle, yellow mealworm 
(Tenebrio molitor) 
Terminal deletion, 440, 441f 
Terminal inverted repeats, 453, 455 
Termination region, 272 
Termination sequence, 11, 12f 
Termination stem loop, 485, 487f 
TERT (telomerase reverse transcriptase) gene, 
253-254 
Tertiary endosymbiotic events, 674, 674-675 
Tertiary structure of polypeptides, 308 


Test-cross analysis 
in autosomal genetic linkage detection, 
150-151, 151f, 152b 
hypothesis testing by, 34—35, 34f, 35f 
in independent assortment testing, 39, 39f, 41 
in segregation hypothesis testing, 34—35 
three-point, 154 
in gene mapping, 154—160 
two-point, 150 
Test crosses, 31, 31f 
Tetrad(s), unordered, 172 
analysis of, 172, 173f, 174f, 174t 
Tetrad analysis, 172 
of genetic linkage in haploid eukaryotes, 
171-175 
Tetrahymena, telomerase activity in, 252-253 
Tetratype (TT), 172 
TF. See Transcription factors (TF) 
Thalassemia, 359 
enhancer mutations causing, 509 
transmission and molecular genetic analysis of 
(Case Study), 359-360, 359f 
Theta (θ) value, 167, 167-168 
Third-base wobble, 298, 321 
effects of, 322, 322f 
genetic code displaying, 321-322, 322t 
pairings causing, 322, 322t 
Third filial generation (F₃ generation), 30, 30f 
Third-generation DNA sequencing technologies, 
259, 261 
Three-point mapping, finding the relative order of 
genes by, 154-156, 155f 
Three-point recombination map, constructing a, 
156-159 
Three-point test-cross analysis, 154 
in gene mapping, 154—160 
Three-strand double crossover, 161, 161, 162f 
Threshold of genetic liability, 720 
Threshold traits, 719-721, 720, 720f, 721f 
Thymidine kinase (tk) gene, 596 
Thymine (T), 7 
Thymine dimer, 406, 407f 
Ti (tumor-inducing) plasmid, 589-590, 590f 
Tiling array, whole-genome, 638, 640f 
TILLING (targeted induced local lesions in 
genomes), 554 
reverse genetics by, 554-556, 555f 
Time-of-entry mapping, 197 
interrupted mating analysis producing, 
197-203, 198f, 202b 
Tobacco mosaic virus (TMV), 323, 367, 
367f, 367t 
Topoisomerase I, 369 
Topoisomerase II, 369 
Topoisomerases, 246 
Torpedo model of transcription termination, 
287, 288f 
Totipotency, 591 
Totipotent cells, 683 
Trait(s). See also Quantitative trait(s) 
additive, 715 
displaying incomplete penetrance, 
118-119, 119f 
heritability of, 726-730 
Mendel’s, molecular genetics of, 54-55, 55t 
multifactorial, 714 
polygenic, 714 
selection of single traits with dichotomous 
phenotypes, 30 
sex-influenced, 117, 117 
sex-limited, 117, 117 
single-gene, 123-124 
threshold, 719-721, 720, 720f, 721f 
X-linked dominant, 91t 
transmission of, 94 
X-linked recessive, 91t 
expression of, 90-92 
Trans-acting, 479 


Trans-acting regulatory proteins, 507 
Transcriptase, reverse, 455 
Transcription, 5, 10—12. See also Bacterial 
transcription 
archaeal, 285 
chloroplast, 668 
constitutive, 469 
eukaryotic 
chromatin remodeling regulating, 512-524 
enhancers and silencers in regulation of, 
506-508, 508f, 510-511, 510f, 511f 
multiple RNA polymerases in, 278-285 
regulatory interactions in, 506-507, 506f 
of lambda phage gene, early, 493—494, 
495b, 496 
mitochondrial, 665-666, 665f, 666t 
molecular biology of, 267-304 
post-transcriptional processing modifies RNA 
molecules, 285-299 
regulated, 469 
start of, 11, 12f 
of stress response, bacteria regulating, 
489-491, 490f 
from tryptophan operon, 483-488 
Transcription factor B (TFB), 285 
Transcription factors (TF), 281 
Transcription-terminating factor I (TTFI), 285 
Transcription termination 
bacterial, 274f, 276 
RNA polymerase I and III, 284-285 
torpedo model of, 287, 288f 
Transcription termination mechanisms, bacterial, 
276-278, 277f 
Transcriptional fusion, 557f 
Transcriptome, 15, 636 
in high-throughput sequencing analysis, 637 
Transcriptomics, 15, 636, 636-638, 637f, 639f, 640f 
in functional genomics, 636—638 
Transductant, 209 
Transduction, 206 
definition of, 187 
gene transfer by bacterial, 206—213 
generalized, 209, 210f 
specialized, 212-213, 212f 
Transfer DNA (T-DNA), 589-590 
Transfer RNA. See tRNA (transfer RNA) 
Transformant, 206 
Transformation, 204-205 
bacterial 
gene transfer by, 204, 206 
mapping by, 206, 207f 
of plant genomes by Agrobacterium, 589-594, 
590f, 592f, 593f 
steps in, 206, 207f 
definition of, 187 
Transformation factor 
DNA as, 230, 230f 
identification of, 229-230, 229f 
Transgene(s), 542, 583 
in E. coli, 583—588, 584f 
as means of dissecting gene function, 554, 
556-561, 559f, 600b 
Transgenic animals, 594-598, 594f, 596f 
Transgenic fungi, generation of, 588—589, 
588f, 589f 
Transgenic Mus musculus, 596-598 
Transgenic organisms, 542, 567-568, 583 
creation of, 583-599 
Transgenic plants 
in agriculture, 591 
creating, 589-593, 592f 
Transgenic vertebrates, 595, 596f 
Transition mutation, 394 
Translation, 5, 12-13. See also Posttranslational 
polypeptide processing 
bacterial, antibiotics interfering with 
(Case Study), 332b 
of bacterial polycistronic mRNA, 320f 
chloroplast, 668 


elements of, 307-308, 307f 
mitochondrial, 665-666, 666t 
molecular biology of, 305-337 
of mRNA into polypeptide, genetic code in, 
320-322 
phases of, 311-319 
of polycistronic mRNA, 320 
polypeptide elongation in, 315-318 
speed and efficiency of, 319-320, 319f 
Translation elongation factor homologs, 
318, 318t 
Translation initiation, 311-315 
bacterial, 312-313 
eukaryotic, 313, 314f, 317-318b 
Translation initiation factor homologs, 315t 
Translation repressor proteins, 491 
Translation termination, 318-319, 319f, 325b 
Translational complex, 319, 320f 
Translational fusion, 557, 557f 
Translational regulation in bacteria, 491—492, 491t 
Translesion DNA polymerases, 416 
Translesion DNA synthesis, 407 
protein control of, 407—410 
Translocation, chromosome, 446, 448-450, 
449f-451f 
Translocation heterozygotes, 448 
Transmission genetics, 5, 6, 26, 26-63 
in produce aisle (Case Study), 44b 
Transposable genetic elements, 450, 450—456, 451f 
characteristics and classification of, 453—456 
Transposition, 431, 451 
bacterial genomes modified by, 456—459, 456f, 
456t, 457f, 457t, 458b 
discovery of, 451, 451f, 453 
eukaryotic genomes modified by, 457—461, 
459f, 460f 
Mendel’s peas shaped by, 453, 454—455b 
Transposons, 456, 457, 457f 
to clone genes, recombinant DNA technology in, 
543-544, 544f 
Transversion mutations, 394 
Tree of life, 624, 624, 625f, 626b 
Trichothiodystrophy, 414f 
Trihybrid cross, 41 
Trihybrid-cross analysis in independent assortment, 
testing, 41—42, 41f 
Trinucleotide repeat disorders, 398, 399t 
Triple X syndrome, 434t 
Trisomic chromosomes, 432 
Trisomy, 432 
Trisomy 21, meiotic nondisjunction in, 434, 435t 
Trisomy rescue, 436-437, 437 
Trithorax (Trx), 519 
Trivalent synaptic structure, 435 
tRNA (transfer RNA), 9, 271 
in amino acid transport, 12 
charged, 311 
genetic code specificity and, 328, 330 
initiator, 311, 312f 
isoaccepting, 321, 321f 
uncharged, 311 
tRNA molecules, charging, 322 
tRNA processing, 297-298, 297f 
tRNA synthetases, 323, 323f 
Trombone model of DNA replication, 249, 250f 
Trp operon. See Tryptophan (trp) operon 
True-breeding strains, 30. See also Pure-breeding 
strains 
True reversion, 397, 398f 
Tryptophan (trp) operon 
attenuation mutations of, 488, 488f, 489b 
attenuation of, 485—488, 486f, 487f 
feedback inhibition of, 484—485, 485t 
transcription from, 483-488 
Tryptophan synthesis, feedback inhibition of, 
484—485, 484f, 485f 
TTFI (transcription-terminating factor I), 285 
Tumor suppressor gene, 71 
Turner syndrome, 434-436, 434t, 466 


Twin studies of heritability, 727-729, 729t 
Two-hybrid system, for protein interaction detection, 
638, 638, 640f, 641 
Two-point test-cross analysis, 150 
Two-strand double crossover, 161, 161, 162f 
Ty elements of yeast, 460 


U 
UAS (upstream activator sequence), 510 
Ubiquitination, 416 
Ultrabithorax gene, 692-695, 692f, 693f 
Ultraviolet (UV)-induced photoproducts, 
406-408, 407f 
repair of, 412-413, 413f 
Ultraviolet (UV) irradiation, damage from, 
766, 768 
Ultraviolet (UV) repair, 413, 413 
Unbalanced translocation, 448, 449f 
Uncharged tRNAs, 311 
Unequal crossover, 441, 442f 
Unger, Franz, 27 
Uniparental disomy, 436, 436-437 
Uniparental inheritance, 650 
Unordered tetrad(s), 172 
analysis of, 172, 173f, 174f, 174t 
Unpaired loop, 442, 442f 
Unselected marker screen, 210 
Unstable mutant phenotype, 453 
Untranslated regions (UTRs). See 3’ untranslated 
region (3’ UTR); 5’ untranslated region 
(5’ UTR) 
Upstream, 271 
Upstream activator sequence (UAS), 510 
Upstream control element, 283, 283f 
Uracil (U), 11, 269 
UTRs. See 3’ untranslated region (3’ UTR); 
5’ untranslated region (5’ UTR) 


V 
Variable expressivity of allele, 119, 120f 
Variance, 724-725 
additive, 726 
dominance, 726 
environmental, 725 
genetic, 725 
partitioning, 725 
interactive, 726 
phenotypic, 725 
partitioning, 725, 725f, 726b 
sources of, 725f 
Variation 
continuous, 714 
discontinuous, 714 
Vector(s) 
cloning, 577t 
bacteriophage, 576-577, 577t, 578f 
plasmids as, 574-575, 575f, 577t 
cosmid, 577 
expression, 583, 583-584, 584f 
eukaryotic, 584 
in gene therapy, viruses as, 601, 601t, 602 
recombinant, 573 
shuttle, 588 
Venter, J. Craig, on human genome sequence 
“draft,” 616 
Vertebrate steroid receptor evolution, 
764-766 
Vertebrates, transgenic, 595, 596f 
Vibrio cholerae, 220 
infection and stress response in (Case Study), 
497-498b 
Vibrio cholerae toxins, 498 
Viral genomes, 366 
composition and organization of selected, 367t 
Viral protein packaging, 366-368 
Viral structure and assembly, 367-368, 367f 
Virus(es) 
bacterial (see Bacteriophage(s)) 
as vectors in gene therapy, 601, 601t, 602 
Viruses, 366 

enveloped, 367, 367f 

nonenveloped, 366-367, 367f 
Volkin, Elliot, mRNA discovery and, 269 
von Ettinghausen, Andreas, 26, 27 
von Tschermak, Erich 

on hereditary transmission, 2, 3f 

research of, paralleling Mendel’s, 42 
Vulval precursor cells (VPCs), 697-700 


W 
Waardenburg syndrome, variable expressivity of, 
119, 120f 
WAGR syndrome, 441, 441f 
WAGRO, 441 
Watson, James, 7f 
research on double-helical structure of DNA 
by, 4, 6 
Watts-Tobin, R. J., proof of triplet genetic code, 
323-324, 324t 
Weinberg, Wilhelm, genotype frequencies in 
populations and, 743 
Western blotting, 349, 355b, 454b 
in β-globin gene transcript and protein analysis, 
352-353, 353f 
Wexler, Susan, studies of Huntington disease by, 549 
WGS sequencing. See Whole-genome shotgun 
(WGS) sequencing 
Whole-genome shotgun (WGS) sequencing, 613 
of bacterial genome, 614-615, 615f 
of eukaryotic genome, 615-616, 616f 
future of, 616 
Whole-genome shotgun (WGS) sequencing 
approach to structural genomics, 613-616, 
614f, 615f 
Whole-genome tiling array, 638, 640f 
Wieschaus, Eric 
mutagenesis strategy used by, 538 
on pattern formation in Drosophila, 684, 686 
Wild-type Huntington disease (HD) genes, 262 
Wilkens, Horst, heritability analysis by, 727 
Wilkins, Maurice, research on double-helical 
structure of DNA by, 4 
Williams-Beuren syndrome (WBS), 441, 442f 
Wilms tumor, 441 
Wilson, Edmund, on nuclein (DNA) in 
inheritance, 228 
Woese, Carl, 4 
Wollman, Elie, on interrupted mating in 
time-of-entry mapping, 197 
Wormwood (Artemisia annua), 587b 
Wright, Sewall, evolutionary genetics research 
and, 17 


X 
X/A (X/autosome) ratio, 88 
X chromosomes, random inactivation in placental 
mammals, 95, 95-96, 96f 
X-inactivation, 521-522 
X-linked inheritance, 84-85, 84f, 85 
dominant, 90 
recessive, 90 
features characterizing, 91-92, 91f 
X-linked trait(s) 
dominant, 91t 
transmission of, 94 
recessive, 91t 
expression of, 90-92 
X-ray diffraction imagery in study of DNA 
structure, 6, 6f 
φX174, 367t 
Xeroderma pigmentosum (XP), 414b 
complement groups identification in 
(Case Study), 137b 
XIST RNA in random X inactivation, 96 


Y 


Y chromosome, (degenerative) evolution of 
mammalian, 96-98, 97f 
Y-linked inheritance, 94-95 
Yanofsky, Charles, cotransduction mapping and, 
210-211, 211f, 211t, 212f 
Yeast, baker’s (Saccharomyces cerevisiae) 
biparental inheritance in, 661-662, 661f 
in genetic screen design, 536 
integrating DNA into the genome of, 
588-589, 589f 
life cycle of, 172f 
mutants of, in categorizing genes, 641-642, 642f 


reproduction of, 81, 82f 
transcription regulation in, 510-511, 510f, 
520, 520f 
Yeast, Ty elements of, 460 
Yeast artificial chromosomes (YACs), 577 
Yeast plasmids, generation of, 588, 588f 
Yellow mealworm beetle. See Beetle, yellow 
mealworm (Tenebrio molitor) 
Yule, George Udny, on genotype frequencies in 
populations, 743 


Z 

Z-form DNA, 236 

Z/W system, 88, 90 

Zebrafish (Danio rerio), 483 

Zmax, 168 

Zone of polarizing activity (ZPA), 
701, 702f 

Zygotene stage of prophase I, 75, 75f 

Zygotic genes, 687 


Model Organisms 

[Life cycle diagrams. Key: m, meiosis; f, fertilization] 

E. coli 
Generation time: 20-40 minutes 
Genome size: 4.64 Mb 
Chromosomes: 1 circular chromosome + plasmids 
Number of genes: 4200 
Genetic distance: 100 minutes 
Wild-type allele: lacZ+ 
Mutant allele: lacZ 
Allele designation style: Superscript 
Specific mutant allele: lacZ? 
Protein product: LacZ 
Gene name style: Genes are usually named based on their presumed wild-type function. They are three letters, sometimes followed by a fourth letter for genes with similar functions. 
Website: www.ecolicommunity.org 

A. thaliana (mouse ear cress) 
Generation time: 10 weeks 
Genome size: 130 Mb 
Chromosomes: 5 chromosomes (2n = 10) 
Number of genes: 28,775 
Genetic distance: 600 cM 
Wild-type allele: PHB 
Mutant allele: phb 
Allele designation style: Hyphenated number 
Specific mutant allele: phb-6 
Dominant mutant allele: phb-1d 
Protein product: PHB 
Gene name style: Genes are usually named based on the mutant phenotype. They are three letters, sometimes followed by a number for genes with similar mutant phenotypes. 
Website: www.arabidopsis.org 

S. cerevisiae (baker's yeast) 
Generation time: 2-3 hours 
Genome size: 12 Mb 
Chromosomes: 16 chromosomes (2n = 32) 
Number of genes: 6607 
Genetic distance: 4500 cM 
Wild-type allele: CDC28 
Mutant allele: cdc28 
Allele designation style: Hyphenated number 
Specific mutant allele: cdc28-3 
Protein product: CDC28p 
Gene name style: Genes are usually named based on their presumed wild-type function. They are three letters, sometimes followed by numbers for genes with similar mutant phenotypes. 
Website: www.yeastgenome.org 

C. elegans (nematode) 
Generation time: 3 days 
Genome size: 100 Mb 
Chromosomes: 5 autosomes + X chromosome (2n = 10 + XX or XO) 
Number of genes: 20,532 
Genetic distance: 300 cM 
Wild-type allele: dpy-10 
Mutant allele: dpy-10(allele#) 
Allele designation style: Parenthetical 
Specific mutant allele: dpy-10(e128) 
Protein product: DPY-10 
Gene name style: Genes are usually named based on the mutant phenotype. They are three letters, sometimes followed by a hyphen and number for genes with similar mutant phenotypes. 
Website: www.wormbase.org 

[Life cycle diagrams. Key: m, meiosis; f, fertilization] 

Drosophila (fruit fly) 
Generation time: 2 weeks 
Genome size: 180 Mb 
Chromosomes: 3 autosomes + X and Y (2n = 8) 
Number of genes: 13,937 
Genetic distance: 275 cM (female)/0 cM (male) 
Wild-type allele: w+ (recessive), Ant+ (dominant) 
Mutant allele: w (recessive), Ant (dominant) 
Allele designation style: Superscript 
Specific mutant allele: w' (recessive), Ant? (dominant) 
Dominant mutant allele: first letter capital 
Protein product: W, ANT 
Gene name style: Genes are usually named based on the mutant phenotype. 
Website: http://flybase.org 

D. rerio (zebrafish) 
Generation time: 3 months 
Genome size: 2000 Mb 
Chromosomes: 25 chromosomes (2n = 50) 
Number of genes: 14,700 
Genetic distance: 3000 cM (female) 
Wild-type allele: chi+ 
Mutant allele: chi with superscript allele# 
Allele designation style: Superscript 
Specific mutant allele: chic!?3 
Dominant mutant allele: chit?! 
Protein product: Chi 
Gene name style: Genes are often named based on the mutant phenotype. They are three letters, sometimes followed by a number for genes with similar mutant phenotypes. 
Website: http://zfin.org 

Mus musculus (house mouse) 
Generation time: 10 weeks 
Genome size: 3000 Mb 
Chromosomes: 19 autosomes + X and Y (2n = 40) 
Number of genes: 23,139 
Genetic distance: 1400 cM (sex averaged) 
Wild-type allele: 4Tcp 1 
Mutant allele: Tcp1 
Allele designation style: Hyphenated number or superscript number 
Specific mutant allele: Tcp1-3, Tep P 
Protein product: Tcp1 
Gene name style: Genes are usually named based on their mutant phenotypes or the proteins they encode. They are three letters, sometimes followed by a number for genes with similar functions. 
Website: www.informatics.jax.org 

Human 
Generation time: 20 years 
Genome size: 3000 Mb 
Chromosomes: 22 autosomes + X and Y (2n = 46) 
Number of genes: 20,769 
Genetic distance: 4460 cM (female)/2590 cM (male) 
Wild-type allele: HTT 
Mutant allele: HTT*allele# 
Allele designation style: Numbers following an asterisk 
Specific mutant allele: nl 
Protein product: HTT protein 
Gene name style: Genes are often named after the disorder or abnormality resulting from mutations in the gene. Names are three letters. 
Website: http://genome.ucsc.edu 


